Segments

A segment is an immutable mini-index inside a shard: a self-contained inverted index (plus doc values and stored fields) that Lucene writes once and never modifies.

Why it matters

Segments are why Elasticsearch indexing is fast and why search is near real-time rather than instant. Because segments are immutable, writes are append-only: nothing is updated in place and existing data needs no write locks. The cost is that a shard accumulates many segments, every query must search all of them, and deletes and updates only mark old docs as deleted (tombstones) instead of removing them. Managing segment count is therefore central to both read latency and disk usage.

How it works

New docs land in an in-memory buffer; a refresh turns the buffer into a new searchable segment.
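A minimal Python sketch of the buffer/refresh cycle, assuming a toy model: `BufferedShard` and its methods are hypothetical names, not Lucene or Elasticsearch APIs, and a "segment" here is just a read-only map from term to doc ids. It shows that indexed docs stay invisible until a refresh, that each refresh with pending docs adds one immutable segment, and that a search has to visit every segment.

```python
from types import MappingProxyType


class BufferedShard:
    """Toy shard (hypothetical): an indexing buffer plus immutable segments."""

    def __init__(self):
        self.buffer = {}     # doc_id -> text; not searchable yet
        self.segments = []   # read-only term -> frozenset(doc_ids) maps

    def index(self, doc_id, text):
        # New docs only land in the in-memory buffer.
        self.buffer[doc_id] = text

    def refresh(self):
        # Turn the whole buffer into one new, never-modified segment.
        if not self.buffer:
            return  # nothing pending: no segment is written
        postings = {}
        for doc_id, text in self.buffer.items():
            for term in text.lower().split():
                postings.setdefault(term, set()).add(doc_id)
        frozen = {term: frozenset(ids) for term, ids in postings.items()}
        self.segments.append(MappingProxyType(frozen))
        self.buffer = {}

    def search(self, term):
        # Every segment must be consulted; per-segment hits are unioned.
        hits = set()
        for segment in self.segments:
            hits |= segment.get(term, frozenset())
        return sorted(hits)


shard = BufferedShard()
shard.index(1, "quick brown fox")
print(shard.search("fox"))   # [] (still only in the buffer)
shard.refresh()
print(shard.search("fox"))   # [1]
shard.index(2, "lazy fox")
shard.refresh()
print(len(shard.segments), shard.search("fox"))   # 2 [1, 2]
```

Updates and deletes are deliberately left out of this toy; re-indexing an existing id here would leave a stale hit in the older segment, which is exactly the problem tombstones solve.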

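  • Refresh ≠ commit: a refresh (every 1s by default) writes the buffer out as a new segment and makes its docs searchable, but does not fsync it. A flush performs a Lucene commit, fsyncing segments to disk, after which the translog can be trimmed.
  • Translog: every write is also appended to a write-ahead log, so a doc is durable before it is in a committed segment. With the default index.translog.durability: request, the translog is fsynced before a write is acknowledged, so a crash between flushes loses no acknowledged writes; on restart the shard replays the translog on top of the last commit.
  • Updates and deletes are tombstones: an update marks the old doc as deleted and indexes the full new version, and the old copy stays on disk until a merge removes it. _version and _seq_no track the latest write to each doc.
  • Merging: a background process combines small segments into fewer, larger ones and physically drops deleted docs.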
Action    Frequency (default)           Effect
Refresh   every 1s                      New searchable segment (filesystem cache, not yet fsynced)
Flush     translog size/age thresholds  Lucene commit: fsync segments, trim translog
Merge     continuous (tiered policy)    Fewer, bigger segments; purges deleted docs
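The durability and delete rules above can be traced with a second toy model, again a hypothetical Python sketch rather than how Lucene stores anything: `ToyShard`, `lookup`, `crash_and_recover` and the other names are invented, a flush is modelled as copying the segment list into a commit point, and tombstones are a per-segment set (loosely analogous to Lucene's live-docs bitmap). The run shows an update leaving the old copy on disk, a crash discarding the refreshed but uncommitted segment while translog replay restores the write, and a merge dropping the deleted copy.

```python
from dataclasses import dataclass, field


@dataclass
class ToySegment:
    docs: dict                                 # doc_id -> source, written once
    deleted: set = field(default_factory=set)  # tombstones, not removals

    def copy(self):
        return ToySegment(dict(self.docs), set(self.deleted))


class ToyShard:
    """Hypothetical model: buffer, translog, refresh, flush, tombstones, merge."""

    def __init__(self):
        self.buffer = {}      # pending docs, not yet in a segment
        self.translog = []    # ops since the last flush
        self.segments = []    # refreshed (searchable) segments
        self.committed = []   # last commit point: what survives a crash

    def _tombstone(self, doc_id):
        # Mark older copies deleted; a segment's docs are never rewritten.
        for seg in self.segments:
            if doc_id in seg.docs:
                seg.deleted.add(doc_id)

    def index(self, doc_id, source):
        self.translog.append(("index", doc_id, source))  # logged first
        self._tombstone(doc_id)        # an update = delete old + add new
        self.buffer[doc_id] = source

    def delete(self, doc_id):
        self.translog.append(("delete", doc_id, None))
        self._tombstone(doc_id)
        self.buffer.pop(doc_id, None)

    def refresh(self):
        if self.buffer:
            self.segments.append(ToySegment(dict(self.buffer)))
            self.buffer = {}

    def flush(self):
        # Commit: persist the current segments, then trim the translog.
        self.refresh()
        self.committed = [seg.copy() for seg in self.segments]
        self.translog = []

    def crash_and_recover(self):
        # Buffer and uncommitted segments are lost; replay the translog.
        ops, self.translog = self.translog, []
        self.buffer = {}
        self.segments = [seg.copy() for seg in self.committed]
        for op, doc_id, source in ops:
            if op == "index":
                self.index(doc_id, source)
            else:
                self.delete(doc_id)
        self.refresh()

    def merge(self):
        # Rewrite live docs into one new segment; tombstoned copies vanish.
        live = {}
        for seg in self.segments:
            for doc_id, source in seg.docs.items():
                if doc_id not in seg.deleted:
                    live[doc_id] = source
        self.segments = [ToySegment(live)] if live else []

    def lookup(self, doc_id):
        # Searchable view only: buffered docs are invisible until a refresh.
        for seg in self.segments:
            if doc_id in seg.docs and doc_id not in seg.deleted:
                return seg.docs[doc_id]
        return None

    def docs_on_disk(self):
        return sum(len(seg.docs) for seg in self.segments)


shard = ToyShard()
shard.index("a", "v1")
shard.flush()                  # "a" committed, translog trimmed
shard.index("a", "v2")         # update: old copy only tombstoned
shard.refresh()
print(shard.lookup("a"), shard.docs_on_disk())   # v2 2
shard.crash_and_recover()      # uncommitted segment lost, translog replayed
print(shard.lookup("a"))       # v2
shard.merge()
print(shard.docs_on_disk())    # 1
```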

Example

index 30k docs over 30s with refresh_interval=1s
→ ~30 small segments created (about 1 per refresh)
query → must search every live segment and combine the per-segment hits
background merge (running alongside indexing) → collapses them into ~3–4 larger segments → faster queries

A bulk reload runs much faster with refresh_interval: -1 (refreshes disabled, so no per-second segments); afterwards, restore the interval and run a single refresh.
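One way to script that sequence, sketched in Python with only the standard library. The `_settings` and `_refresh` endpoints are standard Elasticsearch REST APIs; `bulk_reload_plan`, `send`, the index name `my-index` and the cluster URL are hypothetical, and the bulk load itself is left out. Resetting `refresh_interval` to `null` restores the default, which assumes the index was on the default before; otherwise put back its previous value.

```python
import json
import urllib.request


def bulk_reload_plan(index):
    """(method, path, body) calls wrapped around a bulk reload (hypothetical helper)."""
    before = [
        # Disable periodic refreshes: no new segment every second.
        ("PUT", f"/{index}/_settings", {"index": {"refresh_interval": "-1"}}),
    ]
    after = [
        # null resets the setting to its default (1s).
        ("PUT", f"/{index}/_settings", {"index": {"refresh_interval": None}}),
        # One refresh makes all reloaded docs searchable at once.
        ("POST", f"/{index}/_refresh", None),
    ]
    return before, after


def send(base_url, method, path, body):
    """Send one call with urllib; assumes an unauthenticated cluster."""
    data = None if body is None else json.dumps(body).encode()
    request = urllib.request.Request(
        base_url + path, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


before, after = bulk_reload_plan("my-index")
for method, path, body in before + after:
    print(method, path, json.dumps(body))
# Against a real cluster (hypothetical URL), run `before`, then the _bulk
# requests, then `after`:
# for call in before: send("http://localhost:9200", *call)
```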

Pitfalls

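  • Too many segments: frequent refreshes over small batches (or refresh=true on every request) create segment sprawl that slows search; let merges catch up or raise refresh_interval.
  • Manual force_merge on hot indices: force-merging a write-active index to 1 segment wastes I/O, and the resulting very large segment is skipped by normal merging until it consists mostly of deleted docs, so tombstones pile up. Only force-merge indices that no longer receive writes (e.g. rolled-over or cold indices).
  • Disk for merges: a merge can temporarily need extra free space equal to the size of the segments being merged; running out of disk stalls merging, and at the flood-stage disk watermark Elasticsearch blocks writes to the affected indices.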
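Segment sprawl, tombstone build-up and merge headroom can be checked per shard copy from `GET /_cat/segments/<index>?format=json&bytes=b`, which returns one row per segment with string values. The Python sketch below assumes that output shape (the `index`, `shard`, `prirep`, `ip`, `docs.count`, `docs.deleted` and `size` columns); `shard_report` is a hypothetical helper, the sample rows are made up, and the `max_segments` threshold is an arbitrary illustration rather than an Elasticsearch default. Treating a shard's total size as merge headroom is the worst case from the pitfall above: every segment merged at once.

```python
from collections import defaultdict


def shard_report(rows, max_segments=50):
    """Summarise _cat/segments rows per shard copy (hypothetical helper)."""
    totals = defaultdict(lambda: {"segments": 0, "docs": 0, "deleted": 0, "bytes": 0})
    for row in rows:
        # One shard copy = index + shard number + primary/replica + node IP.
        key = (row["index"], row["shard"], row["prirep"], row["ip"])
        t = totals[key]
        t["segments"] += 1
        t["docs"] += int(row["docs.count"])
        t["deleted"] += int(row["docs.deleted"])
        t["bytes"] += int(row["size"])   # plain bytes, assuming bytes=b

    report = {}
    for key, t in totals.items():
        all_docs = t["docs"] + t["deleted"]
        report[key] = {
            "segments": t["segments"],
            "too_many_segments": t["segments"] > max_segments,
            # Tombstoned docs still occupying disk until a merge drops them.
            "deleted_ratio": t["deleted"] / all_docs if all_docs else 0.0,
            # Worst-case extra free space if every segment merged at once.
            "merge_headroom_bytes": t["bytes"],
        }
    return report


# Made-up sample rows, shaped like the _cat/segments JSON output.
rows = [
    {"index": "logs", "shard": "0", "prirep": "p", "ip": "10.0.0.1",
     "docs.count": "900", "docs.deleted": "100", "size": "4000"},
    {"index": "logs", "shard": "0", "prirep": "p", "ip": "10.0.0.1",
     "docs.count": "300", "docs.deleted": "0", "size": "1000"},
]
for key, stats in shard_report(rows, max_segments=1).items():
    print(key, stats)
```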

See also