Indices

An index is a named collection of documents with one mapping and one set of settings, physically realized as a group of shards spread across the cluster.

Why it matters

The index is your primary unit of schema, scaling, and lifecycle: shard count, replicas, analyzers, and retention all attach here. Time-series workloads (logs, metrics) roll a new index per day/size and drop old ones whole — a cheap DELETE instead of millions of row deletes — which is why index design drives storage cost and query fan-out more than any single query does.

How it works

An index is mostly an alias-able label over shards plus metadata.

  • Settings split into two kinds: static (set at creation, e.g. number_of_shards) and dynamic (changeable live, e.g. number_of_replicas, refresh_interval).
  • Aliases give a stable name over a changing set of indices — point logs at logs-2026.05.30 and atomically swap on rollover; readers never change their query.
  • Templates apply mappings/settings to any index whose name matches a pattern, so new daily indices are born configured.

Concept      Scope                              Example
Index        Schema + shard topology            products, logs-2026.05.30
Alias        Logical pointer over 1..N indices  logs → today’s index
Data stream  Managed append-only index set      logs backed by hidden indices
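
The template and alias mechanics above can be sketched as plain request bodies. This is a minimal illustration, not a client implementation: the helper names index_template_body and alias_swap_body are hypothetical, the previous-day index logs-2026.05.29 is assumed for the example, and the bodies are meant for the PUT /_index_template/<name> and POST /_aliases endpoints respectively.

```python
import json


def index_template_body(pattern: str, shards: int, replicas: int) -> dict:
    """Body for PUT /_index_template/<name> (hypothetical helper).

    Any new index whose name matches `pattern` is created with these settings.
    """
    return {
        "index_patterns": [pattern],
        "template": {
            "settings": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        },
    }


def alias_swap_body(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases (hypothetical helper).

    Both actions travel in one request, so the alias moves atomically:
    readers never see it pointing at nothing or at both indices.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Assumed names: "logs-*" pattern and yesterday's index are illustrative.
    print(json.dumps(index_template_body("logs-*", 1, 1), indent=2))
    print(json.dumps(
        alias_swap_body("logs", "logs-2026.05.29", "logs-2026.05.30"),
        indent=2,
    ))
```

Sending the printed bodies with any HTTP client would register the template and move the alias; building them as plain dictionaries keeps the sketch independent of a particular client library.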

Example

PUT /products
{ "settings": { "number_of_shards": 3, "number_of_replicas": 1 },
  "mappings": { "properties": { "name": { "type": "text" } } } }

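This creates 3 primaries plus 1 replica copy of each: 3 × (1 + 1) = 6 shards in total. number_of_shards is fixed for the life of the index; to change it you reindex into a new index (the split and shrink APIs likewise produce a new index rather than resizing in place).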
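
As a quick check on the shard arithmetic, and on the static/dynamic distinction, here is a small sketch. The helper names total_shards and replica_update_body are hypothetical; the second one only builds the body for PUT /products/_settings, which works live because number_of_replicas is a dynamic setting.

```python
def total_shards(primaries: int, replicas: int) -> int:
    """Total shard copies: each primary plus `replicas` copies of it."""
    return primaries * (1 + replicas)


def replica_update_body(replicas: int) -> dict:
    """Body for PUT /<index>/_settings (hypothetical helper).

    number_of_replicas is dynamic, so it can be changed on a live index;
    number_of_shards is static and would be rejected here.
    """
    return {"index": {"number_of_replicas": replicas}}


if __name__ == "__main__":
    print(total_shards(3, 1))      # 6: the products index above
    print(total_shards(3, 2))      # 9: same primaries, one more replica each
    print(replica_update_body(2))
```

Raising replicas from 1 to 2 adds three shard copies without touching the primaries, which is why replica count is the knob to turn after creation.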

Pitfalls

  • Over-sharding — hundreds of tiny indices/shards waste heap (cluster state + per-shard overhead); aim for shards of ~10–50 GB.
  • Editing static settings: number_of_shards cannot change in place; plan it up front or use data streams + rollover.
  • Hard-coding the dated index name in clients — always read/write through an alias so rollover is invisible.
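
To plan the primary count up front, a rough sizing sketch can help. It assumes you can estimate the index's eventual size, and it picks 30 GB as a target shard size purely as an illustrative midpoint of the 10–50 GB range above; primaries_for is a hypothetical helper, not part of any client library.

```python
import math


def primaries_for(expected_gb: float, target_shard_gb: float = 30.0) -> int:
    """Estimate a primary shard count for an index of `expected_gb`.

    target_shard_gb is an assumed planning figure; tune it to your workload.
    """
    if expected_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    # Round up so no shard exceeds the target; never go below one primary.
    return max(1, math.ceil(expected_gb / target_shard_gb))


if __name__ == "__main__":
    print(primaries_for(200))  # 7
    print(primaries_for(90))   # 3
    print(primaries_for(5))    # 1: a small index needs only one primary
```

For small indices the result stays at one primary, which is the point of the over-sharding warning: extra shards only pay off once each one holds a meaningful amount of data.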

See also