Index API
The Index API writes a single JSON document into an index, either creating the document or fully replacing an existing one with the same _id.
Why it matters
“Indexing” is the write path: it parses the doc against the mapping, runs analysis to build inverted-index postings, and appends to a segment. Understanding its three knobs — id assignment, create-vs-overwrite, and routing — is the difference between idempotent ingestion and silently clobbering data or scattering a tenant across every shard.
How it works
Two HTTP verbs and two endpoints cover three intents:
| Call | id source | If id exists | Use when |
|---|---|---|---|
| POST /idx/_doc | ES auto-generates | n/a (always new) | append-only logs/events |
| PUT /idx/_doc/{id} | you supply | overwrites (full re-index) | upsert by natural key |
| PUT /idx/_create/{id} | you supply | 409 conflict | insert-once semantics |
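
The table can be read as a small decision function. The sketch below is illustrative only: `index_request` and its parameters are hypothetical names, not part of any Elasticsearch client, and it only builds the method and path without sending anything.

```python
from typing import Optional, Tuple
from urllib.parse import quote


def index_request(index: str,
                  doc_id: Optional[str] = None,
                  create_only: bool = False) -> Tuple[str, str]:
    """Return the (HTTP method, path) matching one row of the table.

    Hypothetical helper for illustration; it does not contact a cluster.
    """
    if doc_id is None:
        # No id supplied: Elasticsearch generates one, so the doc is always new.
        return ("POST", f"/{index}/_doc")
    safe_id = quote(doc_id, safe="")  # ids may contain characters such as '/'
    if create_only:
        # Insert-once: the server answers 409 if the id already exists.
        return ("PUT", f"/{index}/_create/{safe_id}")
    # Upsert by natural key: an existing doc is fully replaced.
    return ("PUT", f"/{index}/_doc/{safe_id}")
```

Calling `index_request("products", "SKU-22")` selects the overwrite row, while adding `create_only=True` selects the insert-once row.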
- No partial writes: a PUT with an _id replaces the whole _source; fields missing from the new body are dropped, not kept. Merging is the Update API’s job.
- Routing: the target shard is shard = hash(routing) % number_of_primaries, where routing defaults to the _id; pass ?routing=acme to colocate a tenant on one shard.
- Durability: the write goes to the in-memory buffer and the translog; it becomes searchable only after the next refresh (NRT, default 1s) and is crash-safe once the translog is fsync’d (by default on every request, controlled by index.translog.durability).
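
The routing rule in the list above can be sketched in a few lines. This is a simplified model, not Elasticsearch's implementation: `pick_shard` is a hypothetical name, and `zlib.crc32` stands in for the hash function Elasticsearch actually uses, so the shard numbers it produces will not match a real cluster.

```python
import zlib
from typing import Optional


def pick_shard(doc_id: str,
               number_of_primaries: int,
               routing: Optional[str] = None) -> int:
    """Simplified shard selection: hash(routing) % number_of_primaries.

    The routing value defaults to the document id. crc32 is a stand-in
    hash chosen only because it is deterministic and in the stdlib.
    """
    key = routing if routing is not None else doc_id
    return zlib.crc32(key.encode("utf-8")) % number_of_primaries


# Without routing, each id is hashed on its own, so a tenant's docs
# can be spread over several shards.
default_shards = {pick_shard(f"SKU-{i}", 5) for i in range(100)}

# With a shared routing value, every doc maps to the same shard.
tenant_shards = {pick_shard(f"SKU-{i}", 5, routing="acme") for i in range(100)}
```

Here `tenant_shards` always contains exactly one shard number, which is the colocation effect of `?routing=acme`. The modulo also shows why the number of primaries cannot be changed freely after the fact: a different divisor would send existing ids to different shards.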
Example
POST /events/_doc → auto id, 201 Created
{ "user": "u-7", "action": "login" }
PUT /products/_doc/SKU-22?routing=acme
{ "name": "bolt", "price": 4.5 } → _version:1, then 2, 3… on re-PUT
Re-PUTting SKU-22 returns result:"updated" and bumps _version; the old doc is tombstoned in its segment, reclaimed at merge.
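
The overwrite and versioning behaviour can be mimicked with a toy in-memory model. `ToyIndex` is a hypothetical class written for this note; it ignores segments, refresh, and concurrency, and only reproduces the status code, result, and _version pattern described above.

```python
from typing import Dict, Optional, Tuple


class ToyIndex:
    """In-memory stand-in for one index: _id -> (version, source)."""

    def __init__(self) -> None:
        self.docs: Dict[str, Tuple[int, dict]] = {}

    def put(self, doc_id: str, source: dict, create_only: bool = False) -> dict:
        existing = self.docs.get(doc_id)
        if existing is not None and create_only:
            # Mirrors PUT /idx/_create/{id} when the id already exists.
            return {"status": 409, "result": "conflict"}
        version = 1 if existing is None else existing[0] + 1
        # Full replacement: nothing from the old source is merged in.
        self.docs[doc_id] = (version, dict(source))
        created = existing is None
        return {
            "status": 201 if created else 200,
            "result": "created" if created else "updated",
            "_version": version,
        }

    def get(self, doc_id: str) -> Optional[dict]:
        entry = self.docs.get(doc_id)
        return None if entry is None else entry[1]


idx = ToyIndex()
first = idx.put("SKU-22", {"name": "bolt", "price": 4.5})   # created, _version 1
second = idx.put("SKU-22", {"name": "bolt"})                # updated, _version 2
```

After the second call, `idx.get("SKU-22")` no longer contains `price`, which is the "missing fields are dropped" behaviour. Because a repeated PUT with the same body leaves the stored document unchanged apart from its version, retries are safe in a way that a retried auto-id POST is not.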
Pitfalls
- Auto-id forces create internally, which is fine, but you lose idempotency: a retried POST makes a duplicate. Use PUT /idx/_doc/{id} for retry-safe pipelines.
- The first write auto-creates the index with dynamic mapping unless action.auto_create_index is restricted, so a typo’d index name births a junk index.
- Single-doc indexing at volume is slow: each doc costs one network round-trip plus refresh pressure; batch with the Bulk API instead.
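
As a rough illustration of the batching advice, the sketch below builds a Bulk API request body. The helper name `bulk_body` and the sample documents are assumptions made for this example; the body format (one action line, then one source line, newline-delimited JSON with a trailing newline) follows the Bulk API, and the result would be sent to `POST /_bulk` with `Content-Type: application/x-ndjson`.

```python
import json
from typing import Iterable, Optional, Tuple


def bulk_body(index: str, docs: Iterable[Tuple[Optional[str], dict]]) -> str:
    """Build an NDJSON body of 'index' actions for the Bulk API.

    Each item is (doc_id, source); a doc_id of None lets the server
    generate the id. Hypothetical helper; it only builds the string.
    """
    lines = []
    for doc_id, source in docs:
        meta = {"_index": index}
        if doc_id is not None:
            meta["_id"] = doc_id
        lines.append(json.dumps({"index": meta}))  # action line
        lines.append(json.dumps(source))           # document line
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body("products", [
    ("SKU-22", {"name": "bolt", "price": 4.5}),
    ("SKU-23", {"name": "nut", "price": 0.8}),
])
```

Supplying explicit ids in the batch keeps the retry-safety of `PUT /idx/_doc/{id}`: resending the same body overwrites the same documents instead of duplicating them.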