Highlighting
Highlighting returns snippets of matching fields with the query terms wrapped in tags, so a UI can show why a document matched.
Why it matters
Search results without highlighted context force users to guess relevance. Highlighting extracts the best-matching fragments and emphasizes the hit terms — standard in any search UI — but it re-analyzes or re-reads field content per hit, so the choice of highlighter directly affects latency and storage.
How it works
Three highlighter implementations trade speed for accuracy and storage.
| Highlighter | Requires | Speed | Accuracy on phrases |
|---|---|---|---|
| `unified` (default) | nothing extra | good | good (BM25-ranked fragments) |
| `plain` | nothing extra; re-analyzes `_source` per hit | slow on big fields | exact, but practical for small docs only |
| `fvh` (fast vector) | `term_vector: with_positions_offsets` in the mapping | fast on big fields | excellent |
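- `unified` uses Lucene's `UnifiedHighlighter` with a BM25-like fragment scorer. It reads offsets from postings or term vectors when they are available and otherwise re-analyzes the text.
- Tags: default `<em>…</em>`; override with `pre_tags`/`post_tags`.
- Fragmentation: `fragment_size` (characters, default 100) and `number_of_fragments` (default 5) bound the snippets; `number_of_fragments: 0` returns the whole field highlighted.
- `require_field_match: false` highlights requested fields even when the query did not target them (by default only fields that contain a query match are highlighted). This is useful, for example, when the query runs against a `copy_to` field but the snippets should come from the source fields.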
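To make fragment selection concrete, here is a toy highlighter in Python. It is a simplified illustration, not the Lucene or Elasticsearch algorithm: it packs whitespace-separated words into fragments of at most `fragment_size` characters instead of using sentence breaks, and it scores fragments by raw hit count rather than BM25. The function `highlight` and its parameters are hypothetical names chosen to mirror `fragment_size`, `number_of_fragments`, `pre_tags` and `post_tags`.

```python
import re
from typing import List


def highlight(text: str, terms: List[str], fragment_size: int = 100,
              number_of_fragments: int = 5,
              pre_tag: str = "<em>", post_tag: str = "</em>") -> List[str]:
    """Toy highlighter (hypothetical, not Lucene's): fixed-size fragments,
    scored by hit count, best N returned in document order with hits tagged."""
    # Case-insensitive, whole-word match on any query term.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b",
                         re.IGNORECASE)

    def mark(fragment: str) -> str:
        # Wrap every hit in the configured tags.
        return pattern.sub(lambda m: pre_tag + m.group(0) + post_tag, fragment)

    if number_of_fragments == 0:
        # Mirrors number_of_fragments: 0, i.e. the whole field, highlighted.
        return [mark(text)]

    # Greedily pack words until the next one would exceed fragment_size.
    fragments: List[str] = []
    current: List[str] = []
    for word in text.split():
        if current and len(" ".join(current + [word])) > fragment_size:
            fragments.append(" ".join(current))
            current = []
        current.append(word)
    if current:
        fragments.append(" ".join(current))

    # Score = number of hits; fragments without hits are dropped.
    scored = [(len(pattern.findall(f)), i, f) for i, f in enumerate(fragments)]
    scored = [s for s in scored if s[0] > 0]
    # Keep the N best (ties go to the earlier fragment), then restore field order.
    best = sorted(scored, key=lambda s: (-s[0], s[1]))[:number_of_fragments]
    best.sort(key=lambda s: s[1])
    return [mark(f) for _, _, f in best]


if __name__ == "__main__":
    doc = "Climate change is real. The science of climate change shows warming."
    print(highlight(doc, ["climate", "change"], fragment_size=40))
```

The last step is the design point worth noticing: selection is by score, but the selected fragments are emitted in the order they occur in the text.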
Example
```json
{
  "query": { "match": { "body": "climate change" } },
  "highlight": {
    "fields": { "body": { "fragment_size": 120, "number_of_fragments": 3 } },
    "pre_tags": ["<mark>"],
    "post_tags": ["</mark>"]
  }
}
```

A returned fragment then looks like `...the science of <mark>climate</mark> <mark>change</mark> shows...`.
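The unified highlighter selects the 3 highest-scoring fragments, but by default returns them in the order they appear in the field; set `"order": "score"` to list the best fragment first.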
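On the client side, the same request can be built as a plain Python dict, and the snippets read back from each hit's `highlight` object, which maps a field name to a list of fragment strings. This is a sketch: `build_highlight_request` and `snippets_by_id` are hypothetical helpers, actually sending the body (over HTTP or with an official client) is left out, and the sample response is made up for illustration. The optional top-level `"order": "score"` asks for the best fragment first instead of field order.

```python
import json
from typing import Dict, List


def build_highlight_request(field: str, text: str,
                            order_by_score: bool = False) -> Dict:
    """Request body mirroring the example above (hypothetical helper)."""
    highlight: Dict = {
        "fields": {field: {"fragment_size": 120, "number_of_fragments": 3}},
        "pre_tags": ["<mark>"],
        "post_tags": ["</mark>"],
    }
    if order_by_score:
        # Return the highest-scoring fragment first rather than in field order.
        highlight["order"] = "score"
    return {"query": {"match": {field: text}}, "highlight": highlight}


def snippets_by_id(response: Dict, field: str) -> Dict[str, List[str]]:
    """Map each hit's _id to its highlighted fragments for `field` ([] if none)."""
    return {
        hit["_id"]: hit.get("highlight", {}).get(field, [])
        for hit in response["hits"]["hits"]
    }


if __name__ == "__main__":
    body = build_highlight_request("body", "climate change", order_by_score=True)
    print(json.dumps(body, indent=2))  # the JSON you would send to _search

    # Made-up response, shaped like a search response with highlights.
    sample = {"hits": {"hits": [
        {"_id": "1", "highlight": {"body": [
            "the science of <mark>climate</mark> <mark>change</mark> shows"]}},
        {"_id": "2"},  # a hit with no highlight block
    ]}}
    print(snippets_by_id(sample, "body"))
```

Defaulting to `[]` for hits without a `highlight` block keeps the UI code simple when a document matched on a field that was not highlighted.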
Pitfalls
- `fvh` needs term vectors: enabling `term_vector: with_positions_offsets` roughly doubles that field's index size, so it is only worth it for large highlighted fields.
- `plain` on big fields: it re-analyzes the whole `_source` per hit, which can dominate query latency on multi-KB documents.
- Mapping mismatch: highlighting a field whose analyzer changed since indexing produces wrong offsets or missing marks.
- `number_of_fragments: 0` cost: returning the entire field highlighted defeats fragment limits and inflates response size.
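Two of these pitfalls can be guarded against when building requests. The sketch below, in Python with hypothetical helper names, produces a mapping fragment that enables `term_vector: with_positions_offsets` on one large field, plus per-field highlight settings that use `fvh` only where those term vectors exist, fall back to `unified` elsewhere, and reject `number_of_fragments: 0`. The policy itself is an assumption for illustration, not an Elasticsearch default.

```python
import json
from typing import Dict, Iterable, Set


def fvh_field_mapping() -> Dict:
    # Term vectors with positions and offsets let fvh avoid re-analysis,
    # at the cost of roughly doubling this field's index size.
    return {"type": "text", "term_vector": "with_positions_offsets"}


def highlight_fields(fields: Iterable[str], term_vector_fields: Set[str],
                     fragment_size: int = 100,
                     number_of_fragments: int = 3) -> Dict:
    """Hypothetical policy: fvh only on fields mapped with term vectors,
    unified everywhere else, and always a bounded number of fragments."""
    if number_of_fragments < 1:
        # 0 would return each whole field highlighted; refuse it here.
        raise ValueError("number_of_fragments must be >= 1 to bound response size")
    return {
        "fields": {
            name: {
                "type": "fvh" if name in term_vector_fields else "unified",
                "fragment_size": fragment_size,
                "number_of_fragments": number_of_fragments,
            }
            for name in fields
        }
    }


if __name__ == "__main__":
    mapping = {"mappings": {"properties": {
        "title": {"type": "text"},
        "body": fvh_field_mapping(),  # large field, assumed worth the extra storage
    }}}
    highlight = highlight_fields(["title", "body"], term_vector_fields={"body"})
    print(json.dumps({"mapping": mapping, "highlight": highlight}, indent=2))
```

Tying the highlighter choice to the mapping keeps `fvh` from being requested on a field that has no term vectors to read.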