Fuzzy & wildcard search
Approximate matching against indexed terms: fuzzy tolerates typos via edit distance, while wildcard/regexp/prefix match glob and pattern shapes over the term dictionary.
Why it matters
Users misspell (“databse”), and product/SKU search needs partial patterns (“AB-*-2024”). These queries fill the gap exact term matching leaves — but they expand a single clause into many candidate terms, so naïve use is the classic source of slow, CPU-heavy searches.
How it works
Each query rewrites into a set of matching terms before scoring; cost scales with how many terms it touches in the dictionary.
| Query | Matches | Main cost knobs |
|---|---|---|
| fuzzy | terms within edit distance | fuzziness, prefix_length, max_expansions |
| prefix | terms starting with X | length of prefix |
| wildcard | */? glob | leading * scans all terms |
| regexp | Lucene regex | anchored prefix vs full scan |
- fuzziness: AUTO allows 0 edits for terms of ≤2 chars, 1 for 3–5, 2 for longer; it uses Damerau-Levenshtein (a transposition counts as one edit).
- prefix_length: fixing the first N chars (e.g. 2) shrinks expansion dramatically and is the single best fuzzy tuning lever.
- max_expansions: default 50; caps candidate terms, trading recall for speed.
- wildcard field type: for high-cardinality patterns, the dedicated wildcard mapping stores an n-gram index that beats keyword + leading wildcard.
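To make the interaction of these knobs concrete, here is a minimal Python sketch of fuzzy expansion over an in-memory term list. It is only a model: Lucene does not compute distances term by term (it intersects an automaton with the term dictionary), and the function names `auto_fuzziness`, `osa_distance`, and `expand_fuzzy` are hypothetical. The distance used is the restricted Damerau-Levenshtein variant (optimal string alignment), which is assumed here as a stand-in for the engine's behavior.

```python
def auto_fuzziness(term: str) -> int:
    """Edit budget under fuzziness AUTO: 0 for <=2 chars, 1 for 3-5, 2 for longer."""
    n = len(term)
    if n <= 2:
        return 0
    if n <= 5:
        return 1
    return 2


def osa_distance(a: str, b: str) -> int:
    """Edit distance in which an adjacent transposition counts as one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[rows - 1][cols - 1]


def expand_fuzzy(term, dictionary, prefix_length=0, max_expansions=50):
    """Return (candidate terms, number of dictionary terms examined)."""
    budget = auto_fuzziness(term)
    prefix = term[:prefix_length]
    scored = []
    examined = 0
    for candidate in dictionary:
        # prefix_length: terms that do not share the fixed prefix are skipped outright
        if not candidate.startswith(prefix):
            continue
        examined += 1
        dist = osa_distance(term, candidate)
        if dist <= budget:
            scored.append((dist, candidate))
    scored.sort()  # closest first; ties broken alphabetically (a simplification)
    # max_expansions: keep only the first N candidates
    return [t for _, t in scored[:max_expansions]], examined


terms = ["data", "database", "databases", "datable", "metadata", "search"]
print(expand_fuzzy("databse", terms))                   # every term is examined
print(expand_fuzzy("databse", terms, prefix_length=2))  # only terms starting with "da"
```

On this toy dictionary the matches are the same with and without the prefix, but the number of terms examined drops, which is the effect `prefix_length` is meant to buy. Lowering `max_expansions` then trims the candidate list itself, at the cost of recall.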
Example
{ "match": { "title": { "query": "databse serch", "fuzziness": "AUTO",
"prefix_length": 1, "max_expansions": 50 } } }
{ "wildcard": { "sku.keyword": { "value": "AB-*-2024" } } } // anchored, OK
{ "wildcard": { "sku.keyword": { "value": "*-2024" } } } // leading *, slow
The fuzzy match finds “database search”. The anchored wildcard is cheap because only terms starting with AB- need to be examined; the leading-* one walks every SKU term per shard.
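The cost difference between the two wildcard queries can be sketched with a sorted list standing in for the term dictionary. This is an illustration under simplifying assumptions, not how Lucene is implemented: the helper names `literal_prefix` and `wildcard_scan` are hypothetical, and Python's `fnmatch` is used as an approximation of glob matching (it also treats `[` specially, which the SKU patterns here do not use).

```python
import bisect
import fnmatch


def literal_prefix(pattern: str) -> str:
    """Return the characters before the first * or ? in the pattern."""
    for i, ch in enumerate(pattern):
        if ch in "*?":
            return pattern[:i]
    return pattern


def wildcard_scan(pattern, sorted_terms):
    """Return (matching terms, number of terms visited)."""
    prefix = literal_prefix(pattern)
    # Terms sharing a prefix are contiguous in sorted order, so a literal
    # prefix narrows the scan to one slice; an empty prefix covers everything.
    lo = bisect.bisect_left(sorted_terms, prefix)
    hi = bisect.bisect_left(sorted_terms, prefix + chr(0x10FFFF))
    visited = sorted_terms[lo:hi]
    matches = [t for t in visited if fnmatch.fnmatchcase(t, pattern)]
    return matches, len(visited)


skus = sorted([
    "AB-100-2023", "AB-100-2024", "AB-200-2024",
    "CD-100-2024", "CD-300-2022", "EF-900-2024",
])
print(wildcard_scan("AB-*-2024", skus))  # visits only the AB- slice
print(wildcard_scan("*-2024", skus))     # visits every term
```

With six terms the gap is trivial, but the visited count for the leading-* pattern always equals the dictionary size, so it grows with the number of distinct SKUs on each shard.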
Pitfalls
- Leading wildcard: *term (and unanchored regexp) is O(distinct terms) per shard; prefer a reversed sub-field or the wildcard field type.
- Fuzzy on long, common words: these produce huge expansion sets; always set prefix_length and max_expansions.
- Analyzed vs raw: wildcard is a term-level query; run it on a keyword sub-field, not analyzed text, or the glob won’t align with stored tokens.
- fuzziness ≠ phonetic: it measures character edits, not sound. “fone”→“phone” is 2 edits, but AUTO allows only 1 for a 4-character term, so the match is missed. Use a phonetic token filter for that.
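The reversed sub-field idea from the first pitfall can be sketched in a few lines of Python: store each term reversed, and a suffix pattern such as *-2024 becomes a prefix lookup on the reversed terms. In Elasticsearch this is usually set up as a sub-field whose analyzer applies the `reverse` token filter; the code below only models the lookup, and `build_reversed_index` and `suffix_search` are hypothetical names.

```python
import bisect


def build_reversed_index(terms):
    """Store every term reversed, in sorted order."""
    return sorted(t[::-1] for t in terms)


def suffix_search(suffix, reversed_index):
    """Find terms ending in `suffix` via a prefix range over reversed terms.

    Returns (matching terms, number of terms visited).
    """
    prefix = suffix[::-1]
    lo = bisect.bisect_left(reversed_index, prefix)
    hi = bisect.bisect_left(reversed_index, prefix + chr(0x10FFFF))
    matches = sorted(t[::-1] for t in reversed_index[lo:hi])
    return matches, hi - lo


skus = [
    "AB-100-2023", "AB-100-2024", "AB-200-2024",
    "CD-100-2024", "CD-300-2022", "EF-900-2024",
]
index = build_reversed_index(skus)
print(suffix_search("-2024", index))  # visits only the terms that end in -2024
```

The trade-off is extra index size for the second copy of each term, in exchange for turning a full dictionary walk into a bounded range scan.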