Complex Data Structures

The specialized structures — union-find, Fenwick/segment trees, balanced trees, tries, suffix structures — that turn an O(n) per-query problem into O(log n) or O(α(n)) by precomputing structure over the data.

Why it matters

Once a problem asks “answer many queries over a changing collection,” a flat arrays-and-dynamic-arrays or hash-tables forces O(n) per query and the total blows up. These structures pay an O(n) or O(n log n) build cost once, then serve each query in sub-linear time. They are the backbone of databases (B-tree indexing), version control (Merkle/suffix-trees), networking routers (tries), and nearly every hard competitive problem. Picking the right one is a modeling skill, not memorization.

How it works

The choice is driven by the query type, not the data. Match the access pattern to the structure:

NeedStructureQueryBuild
Connectivity / merging setsdisjoint-set-union-find~O(α(n))O(n)
Prefix sum + point updatefenwick-treesO(log n)O(n)
Range query + range updatesegment-treesO(log n)O(n)
Ordered keys, balancedavl-trees, red-black-treesO(log n)O(n log n)
Prefix / dictionary lookuptriesO(L)O(ΣL)
Substring / pattern searchsuffix-arrays, suffix-treesO(m log n)/O(m)O(n)–O(n log n)
  • Most reduce a problem to a tree over index ranges or value ranges, so a query touches O(log n) nodes instead of n elements.
  • The recurring trade is build cost vs query cost: amortize the build only when queries are many.
  • Static data unlocks cheaper variants (sorted arrays-and-dynamic-arrays + searching beats a tree when nothing changes).

Example

10⁶ “is u connected to v?” queries on a graph with edges added over time. A BFS/graph-algorithms per query is ~10⁶ × 10⁶ = 10¹² ops — minutes to hours. disjoint-set-union-find answers each in near-constant time: ~10⁶ × 5 ≈ 5×10⁶ ops, milliseconds. Same problem, six orders of magnitude, purely from structure choice.

Pitfalls

  • Reaching for a heavy structure when sorted-array + searching suffices. If data is static, you often do not need a tree at all.
  • Ignoring the constant factor. A segment tree has ~4n nodes and pointer/recursion overhead; a Fenwick tree is ~10× simpler and faster when it fits the problem.
  • Building per query instead of once. The O(n) build must be hoisted out of the query loop or the asymptotic win evaporates.
  • Wrong structure for the update kind. Fenwick handles point updates trivially but range updates need a trick; default to segment trees when both ranges move.

See also