Hash tables
A hash table maps keys → values by hashing each key to a bucket index in a backing array. Average O(1) insert/lookup/delete; O(n) worst case when everything collides. The two design choices that define one are the hash function and the collision strategy.
Why it matters
Hash tables are the workhorse of practical programming — dictionaries, sets, caches, database indexes, and deduplication all lean on them. “Use a hash map” is the right first answer to a huge fraction of “make this faster” problems, precisely because it turns O(n) scans into O(1) lookups.
How it works
bucket = hash(key) mod capacity, then store the entry there. Two keys can land in the same bucket
— a collision — and the strategy for that is the whole game:
- Separate chaining — each bucket holds a list (or tree) of entries. Simple, degrades
gracefully. Java’s
HashMapupgrades a long chain to a red-black tree. - Open addressing — on collision, probe for another slot in the same array:
- Linear probing (
i+1) — cache-friendly, but suffers primary clustering. - Quadratic probing (
i+k²) — reduces clustering. - Double hashing — a second hash decides the step size; best distribution.
- Linear probing (
Load factor α = n / buckets controls the space/time trade-off. Chaining stays usable past
α = 1; open addressing must keep α well below 1 (resize around 0.7). A resize allocates a
bigger table and rehashes every entry — amortized O(1) because the table doubles geometrically
(the same argument behind dynamic array growth).
| Operation | Average | Worst |
|---|---|---|
| Insert / lookup / delete | O(1) | O(n) |
Example
function get(table, key):
b ← hash(key) mod table.capacity
for (k, v) in table.buckets[b]: # separate chaining
if k == key: return v
return NOT_FOUND
A from-scratch hash map (open addressing vs chaining, then benchmark) is in this track’s Project ideas.
Pitfalls
- Relying on iteration order — it’s unspecified; never depend on it.
- HashDoS — an attacker who can choose keys can force every insert to collide, turning O(1) into O(n). Use randomized/seeded hashes for untrusted input.
- Mutating a key after insertion — its hash changes and the entry becomes unreachable. Keys must be effectively immutable.