Where to cache (client, CDN, web server, database, application)
A cache is any layer that serves a saved copy so work isn’t repeated — and you can place one at every hop from the user’s device down to the database.
Why it matters
Each layer you add shortens the path: the closer the cache sits to the client, the lower the latency and the more origin load it absorbs. A request answered by the browser or CDN never touches your servers, so caching is the cheapest scalability lever you own — but every copy is a chance to serve something stale, so placement is a deliberate trade-off, not a free win.
How it works
Think of caches as a layered hierarchy; a hit at any tier stops the request from going deeper:
| Layer | Stores | Scope | TTL feel | Invalidation |
|---|---|---|---|---|
| Client | full responses, assets | one user | mins–days | hard (can’t reach it) |
| CDN | static + cacheable HTML | per-PoP | mins–hours | purge API |
| Web server | rendered pages, fragments | one node | secs–mins | local, easy |
| Application | objects, query results | shared (Redis) | secs–mins | you control it |
| Database | buffer pool, query cache | one DB | engine-managed | automatic |
- Client / HTTP caching is driven by
Cache-Control,ETag, andLast-Modified; the browser revalidates with a conditionalGETand gets a cheap304. - Application cache (e.g. Redis/Memcached) is the workhorse — a shared key→value store you populate via cache-aside.
- Database cache is the buffer pool keeping hot pages in RAM; you tune it, you don’t query it directly.
Example
A product page request, hot path first:
browser cache? hit → 0 network (Cache-Control: max-age=300)
CDN PoP? hit → ~10 ms edge (static assets, cached HTML)
web/app cache? hit → ~1 ms Redis (product JSON, key=product:42)
miss everywhere → DB read ~20 ms → backfill Redis + return
One DB read can now serve thousands of users for the next 5 minutes.
Pitfalls
- Caching user-specific data at a shared layer. A CDN or proxy caching a logged-in page can leak one user’s data to another — gate with
Cache-Control: private. - No
Varyheader. Caching one variant (gzip, language) and serving it to clients that needed another. - Stacking TTLs blindly. A 1h browser TTL on top of a 1h CDN TTL means a fix can take ~2h to reach everyone.
- Caching errors. A cached
500or404outlives the bug that caused it; never cache failures with a long TTL.