Where to cache (client, CDN, web server, database, application)

A cache is any layer that serves a saved copy so work isn’t repeated — and you can place one at every hop from the user’s device down to the database.

Why it matters

Each layer you add shortens the path: the closer the cache sits to the client, the lower the latency and the more origin load it absorbs. A request answered by the browser or CDN never touches your servers, so caching is the cheapest scalability lever you own — but every copy is a chance to serve something stale, so placement is a deliberate trade-off, not a free win.

How it works

Think of caches as a layered hierarchy; a hit at any tier stops the request from going deeper:

LayerStoresScopeTTL feelInvalidation
Clientfull responses, assetsone usermins–dayshard (can’t reach it)
CDNstatic + cacheable HTMLper-PoPmins–hourspurge API
Web serverrendered pages, fragmentsone nodesecs–minslocal, easy
Applicationobjects, query resultsshared (Redis)secs–minsyou control it
Databasebuffer pool, query cacheone DBengine-managedautomatic
  • Client / HTTP caching is driven by Cache-Control, ETag, and Last-Modified; the browser revalidates with a conditional GET and gets a cheap 304.
  • Application cache (e.g. Redis/Memcached) is the workhorse — a shared key→value store you populate via cache-aside.
  • Database cache is the buffer pool keeping hot pages in RAM; you tune it, you don’t query it directly.

Example

A product page request, hot path first:

browser cache?  hit → 0 network    (Cache-Control: max-age=300)
CDN PoP?        hit → ~10 ms edge   (static assets, cached HTML)
web/app cache?  hit → ~1 ms Redis   (product JSON, key=product:42)
miss everywhere → DB read ~20 ms → backfill Redis + return

One DB read can now serve thousands of users for the next 5 minutes.

Pitfalls

  • Caching user-specific data at a shared layer. A CDN or proxy caching a logged-in page can leak one user’s data to another — gate with Cache-Control: private.
  • No Vary header. Caching one variant (gzip, language) and serving it to clients that needed another.
  • Stacking TTLs blindly. A 1h browser TTL on top of a 1h CDN TTL means a fix can take ~2h to reach everyone.
  • Caching errors. A cached 500 or 404 outlives the bug that caused it; never cache failures with a long TTL.

See also