You have a page that's slow. Profiling shows why: every time someone loads it, your server runs a heavy database query — joining a few tables, counting things, sorting. It takes 400 milliseconds. And ten thousand people load that page an hour, so you're running that same expensive query ten thousand times, computing the same answer over and over.
That repetition is the clue. If the answer barely changes between requests, why compute it every single time?
The idea: compute once, reuse the answer
A cache is a fast place to stash an answer so you can hand it back without redoing the work. The classic one is an in-memory store like Redis — a separate service that holds key-value pairs in RAM and returns them in well under a millisecond.
The flow most apps use is called cache-aside, and it reads exactly how you'd guess:
Check the cache first
A request comes in. Ask the cache: do you have the answer for this page?
Hit: return it
If yes (a "cache hit"), return the stored answer. No database query at all. Sub-millisecond.
Miss: do the work, then store it
If no (a "cache miss"), run the slow 400ms query, send the answer back to the user, and also put it in the cache so the next request is a hit.
The first visitor pays the 400ms. Everyone after that gets the answer almost instantly until the entry expires or is cleared, and your database goes from ten thousand heavy queries an hour to a handful. Caching is often the single biggest speed lever in a system, and it's genuinely easy to add.
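The three steps above can be sketched in a few lines. This is a minimal illustration, not a production setup: a plain dictionary stands in for Redis, and `slow_page_query`, `get_page`, and `stats` are hypothetical names invented for the example.

```python
cache = {}                 # stand-in for Redis: key -> stored answer
stats = {"db_calls": 0}    # counts how often the slow query really runs


def slow_page_query(page_id):
    """Hypothetical stand-in for the heavy 400 ms database query."""
    stats["db_calls"] += 1
    return f"rendered data for page {page_id}"


def get_page(page_id):
    key = f"page:{page_id}"
    if key in cache:                   # 1. check the cache first
        return cache[key]              # 2. hit: return it, no database query
    value = slow_page_query(page_id)   # 3. miss: do the work...
    cache[key] = value                 #    ...then store it for the next request
    return value
```

Calling `get_page("home")` twice runs the slow query once; the second call is served from the dictionary. Nothing here ever removes an entry, which is the problem the next part is about.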
So if it's this good and this easy, where's the catch? The catch is everything that happens after you store the answer.
The hard half: the copy goes stale
The moment you keep a copy of something, you have two versions of the truth: the real one in the database, and the copy in the cache. The instant the real one changes, your copy is stale — it's lying.
Say you cached a product's price. Then someone updates that price in the database. Your cache still has the old number, and it will keep happily serving the old price to every customer until something tells it to stop. Now you're selling at the wrong price. The cache that made you fast just made you wrong.
There are only two hard things in computer science: cache invalidation and naming things. — a very old programmer joke, repeated because it keeps being true.
The hard thing isn't keeping a copy. It's knowing when the copy is no longer true and getting rid of it at the right moment. That job is called cache invalidation, and it has two common strategies.
Strategy one: let it expire (TTL)
The simplest approach: when you store something, attach a time to live (TTL) — say 60 seconds. After that, the cache throws the value away on its own, and the next request misses and refetches fresh data.
Decision: Give cached data a TTL and accept it can be stale for up to that long.
A TTL is wonderfully simple: you never have to track what changed, you just let things age out. The cost is a window of staleness. With a 60-second TTL, a price change can show the old value for up to a minute. You tune the TTL to the data. A like count can be stale for minutes and nobody's hurt. A price might want seconds. A bank balance can't be stale at all, so don't cache it like this.
TTLs are the right default for the huge category of data where "a little out of date is fine."
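To make the expiry rule concrete, here is a small sketch of a store with per-key TTLs. It is an in-process stand-in, not how Redis is implemented (in Redis you would set the expiry with the `EX` option on `SET`). The class name `TTLCache`, the injectable `clock`, and the use of `None` to mean "miss" are assumptions made for the example.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for a store with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock    # injectable so the example can fake time
        self._items = {}       # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None                  # never stored: a miss
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]         # aged out: drop it and report a miss
            return None
        return value


# Demo with a fake clock so no real waiting is needed.
fake_now = [0.0]
demo_cache = TTLCache(clock=lambda: fake_now[0])
demo_cache.set("price:42", 1999, ttl_seconds=60)
fake_now[0] = 59.0
still_cached = demo_cache.get("price:42")   # 1999: inside the 60 s window
fake_now[0] = 60.0
after_expiry = demo_cache.get("price:42")   # None: next reader must refetch
```

As a back-of-envelope check on the opening example, assuming steady traffic and one refill per expiry: a 60-second TTL means at most 60 refills an hour, about 24 seconds of query time (60 × 0.4 s) instead of roughly 4,000 seconds (10,000 × 0.4 s).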
Strategy two: actively clear it on change
When you genuinely can't tolerate staleness, you don't wait for a timer. You delete (or update) the cached value at the exact moment the underlying data changes. So in your code, right after the step that updates the product's price in the database, you add a step that removes that product's cached entry. The next read misses and refetches the new price. No stale window.
This is more precise and more work. You have to find every place that changes the data and remember to clear the cache there too. Miss one, and you get a stale value that never expires — the worst kind of cache bug, because it looks random and persists. This is exactly why the joke calls invalidation hard: the logic is spread across your whole codebase, and forgetting one spot is silent.
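A sketch of clearing on write, and of what happens when one writer forgets. Dictionaries stand in for the database and for Redis, prices are in cents, and every function name here is hypothetical.

```python
db = {"price:42": 1999}   # stand-in for the products table
cache = {}                # stand-in for Redis


def get_price(product_id):
    key = f"price:{product_id}"
    if key in cache:
        return cache[key]
    value = db[key]           # miss: read the source of truth
    cache[key] = value
    return value


def update_price(product_id, new_price):
    key = f"price:{product_id}"
    db[key] = new_price       # 1. change the real data
    cache.pop(key, None)      # 2. clear the copy so the next read refetches


def apply_discount_without_clearing(product_id, amount):
    """A second writer that forgot step 2: the bug described above."""
    key = f"price:{product_id}"
    db[key] = db[key] - amount
```

After `update_price(42, 2499)`, the next `get_price(42)` misses and returns 2499. After `apply_discount_without_clearing(42, 500)`, the database holds 1999 but `get_price(42)` keeps returning the cached 2499, and with no TTL nothing will ever correct it.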
TTL (expire on a timer)
Simple, self-healing, no tracking of changes. Good for data that tolerates being a little old: feeds, counts, listings, search results. The price is a bounded staleness window.
Active invalidation (clear on write)
Precise, no staleness window. But you must clear the cache everywhere the data can change, and a missed spot causes a stale value that lingers. Use it when correctness matters more than simplicity, and keep the set of writers small.
Plenty of systems use both: a TTL as a safety net (so even a missed invalidation self-corrects eventually) plus active clearing on the writes you know about.
There's a stricter cousin of active clearing worth knowing by name. With write-through, every write goes to the cache and the database together, so the cache is never behind. It removes the staleness window entirely, but every write now pays for two updates instead of one, and the cache fills with data nobody may ever read. Cache-aside, where you only populate the cache on a read miss, is the default for a reason: most data is read far more than it's written, and you'd rather not pay to cache the rest. Reach for write-through only when reads of a value reliably follow its writes and you can't tolerate even a brief stale read.
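For comparison, a minimal write-through sketch. Both stores are dictionaries and the function names are hypothetical. A real system also has to decide what happens when one of the two writes fails, which this sketch ignores.

```python
db = {}      # stand-in for the database
cache = {}   # stand-in for Redis


def write_through(key, value):
    db[key] = value       # write the source of truth...
    cache[key] = value    # ...and the cache in the same operation


def read(key):
    if key in cache:      # anything written through is already here
        return cache[key]
    value = db[key]       # fallback for data that predates the cache
    cache[key] = value
    return value
```

Every key passed to `write_through` lands in the cache whether or not anyone ever reads it, which is the memory cost described above.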
The failure mode that bites at scale: the stampede
Here's one more, because it's a classic and it's counterintuitive. Imagine a very popular cached value, say the homepage feed, served from the cache to a hundred thousand people a second. Its TTL expires.
In that instant, the next hundred thousand requests all check the cache, all miss (it just expired), and all decide to run the slow query to refill it. A hundred thousand copies of your heaviest query hit the database at the same moment. The database falls over. The thing the cache existed to protect gets killed by the cache expiring. This is a cache stampede (or "thundering herd").
The fix: let one request do the refill
The standard defence is to let only the first request rebuild the value while everyone else either waits for that single rebuild or keeps serving the slightly-stale old value. One database query instead of a hundred thousand. This is the same "request coalescing" (or single-flight) idea a CDN uses to protect an origin server. A common refinement is stale-while-revalidate: keep handing out the old value to everyone and refresh it in the background, so nobody ever waits and the database sees exactly one refill. It also helps to add a little random jitter to TTLs so a batch of keys cached at the same moment don't all expire on the same tick. Whenever a cached value is hot enough that everyone wants it at once, plan for the moment it expires.
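One way to sketch the "one request rebuilds, the rest wait" idea inside a single process, using threads. This is an illustration under assumptions: the cache is a dictionary with no expiry, `slow_query` is a hypothetical stand-in for the heavy query, and coordinating across several servers would need a shared lock rather than `threading`. The small `jittered_ttl` helper at the end shows the jitter idea separately.

```python
import random
import threading
import time

cache = {}
stats = {"db_calls": 0}
_inflight = {}              # key -> Event for a rebuild already in progress
_lock = threading.Lock()


def slow_query(key):
    """Hypothetical stand-in for the heavy database query."""
    stats["db_calls"] += 1
    time.sleep(0.05)
    return f"value for {key}"


def get(key):
    while True:
        with _lock:
            if key in cache:
                return cache[key]
            event = _inflight.get(key)
            leader = event is None
            if leader:
                # First request to see the miss: it alone will rebuild.
                event = threading.Event()
                _inflight[key] = event
        if leader:
            try:
                value = slow_query(key)
                with _lock:
                    cache[key] = value
                return value
            finally:
                with _lock:
                    del _inflight[key]
                event.set()     # wake the waiters, even if the rebuild failed
        event.wait()            # follower: wait, then loop and re-check the cache


def jittered_ttl(base_seconds, spread=0.1):
    """Spread expiries by up to +/- spread so keys don't all expire together."""
    return base_seconds * random.uniform(1 - spread, 1 + spread)
```

If fifty threads call `get("home")` at once against an empty cache, one of them runs `slow_query` and the other forty-nine wait on the event and then read the stored value. If the rebuild raises, the waiters wake, miss again, and one of them takes over.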
Where this leaves you
Caching is the rare optimization that's easy to add and hard to live with. The mental model to keep:
The same expensive work repeats
That repetition is the signal to cache. Compute once, reuse the answer.
Use cache-aside
Check the cache; on a miss, do the work and store it; on a hit, skip the database entirely.
Now your copy can go stale
Decide your invalidation strategy: a TTL for data that tolerates being a little old, active clearing for data that doesn't, often both.
Plan for the hot key expiring
Stop a stampede by letting one request rebuild a hot value while others wait or serve the old one.
The one idea to take away
A cache is a promise that a copy is still true. Adding the cache is the easy 10%; keeping that promise — knowing when the copy has gone stale and clearing it at the right moment — is the 90% where every cache bug lives. Cache the data that's allowed to be a little wrong, be deliberate about the data that isn't, and never forget the moment a popular key expires.
Test yourself
Questions: say the answer out loud before you read it. If you can't, the chapter isn't done.
Q: Walk through the cache-aside pattern on a cache miss.
A request comes in and checks the cache for the answer. On a miss (it's not there), the server runs the real work — say the slow database query — and returns the result to the user. Crucially, it also stores that result in the cache before finishing, so the next request for the same thing is a hit and skips the database. The first request pays the cost; subsequent ones ride free until the entry expires or is cleared.
Q: Why is cache invalidation considered one of the hard problems?
Because the moment you keep a copy, the real data can change underneath it, making your copy silently wrong (stale). Knowing exactly when a copy is no longer true, and clearing it at the right moment, is genuinely hard: with active invalidation the clearing logic is scattered across every place that writes the data, and forgetting just one spot causes a stale value that never self-corrects and looks like a random bug.
Q: When would you use a TTL versus actively clearing the cache on a write?
Use a TTL when the data tolerates being a little out of date (feeds, counts, listings) — it's simple and self-healing, at the cost of a bounded staleness window. Use active clearing when staleness is unacceptable and you need the change visible immediately, accepting that you must clear the cache everywhere the data can change. Many systems combine them: active clearing for known writes, plus a TTL as a safety net so missed invalidations eventually self-correct.
Q: What is a cache stampede, and how do you prevent it?
A stampede happens when a hot cached value expires and a flood of simultaneous requests all miss at once, all decide to rebuild it, and all hit the database together — overwhelming the very database the cache was protecting. The fix is to let only the first request rebuild the value while the others wait for that single rebuild or briefly serve the old value, turning a hundred thousand queries back into one.
Q: Why shouldn't you cache something like an account balance with a 60-second TTL?
Because a TTL accepts that the value can be stale for up to its lifetime, and a balance that's wrong for a minute can let someone overspend or see the wrong amount. That's a real correctness and trust failure, not a cosmetic one. Data where staleness causes harm should either not be cached this way, or use active invalidation so it's never served stale. Match the strategy to how much wrongness the data can tolerate.