Cache Stampede & the Thundering Herd
Caching is supposed to protect your database. But there’s a failure mode where the cache becomes the trigger for an outage: a single popular key expires, and in the window before it’s repopulated, every request for it misses the cache and slams the origin at once. That synchronized surge is a cache stampede (also called a thundering herd), and it has taken down systems that were running comfortably a moment earlier.
How a calm system suddenly falls over
Imagine a key served 10,000 times/second from cache, with the database never seeing that load. The key has a TTL. The instant it expires:
    t=0      cache has "homepage" → 10,000 req/s served from cache, DB idle
    t=TTL    "homepage" EXPIRES
    t=TTL+ε  10,000 concurrent requests all MISS → all 10,000 hit the DB at once
             DB melts → slow responses → cache stays empty → more pile on → cascade

The database wasn’t sized for 10,000 simultaneous identical queries because it never saw them: the cache absorbed them. Expiry removes that shield for a moment, and the herd tramples through. The herd is every request that arrives before the value is repopulated, so the slower the recompute, the bigger the pile-up. What did the cache buy us, and what did it cost? It bought enormous load reduction, but it quietly created a correlated failure at the expiry instant.
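The failure is easy to reproduce in miniature. The sketch below is an illustrative simulation, not a benchmark: the names `simulate_expiry` and `query_db`, the 50-request burst, and the 0.2-second query time are all assumptions made for the example. It releases a burst of threads against a naive check-then-load cache whose hot key has just expired, and counts how many of them reach the simulated database.

```python
import threading
import time


def simulate_expiry(n_requests, query_seconds=0.2):
    """Fire n_requests at a naive cache-aside lookup whose key just expired.

    Returns how many of them ended up querying the (simulated) database.
    """
    cache = {}                              # empty: the hot key has just expired
    db_queries = 0
    counter_lock = threading.Lock()
    start = threading.Barrier(n_requests)   # release every request together

    def query_db(key):
        nonlocal db_queries
        with counter_lock:
            db_queries += 1
        time.sleep(query_seconds)           # stand-in for an expensive query
        return f"rendered:{key}"

    def handle_request():
        start.wait()
        if "homepage" not in cache:                   # miss
            cache["homepage"] = query_db("homepage")  # recompute and repopulate

    threads = [threading.Thread(target=handle_request) for _ in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db_queries


herd_size = simulate_expiry(n_requests=50)
print(f"{herd_size} of 50 requests reached the database")
```

Because no request can repopulate the cache until its own query finishes, essentially every request in the burst sees the miss and issues the same query. Only one of those queries was needed.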
Fix 1: Locking / single-flight (request coalescing)
Let only one request recompute the value; everyone else waits for that result or briefly serves stale data.
    key misses → first request acquires a lock, recomputes, repopulates cache
    meanwhile  → other requests see the lock → wait, or return last-known value
               → exactly ONE query hits the DB instead of 10,000

This is often called single-flight (coalesce concurrent identical loads into one). It’s the most direct fix; the cost is added coordination and deciding what the waiters do (block or serve stale).
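A minimal in-process sketch of single-flight follows, using the blocking variant in which waiters block until the leader finishes. The class `SingleFlight`, the helper `get_or_load`, and the loader `slow_query` are hypothetical names introduced for illustration. The sketch coalesces loads only within one process; coordinating across several application servers would need a shared lock, which is not shown here.

```python
import threading
import time


class _Call:
    """State shared between the one leader and any waiters for a key."""
    def __init__(self):
        self.done = threading.Event()
        self.value = None
        self.error = None


class SingleFlight:
    """Coalesce concurrent calls for the same key into a single execution."""
    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}              # key -> _Call

    def do(self, key, fn):
        with self._lock:
            call = self._in_flight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._in_flight[key] = call
        if leader:
            try:
                call.value = fn()         # only the leader does the work
            except Exception as exc:
                call.error = exc
            finally:
                with self._lock:
                    del self._in_flight[key]
                call.done.set()           # wake the waiters
        else:
            call.done.wait()              # waiters block for the leader's result
        if call.error is not None:
            raise call.error
        return call.value


def get_or_load(cache, flight, key, loader):
    """Cache-aside lookup; None means "absent", so None values are not cached."""
    value = cache.get(key)
    if value is not None:
        return value

    def load_and_store():
        cached = cache.get(key)           # an earlier flight may have filled it
        if cached is not None:
            return cached
        fresh = loader(key)
        cache[key] = fresh
        return fresh

    return flight.do(key, load_and_store)


# Demo: 50 simultaneous misses on the same key.
cache, flight, db_queries, results = {}, SingleFlight(), [], []

def slow_query(key):
    db_queries.append(key)                # record each trip to the "database"
    time.sleep(0.05)
    return f"rendered:{key}"

start = threading.Barrier(50)

def handle_request():
    start.wait()
    results.append(get_or_load(cache, flight, "homepage", slow_query))

threads = [threading.Thread(target=handle_request) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(db_queries), "database query for", len(results), "requests")
```

The re-check inside `load_and_store` matters: a request that saw the miss but arrived after the previous flight finished becomes a new leader, and without the re-check it would query the database a second time.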
Fix 2: Stale-while-revalidate (early/async recompute)
Don’t wait for hard expiry. Serve the stale value immediately while a background task refreshes it, or recompute probabilistically before the TTL so the refresh is spread out rather than synchronized.
    key near expiry → serve current (slightly stale) value NOW
                    → trigger ONE async refresh in the background
                    → users never see a miss; the DB sees one refresh, not a herd

The trade-off: you accept serving data that’s briefly stale in exchange for never exposing a cold-cache window. For most read-heavy content (feeds, product pages) that’s an easy yes.
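One way to sketch the background-refresh variant in a single process is shown below. The class `SWRCache`, its `fresh_for` parameter, and the `wait_idle` helper are assumptions made for this example, not an established API. Two things are deliberately left out: a hard upper bound on staleness, and coalescing of the very first (cold) load, which still runs inline.

```python
import threading
import time


class SWRCache:
    """In-process stale-while-revalidate cache (illustrative sketch)."""

    def __init__(self, loader, fresh_for, clock=time.monotonic):
        self._loader = loader          # hypothetical: key -> value, e.g. a DB query
        self._fresh_for = fresh_for    # seconds before a refresh is triggered
        self._clock = clock
        self._lock = threading.Lock()
        self._entries = {}             # key -> (value, refresh_after)
        self._refreshing = {}          # key -> background refresh thread

    def get(self, key):
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None:
                value, refresh_after = entry
                due = self._clock() >= refresh_after
                if due and key not in self._refreshing:
                    # At most one refresh per key is started.
                    worker = threading.Thread(
                        target=self._refresh, args=(key,), daemon=True)
                    self._refreshing[key] = worker
                    worker.start()
                return value           # possibly stale, but returned immediately
        # Cold miss: nothing to serve yet, so load inline.
        value = self._loader(key)
        self._store(key, value)
        return value

    def _refresh(self, key):
        try:
            self._store(key, self._loader(key))
        finally:
            with self._lock:
                self._refreshing.pop(key, None)

    def _store(self, key, value):
        with self._lock:
            self._entries[key] = (value, self._clock() + self._fresh_for)

    def wait_idle(self):
        """Block until refreshes that are currently running have finished."""
        with self._lock:
            workers = list(self._refreshing.values())
        for worker in workers:
            worker.join()


# Demo with a fake clock so the behaviour is deterministic.
now = [0.0]
loads = []

def load(key):
    loads.append(key)
    return f"{key}-v{len(loads)}"

cache = SWRCache(load, fresh_for=60, clock=lambda: now[0])
first = cache.get("homepage")      # cold miss: loads inline
now[0] = 61.0                      # entry is now past its freshness window
stale = cache.get("homepage")      # old value returned at once; refresh starts
cache.wait_idle()
refreshed = cache.get("homepage")  # the refreshed value is now served
print(first, stale, refreshed)     # homepage-v1 homepage-v1 homepage-v2
```

The request that notices the stale entry pays nothing for the refresh, and the `_refreshing` map ensures that later requests arriving during the refresh do not start a second one.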
Fix 3: Jittered / staggered TTLs
A subtler cause: if many keys are populated together (e.g. a cache warm-up or a deploy) with the same TTL, they all expire at the same instant, producing a herd across many keys at once. Add random jitter to each TTL so expirations spread out.
    BAD:  every key TTL = 3600s → all expire together → synchronized stampede
    GOOD: TTL = 3600s + random(0..300s) → expirations smeared across 5 minutes

This one-line change (randomize the TTL) prevents the correlated expiry that turns thousands of independent keys into a single thundering herd. It’s the cheapest fix and worth doing by default.
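As a sketch, the jitter can be a small helper called wherever a TTL is set. The function name `jittered_ttl` and its parameters are hypothetical. This version uses additive jitter, so expirations land in a 300-second window after the base TTL.

```python
import random


def jittered_ttl(base_seconds=3600.0, max_jitter_seconds=300.0, rng=random):
    """Return the base TTL plus a uniformly random extra of 0..max_jitter_seconds."""
    return base_seconds + rng.uniform(0.0, max_jitter_seconds)


# Demo: 1,000 keys written during the same warm-up get different expiry times.
rng = random.Random(42)            # seeded only to make the demo repeatable
ttls = [jittered_ttl(rng=rng) for _ in range(1000)]
print(f"TTLs range from {min(ttls):.0f}s to {max(ttls):.0f}s")
```

The helper would be used at write time, for example by passing `jittered_ttl()` as the expiry argument of whatever cache client is in use, in place of a constant.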
The thread
What does this buy us, and what does it cost? A cache buys massive origin offload, but if you ignore the expiry instant, it quietly concentrates load into a synchronized spike that can be worse than having no cache, because the origin was never sized for it. The fixes (single-flight, stale-while-revalidate, jittered TTLs) cost a little staleness and coordination to buy a smooth, herd-free load curve. The lesson generalizes: any time many actors share a deadline, expect them to act in unison, and design to spread them out. (See also Caching Strategies.)
Check your understanding
- Why can a database that was idle suddenly be overwhelmed the moment a single hot key expires?
- How does single-flight / request coalescing reduce the load from a cache miss on a popular key?
- What does stale-while-revalidate trade away, and what does it guarantee in return?
- Why do identical TTLs cause a stampede across many keys, and how does jitter fix it?
- What is cache penetration, and how does caching “not found” or a Bloom filter defend against it?