Hot Partitions & the Celebrity Problem

You sharded your database perfectly. A hundred partitions, each holding 1% of the keys, each on its own machine. On paper, you can serve a hundred times the load of a single box. Then a pop star posts to her 200-million-follower account, and every read for that one row lands on one partition. That partition is now doing the work of the entire system; the other ninety-nine sit nearly idle. Your average utilization looks fine. The one machine on fire does not care about your average.

This is the hot partition problem, and its most famous instance is the celebrity problem. It is the single most common way a “horizontally scalable” system stops scaling.

Why sharding assumes a uniform world

Partitioning & Sharding buys you throughput by spreading keys across machines. But that bargain rests on a hidden assumption: load is roughly uniform across keys. Real workloads almost never are. They follow a power law — a Zipfian distribution where a tiny fraction of keys attract most of the traffic.

requests
  │ █
  │ █
  │ █ █
  │ █ █ █
  │ █ █ █ █ ▄ ▄ ▁ ▁ ▁ ▁ ▁ ▁ ▁ ▁ ▁ ...
  └────────────────────────────────────
    the "head"          the long tail
   (celebrities)     (everyone else)

The math of consistent hashing distributes keys evenly. It says nothing about distributing load, because it cannot see how popular each key is. A perfectly balanced key-space can have a wildly unbalanced request-space. The hot key isn’t a bug in your hashing — it’s a property of your users.

Detecting the hotspot

You cannot fix what you cannot see, and aggregate dashboards actively hide hotspots — a system at 40% average CPU can have one node pinned at 100%. Detection means looking at the distribution, not the mean:

Per-partition metrics. Track requests/sec, CPU, and queue depth per shard, and alert on the max, not the average. The gap between p50 and max node utilization is your skew signal.
Top-K key sampling. Sample request keys and maintain a heavy-hitters list (a Count-Min sketch is perfect here). If one key is 30% of traffic, you’ve found your celebrity.
Tail latency on one shard. A single hot partition shows up as a spike in p99 latency for requests that happen to route to it.

Mitigations: spread the heat

There is no single fix; you assemble a toolkit and apply the cheapest one that works.

1. Cache the hot key

The easiest win. Hot keys are, by definition, read by everyone — so they’re perfectly cacheable. A small in-memory cache in front of the celebrity’s row can absorb 99% of reads before they ever touch the partition. The hotter the key, the higher the cache hit rate. (Mind the failure mode this creates: see Cache Stampede for what happens when that hot entry expires.)

2. Replicate the hot key

If one replica can’t serve the reads, make more of them. Promote the hot key to N read replicas and load-balance across them. Reads scale with replica count; the trade-off is that you’ve now de-uniformed your storage — this one key costs N× the space and N× the replication traffic.

3. Split the key

When writes are hot (not just reads), caching and replicas don’t help — you can’t cache a write. Instead, shard the key itself. A celebrity’s follower count becoming a write bottleneck? Split the counter into celebrity_id:0 … celebrity_id:9, write to a random shard, and sum on read.

WRITE path                         READ path
increment one of:                  read + sum all:
  celeb:42#0  ──┐                    celeb:42#0  ──┐
  celeb:42#1  ──┼─ random pick       celeb:42#1  ──┼─ fan-in,
  celeb:42#2  ──┘                    celeb:42#2  ──┘  total = Σ

You’ve traded a hot single key for ten lukewarm keys. The cost is read complexity (you fan out and aggregate) and approximate consistency (the sum may lag). For counters and aggregates, that’s almost always an acceptable trade.

4. Fan-out on write vs. read

The celebrity problem in social feeds has its own canonical fix. Fan-out on write (push a post into every follower’s feed) is cheap to read but catastrophic for celebrities — one post triggers 200M writes. Fan-out on read (assemble the feed by pulling from followed accounts at read time) is cheap to write but expensive to read. Mature systems go hybrid: fan-out on write for normal users, fan-out on read for celebrities, merged at query time.

What does this buy us, and what does it cost?

Every mitigation here buys survival under skew — the system keeps serving when one key goes viral. The cost is always uniformity and simplicity: you now have special-case code paths, non-uniform storage, eventual consistency on hot aggregates, and more moving parts to operate. The discipline is to apply these only to the keys that need them. Treating every key like a celebrity wastes resources; treating no key like one guarantees an outage the day you go viral.

Check your understanding

Why can a system at 40% average CPU still be falling over? What metric would reveal the problem?
Consistent hashing distributes keys evenly. Why doesn’t that prevent hot partitions?
When does caching a hot key fail to help — and which mitigation do you reach for instead?
Explain the read/write trade-off in key-splitting a celebrity’s follower counter into 10 shards.
Why do large social platforms use a hybrid fan-out strategy instead of pure write or pure read?