Consistent Hashing
You have a cache spread across N servers, and you need to decide which server holds a given key.
The obvious answer — server = hash(key) % N — is correct, fast, and a trap. It works flawlessly
right up until the day N changes, and then it detonates. Consistent hashing is the fix, and like
most great distributed-systems ideas, it’s an idea about minimizing change rather than an idea
about cleverness.
The rehashing catastrophe
Section titled “The rehashing catastrophe”Start with the naive scheme. Three cache servers, modulo placement:
key "alice" → hash=17 → 17 % 3 = 2 → server 2key "bob" → hash=22 → 22 % 3 = 1 → server 1key "carol" → hash=30 → 30 % 3 = 0 → server 0Now one server dies, or you add a fourth to handle load. N goes from 3 to 4 (or to 2). Recompute:
N=3 N=4key "alice" → 17%3=2 → 17%4=1 MOVEDkey "bob" → 22%3=1 → 22%4=2 MOVEDkey "carol" → 30%3=0 → 30%4=2 MOVEDAlmost every key maps to a different server. For a cache, that means a near-total miss storm: the
data is still in the cluster, but nobody can find it at the new address, so every request falls
through to the database at once. For a sharded database, it means physically moving nearly all your
data across the network. Changing N by one shouldn’t require touching N-1/N of your keys — but
modulo hashing does exactly that. The scheme couples the address of a key to the exact count of
servers, and that coupling is the disease.
The hash ring
Section titled “The hash ring”Consistent hashing breaks the coupling with one move: hash the servers onto the same space as the keys. Imagine the output of your hash function bent into a circle — a ring of, say, 0 to 2³²-1 that wraps around. Place each server on the ring by hashing its identifier. Place each key on the ring by hashing the key. Then the rule is:
A key belongs to the first server you meet walking clockwise from the key’s position.
0 / 2^32 │ S3 ● │ ● S1 ╲ │ ╱ ╲ │ ╱ keyB ○───────╲ │ ╱───────○ keyA → walks clockwise to S1 ╲│╱ ────────────●──────────── ╱│╲ S2 ╱ │ ╲ ╱ │ ╲ keyC ○ │ (keyC → walks clockwise to S3)Each server “owns” the arc of the ring between itself and the previous server going counter-clockwise. Now watch what happens when a server leaves:
S2 dies. Only the keys in S2's arc are affected — they walkclockwise to the NEXT server (S1). Every other key stays put.Adding a server is the mirror image: a new node inserted on the ring steals only the slice of keys between it and its clockwise neighbor. On average, a membership change relocates just K/N keys (one server’s share) instead of nearly all of them. That is the whole point: the blast radius of a change is one node’s worth of keys, not the entire keyspace.
The balance problem, and virtual nodes
Section titled “The balance problem, and virtual nodes”The ring as described has two flaws, both rooted in the same cause — you only get a handful of random points on the circle.
- Uneven load. Three random points rarely cut a circle into three equal arcs. One server might own 50% of the ring by bad luck, becoming a hotspot while others idle.
- Uneven failover. When a server dies, all of its keys dump onto its single clockwise neighbor, doubling that one node’s load instead of spreading the orphaned keys across the cluster.
The fix is virtual nodes (vnodes): instead of placing each physical server at one point, place it
at many points — hash S1#1, S1#2, … S1#200 and scatter all 200 around the ring.
Without vnodes: With vnodes (each server = many small arcs): ●S1 ●S2 ·S1 ·S3 ·S2 ·S1 ·S2 ·S3 ·S1 ·S2 ·S3 ·S1 ●S3 (interleaved all around the ring) 3 big lumpy arcs hundreds of tiny arcs, ~even per serverNow each physical server owns hundreds of tiny arcs sprinkled everywhere. The law of large numbers takes over: total load per server converges to near-even, and when a server dies its many small arcs spill onto many different neighbors, so the orphaned load spreads across the whole cluster instead of crushing one node. Vnodes also let you give a beefier machine more virtual points, weighting it to take a larger share — heterogeneity for free.
Where you’ll actually meet it
Section titled “Where you’ll actually meet it”Consistent hashing is the placement engine underneath a surprising amount of infrastructure: distributed caches (memcached client libraries, many Redis cluster setups), Dynamo-style databases (Amazon DynamoDB, Cassandra, Riak), CDN edge selection, and request load balancers that want session affinity (the same user keeps landing on the same backend, and a backend leaving only reshuffles its own users). Anywhere you must map a huge keyspace onto a changing set of nodes and you care about minimizing movement on change, this is the tool. It’s the natural partner to everything in Partitioning & Sharding — consistent hashing is one specific, churn-minimizing answer to the “which shard owns this key?” question, and the one most clusters that resize dynamically reach for.
What it buys, what it costs
Section titled “What it buys, what it costs”The benefit is precise: a membership change moves one node’s share of keys instead of nearly all
of them — bounded, predictable churn that turns cluster resizing from a catastrophe into a routine
operation. The costs are real too: you maintain a ring data structure (a sorted map you binary-search
per lookup), vnodes multiply your metadata, and you give up the dead-simple, stateless % N
arithmetic for something you must implement, test, and keep consistent across every client. As ever,
no free lunch — but for a cluster that grows, shrinks, and loses nodes, it’s a bargain.
Check your understanding
Section titled “Check your understanding”- Walk through why
hash(key) % Nrelocates almost every key whenNchanges by one, and quantify the fraction that moves. - State the clockwise placement rule, and explain why only one node’s worth of keys move when a server joins or leaves the ring.
- What two distinct problems do virtual nodes solve, and what is the underlying cause both share?
- Why does adding more virtual nodes improve balance, and what is the cost of pushing that number very high?
- How would you use vnodes to make a cluster of unequal machines (some with twice the RAM) take load proportional to capacity?