Skip to content

Consistent Hashing

You have a cache spread across N servers, and you need to decide which server holds a given key. The obvious answer — server = hash(key) % N — is correct, fast, and a trap. It works flawlessly right up until the day N changes, and then it detonates. Consistent hashing is the fix, and like most great distributed-systems ideas, it’s an idea about minimizing change rather than an idea about cleverness.

Start with the naive scheme. Three cache servers, modulo placement:

key "alice" → hash=17 → 17 % 3 = 2 → server 2
key "bob" → hash=22 → 22 % 3 = 1 → server 1
key "carol" → hash=30 → 30 % 3 = 0 → server 0

Now one server dies, or you add a fourth to handle load. N goes from 3 to 4 (or to 2). Recompute:

N=3 N=4
key "alice" → 17%3=2 → 17%4=1 MOVED
key "bob" → 22%3=1 → 22%4=2 MOVED
key "carol" → 30%3=0 → 30%4=2 MOVED

Almost every key maps to a different server. For a cache, that means a near-total miss storm: the data is still in the cluster, but nobody can find it at the new address, so every request falls through to the database at once. For a sharded database, it means physically moving nearly all your data across the network. Changing N by one shouldn’t require touching N-1/N of your keys — but modulo hashing does exactly that. The scheme couples the address of a key to the exact count of servers, and that coupling is the disease.

Consistent hashing breaks the coupling with one move: hash the servers onto the same space as the keys. Imagine the output of your hash function bent into a circle — a ring of, say, 0 to 2³²-1 that wraps around. Place each server on the ring by hashing its identifier. Place each key on the ring by hashing the key. Then the rule is:

A key belongs to the first server you meet walking clockwise from the key’s position.

0 / 2^32
S3 ● │ ● S1
╲ │ ╱
╲ │ ╱
keyB ○───────╲ │ ╱───────○ keyA → walks clockwise to S1
╲│╱
────────────●────────────
╱│╲ S2
╱ │ ╲
╱ │ ╲
keyC ○ │ (keyC → walks clockwise to S3)

Each server “owns” the arc of the ring between itself and the previous server going counter-clockwise. Now watch what happens when a server leaves:

S2 dies. Only the keys in S2's arc are affected — they walk
clockwise to the NEXT server (S1). Every other key stays put.

Adding a server is the mirror image: a new node inserted on the ring steals only the slice of keys between it and its clockwise neighbor. On average, a membership change relocates just K/N keys (one server’s share) instead of nearly all of them. That is the whole point: the blast radius of a change is one node’s worth of keys, not the entire keyspace.

The ring as described has two flaws, both rooted in the same cause — you only get a handful of random points on the circle.

  1. Uneven load. Three random points rarely cut a circle into three equal arcs. One server might own 50% of the ring by bad luck, becoming a hotspot while others idle.
  2. Uneven failover. When a server dies, all of its keys dump onto its single clockwise neighbor, doubling that one node’s load instead of spreading the orphaned keys across the cluster.

The fix is virtual nodes (vnodes): instead of placing each physical server at one point, place it at many points — hash S1#1, S1#2, … S1#200 and scatter all 200 around the ring.

Without vnodes: With vnodes (each server = many small arcs):
●S1 ●S2 ·S1 ·S3 ·S2 ·S1 ·S2 ·S3 ·S1 ·S2 ·S3 ·S1
●S3 (interleaved all around the ring)
3 big lumpy arcs hundreds of tiny arcs, ~even per server

Now each physical server owns hundreds of tiny arcs sprinkled everywhere. The law of large numbers takes over: total load per server converges to near-even, and when a server dies its many small arcs spill onto many different neighbors, so the orphaned load spreads across the whole cluster instead of crushing one node. Vnodes also let you give a beefier machine more virtual points, weighting it to take a larger share — heterogeneity for free.

Consistent hashing is the placement engine underneath a surprising amount of infrastructure: distributed caches (memcached client libraries, many Redis cluster setups), Dynamo-style databases (Amazon DynamoDB, Cassandra, Riak), CDN edge selection, and request load balancers that want session affinity (the same user keeps landing on the same backend, and a backend leaving only reshuffles its own users). Anywhere you must map a huge keyspace onto a changing set of nodes and you care about minimizing movement on change, this is the tool. It’s the natural partner to everything in Partitioning & Sharding — consistent hashing is one specific, churn-minimizing answer to the “which shard owns this key?” question, and the one most clusters that resize dynamically reach for.

The benefit is precise: a membership change moves one node’s share of keys instead of nearly all of them — bounded, predictable churn that turns cluster resizing from a catastrophe into a routine operation. The costs are real too: you maintain a ring data structure (a sorted map you binary-search per lookup), vnodes multiply your metadata, and you give up the dead-simple, stateless % N arithmetic for something you must implement, test, and keep consistent across every client. As ever, no free lunch — but for a cluster that grows, shrinks, and loses nodes, it’s a bargain.

  1. Walk through why hash(key) % N relocates almost every key when N changes by one, and quantify the fraction that moves.
  2. State the clockwise placement rule, and explain why only one node’s worth of keys move when a server joins or leaves the ring.
  3. What two distinct problems do virtual nodes solve, and what is the underlying cause both share?
  4. Why does adding more virtual nodes improve balance, and what is the cost of pushing that number very high?
  5. How would you use vnodes to make a cluster of unequal machines (some with twice the RAM) take load proportional to capacity?