Imagine seating pizza party guests around a circle and dividing the circle like pizza slices. Each serving station covers one slice. When a station closes, only the guests in its slice move to a neighboring station. The rest stay where they are.
That’s consistent hashing. Instead of mapping guests to stations with modulo (hash mod N), you map both guests and stations to positions on a circle. Each guest is served by the first station clockwise from their position, so a station covers the arc from the previous station up to its own position.
When a station leaves, its section distributes to one neighbor. When a station joins, it carves out a section from one neighbor. Only local changes.
The Modulo Problem
Traditional hashing uses modulo: with 4 stations, guest number g goes to station (g mod 4).
This works until stations change:
Station 3 closes. Now we have 3 stations.
Guest 3 was at Station 3, but Station 3 doesn’t exist. Guest 3 goes to Station 0 (3 mod 3 = 0).
Almost everyone moves: going from 4 stations to 3 reassigns about three of every four guests. Cache misses spike. The system destabilizes.
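To put a number on the damage, here is a small sketch. The 1,200 sequential guest IDs are made up for illustration; any evenly spread set of IDs gives roughly the same ratio.

```python
# Count how many guests change station when the station count drops from 4 to 3.
guests = range(1200)  # hypothetical guest IDs 0..1199

# A guest keeps their station only if both modulo results agree.
moved = sum(1 for g in guests if g % 4 != g % 3)

print(f"{moved} of {len(guests)} guests moved ({moved / len(guests):.0%})")
# prints: 900 of 1200 guests moved (75%)
```

Only guests whose ID leaves the same remainder under both 4 and 3 stay put, which is 3 out of every 12.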
The Circle
Rule: Each guest stands somewhere on the circle. Each station has a position. Guests are served by the first station encountered going clockwise.
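The rule can be sketched as a sorted list plus a binary search. Positions here are hours on a 12-hour clock face. Station B at 3 and Station C at 6 match the examples in this article; Station A at 12 (stored as 0) and Station D at 9 are assumptions made for illustration.

```python
import bisect

# Station positions in hours on a 12-hour clock face.
# A at 12 (0) and D at 9 are assumed for illustration.
stations = {0.0: "A", 3.0: "B", 6.0: "C", 9.0: "D"}

def lookup(stations, guest_pos):
    """Return the first station at or clockwise from guest_pos."""
    positions = sorted(stations)
    i = bisect.bisect_left(positions, guest_pos % 12)
    # Past the last station, wrap around to the first one.
    return stations[positions[i % len(positions)]]

print(lookup(stations, 5.0))   # Carol at 5 o'clock -> C
print(lookup(stations, 7.0))   # Dan at 7 o'clock -> D
print(lookup(stations, 10.5))  # wraps past 12 -> A
```

The wrap-around in the last line of `lookup` is what turns a sorted list into a circle.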
Minimal Disruption
Station C (at 6 o’clock) closes:
- Carol (at 5 o’clock): Walks clockwise, reaches Station D. Moves.
- Dan (at 7 o’clock): Already going to Station D. No change.
- Everyone else: Unchanged.
Only guests between the previous station (counter-clockwise) and the departing station are affected. They move to the departing station’s clockwise neighbor.
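The same property can be checked at scale. This is a minimal sketch, not a production ring: the helper names `h` and `owner`, the SHA-256 based hash, and the 1,000 generated guest names are all assumptions chosen for illustration.

```python
import bisect
import hashlib

def h(key):
    """Stable 64-bit position on the ring (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

def owner(stations, guest):
    """Station whose position is first at or clockwise from the guest's."""
    ring = sorted((h(s), s) for s in stations)
    i = bisect.bisect_left(ring, (h(guest), ""))
    return ring[i % len(ring)][1]

stations = ["Station_A", "Station_B", "Station_C", "Station_D"]
guests = [f"guest-{i}" for i in range(1000)]  # hypothetical guest names

before = {g: owner(stations, g) for g in guests}
remaining = [s for s in stations if s != "Station_C"]
after = {g: owner(remaining, g) for g in guests}

moved = [g for g in guests if before[g] != after[g]]
# Every guest that moved was previously served by the departed station.
assert all(before[g] == "Station_C" for g in moved)
print(f"{len(moved)} of {len(guests)} guests moved")
```

Rebuilding the sorted ring on every lookup keeps the sketch short. A real ring would keep it sorted between changes.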
Adding Stations
Station E wants to join at 4:30 position.
Guests between 3 o’clock (Station B) and 4:30 now go to Station E instead of C, because E is the first station they reach walking clockwise.
Guests between 4:30 and 6 o’clock still go to Station C.
Only one section changes.
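A quick sketch of the join, again on a 12-hour clock face. Station A at 12 (stored as 0) and Station D at 9 are assumed positions, and `serve` is a hypothetical helper name.

```python
import bisect

def serve(stations, hour):
    """Name of the first station at or clockwise from `hour`."""
    ordered = sorted(stations)
    i = bisect.bisect_left(ordered, hour % 12)
    return stations[ordered[i % len(ordered)]]

before = {0: "A", 3: "B", 6: "C", 9: "D"}  # A and D positions are assumed
after = {**before, 4.5: "E"}                # Station E joins at 4:30

for hour in (2, 4, 5, 8):
    print(hour, serve(before, hour), "->", serve(after, hour))
# Only the guest at 4 o'clock changes station: C -> E.
```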
Uneven Distribution
Random guest names may create imbalanced sections. One station gets 50 guests, another gets 7.
Solution: Virtual nodes. Each physical station claims multiple positions around the circle:
- Station A claims positions at 12, 1, 2, 5, 8 o’clock
- Station B claims positions at 3, 4, 6, 9, 11 o’clock
Better load distribution emerges from multiple smaller sections.
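One way to see the effect is to count guests per station with one position each, then with many. The sketch below assumes a SHA-256 based hash, 10,000 generated guest names, and a `_replica_` naming scheme for virtual positions. The exact counts depend on those choices.

```python
import bisect
import hashlib
from collections import Counter

def h(key):
    """Stable 64-bit ring position."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

def build_ring(stations, vnodes):
    """Sorted (position, station) pairs, `vnodes` positions per station."""
    return sorted((h(f"{s}_replica_{v}"), s) for s in stations for v in range(vnodes))

def load(ring, guests):
    """Count how many guests each station serves."""
    positions = [p for p, _ in ring]
    counts = Counter()
    for g in guests:
        i = bisect.bisect_left(positions, h(g)) % len(ring)
        counts[ring[i][1]] += 1
    return counts

stations = ["Station_A", "Station_B", "Station_C", "Station_D"]
guests = [f"guest-{i}" for i in range(10_000)]  # hypothetical guests

for vnodes in (1, 100):
    counts = load(build_ring(stations, vnodes), guests)
    print(f"{vnodes:>3} positions each:", dict(counts))
```

With more positions per station, each station's total is the sum of many small arcs, so the totals tend to even out.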
Hash Function Placement
Guest positioning: hash(guest_name) maps to a position on the circle.
- “Alice” → some number → position on circle
- “Bob” → different number → different position
- Alice always gets the same position
Station positioning: hash(station_identifier) maps to circle position.
- “Station_A” → position on circle
- “Station_A_replica_1” → another position
- “Station_A_replica_2” → yet another position
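A minimal sketch of the placement function, assuming a ring of 2^32 positions and SHA-256 as the hash. Both are illustrative choices, not requirements. Python's built-in `hash()` is a poor fit here because string hashes are randomized per process by default, so positions would change between runs.

```python
import hashlib

RING_SIZE = 2**32  # assumed ring size; any fixed range works

def position(key: str) -> int:
    """Map a name to a stable position in [0, RING_SIZE)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")  # 4 bytes -> 0 .. 2**32 - 1

for name in ("Alice", "Bob", "Station_A", "Station_A_replica_1"):
    print(name, position(name))

# Same input, same position, on every run and every machine.
assert position("Alice") == position("Alice")
```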
Implementation
Virtual Nodes
Each physical station spawns multiple virtual positions:
- Virtual_A_1 at 12 o’clock
- Virtual_A_2 at 2:30
- Virtual_A_3 at 5:15
- Virtual_A_4 at 8:45
Benefits: better distribution, smoother load balancing, graceful scaling.
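Putting the pieces together, a ring with virtual nodes might look like the sketch below. The class name `HashRing`, its method names, and the `_replica_` naming scheme are assumptions for illustration, not any particular library's API.

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, station)

    @staticmethod
    def _hash(key):
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add(self, station):
        """Insert every virtual position of a station, keeping the list sorted."""
        for v in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{station}_replica_{v}"), station))

    def remove(self, station):
        """Drop every virtual position of a station."""
        self._ring = [(p, s) for p, s in self._ring if s != station]

    def lookup(self, key):
        """First station at or clockwise from the key's position."""
        if not self._ring:
            raise LookupError("ring is empty")
        i = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[i % len(self._ring)][1]

ring = HashRing(vnodes=4)  # four virtual positions per station
for s in ("Station_A", "Station_B", "Station_C"):
    ring.add(s)

home = ring.lookup("Alice")
# Remove a station that is NOT Alice's; her assignment must not change.
ring.remove("Station_B" if home != "Station_B" else "Station_C")
assert ring.lookup("Alice") == home
```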
Weighted Distribution
Capacity-based positioning:
- Small Station S (100 pizzas/hour): 5 virtual positions
- Large Station L (500 pizzas/hour): 25 virtual positions
- Mega Station M (2000 pizzas/hour): 100 virtual positions
Traffic routes proportionally to capacity.
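The numbers above work out to one virtual position per 20 pizzas/hour. A rough sketch of that rule follows, with generated order names and a SHA-256 hash as illustrative assumptions. With this few positions, the observed shares only approximate the 1:5:20 capacity ratio.

```python
import bisect
import hashlib
from collections import Counter

def h(key):
    """Stable 64-bit ring position."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

# Capacities from the example; one virtual position per 20 pizzas/hour.
capacity = {"S": 100, "L": 500, "M": 2000}
vnodes = {name: cap // 20 for name, cap in capacity.items()}  # 5, 25, 100

ring = sorted(
    (h(f"{name}_replica_{v}"), name)
    for name, n in vnodes.items()
    for v in range(n)
)
positions = [p for p, _ in ring]

counts = Counter()
for i in range(26_000):  # hypothetical orders
    idx = bisect.bisect_left(positions, h(f"order-{i}")) % len(ring)
    counts[ring[idx][1]] += 1

for name in capacity:
    print(name, "positions:", vnodes[name], "orders:", counts[name])
```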
Use Cases
Amazon DynamoDB
Millions of data items (guests), thousands of storage nodes (stations). Amazon’s Dynamo paper, the design that inspired DynamoDB, describes consistent hashing for distribution and virtual nodes for balance.
Cassandra
Nodes around the world, data replicated to multiple positions, consistent hashing for placement, token ranges as pie slices.
Content Delivery Networks
Web content as guests, CDN servers as stations, consistent hashing for routing.
Hotspot Problem
Consistent hashing spreads keys evenly, not requests. One very popular item can still overload the station that owns it:
- “Login.html” gets millions of requests
- A single station is overwhelmed despite good overall distribution
Solutions: replicate popular items and give hot content multiple positions on the circle.
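One possible way to give a hot item several positions is to hash salted copies of its key and pick one copy per read. The sketch below assumes four copies, a `#copy` salt format, and a caller-supplied set of hot keys. None of these come from a specific system, and two copies can land on the same station.

```python
import bisect
import hashlib
import random

def h(key):
    """Stable 64-bit ring position."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

stations = ["Station_A", "Station_B", "Station_C", "Station_D"]
ring = sorted((h(f"{s}_replica_{v}"), s) for s in stations for v in range(50))
positions = [p for p, _ in ring]

def owner(key):
    """First station at or clockwise from the key's position."""
    return ring[bisect.bisect_left(positions, h(key)) % len(ring)][1]

HOT_REPLICAS = 4  # assumed number of copies for a hot item

def replica_owners(key, replicas=HOT_REPLICAS):
    """Stations holding copies of a hot item, one per salted key."""
    return [owner(f"{key}#copy{r}") for r in range(replicas)]

def route_read(key, hot_keys, rng=random):
    """Send reads for hot keys to a random copy, everything else to its owner."""
    if key in hot_keys:
        return rng.choice(replica_owners(key))
    return owner(key)

hot = {"Login.html"}
print(replica_owners("Login.html"))
print(route_read("Login.html", hot), route_read("about.html", hot))
```

Writes to a hot item would have to update every copy, which is the price of spreading the reads.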
Cascading Failures
Station A fails, load goes to B. B overwhelmed, fails. Load goes to C. Chain reaction.
Prevention: circuit breakers, load shedding, capacity planning.
Decision Rules
Use consistent hashing when:
- Nodes join and leave frequently
- You need minimal disruption on changes
- Load distribution matters
Consider alternatives when:
- You need range queries (consistent hashing doesn’t support them well)
- Keys have natural ordering that matters
The next time you use a distributed database, cache, or CDN, remember the wheel spinning silently beneath, ensuring your request finds its destination with minimal disruption.