Picture a rafting expedition where multiple guides must agree on decisions: which rapids to run, when to stop for camp, who leads each section. Without consensus the expedition fragments. Raft consensus works like this expedition: the guides elect a leader who makes decisions until they lose the crew’s confidence, and then a new election ensures someone is always in charge.
The Expedition Challenge
The Split-Brain Problem
Multiple rafts believe they are leading, so route decisions conflict: some take the left fork, others take the right. Without consensus, the groups diverge.
The Disappearing Guide
The lead guide’s radio dies. Are they lost, or just out of range? When should a new leader be elected, and what happens when the old one returns? Leadership gaps paralyze progress.
The Democratic Expedition
Raft organizes leadership through three roles: follower, candidate, and leader.
The Election Process
When a follower receives no heartbeat from the leader within its randomized election timeout, it becomes a candidate and requests votes.
The election proceeds in four stages:
- Timeout trigger: no heartbeat arrives from the leader
- Candidacy: the follower requests votes
- Majority wins: the candidate needs more than 50% of the votes
- New term: a new leadership period begins
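The stages above can be sketched in Python. This is a minimal illustration, not a real implementation; the node names, timeout range, and cluster size are hypothetical.

```python
import random

class Node:
    """Minimal Raft node state for the timeout-to-candidacy transition (sketch)."""
    def __init__(self, name, cluster_size):
        self.name = name
        self.cluster_size = cluster_size
        self.role = "follower"
        self.term = 0
        self.votes = 0
        # Randomized election timeout (milliseconds) breaks ties between candidates.
        self.election_timeout = random.uniform(150, 300)

    def on_election_timeout(self):
        """No heartbeat arrived within the timeout: stand for election."""
        self.role = "candidate"
        self.term += 1          # candidacy starts a new term
        self.votes = 1          # vote for self

    def on_vote_granted(self):
        self.votes += 1
        # A majority of the full cluster (>50%) wins the election.
        if self.role == "candidate" and self.votes > self.cluster_size // 2:
            self.role = "leader"

n = Node("raft-1", cluster_size=5)
n.on_election_timeout()         # becomes candidate in term 1
n.on_vote_granted()
n.on_vote_granted()             # 3 of 5 votes: majority, so leader
print(n.role, n.term)           # prints: leader 1
```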
The Term System
Terms are numbered leadership periods. Each term has at most one leader. Higher terms override lower terms. Terms create a total ordering of leadership history.
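Because higher terms always override lower ones, every message handler reduces to a three-way comparison. A sketch of that rule (function and return values are illustrative, not from any particular library):

```python
def compare_terms(local_term, message_term):
    """Raft's term rule: a higher term always overrides a lower one."""
    if message_term > local_term:
        return "step_down"   # we are stale: adopt the new term, become follower
    if message_term < local_term:
        return "reject"      # the sender is stale: reject its request
    return "same_term"       # normal processing within one term

print(compare_terms(4, 7))   # prints: step_down (a term-7 message overrides term 4)
```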
Heartbeat Mechanism
The leader proves liveness by sending periodic heartbeats to followers. If a follower misses heartbeats beyond its election timeout, it starts a new election.
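The follower side of this mechanism is just a clock check. A minimal sketch, assuming a fixed timeout (real implementations randomize it, as noted above):

```python
import time

ELECTION_TIMEOUT = 0.3   # seconds; illustrative value, normally randomized

class FollowerClock:
    """Tracks heartbeats and decides when to start an election (sketch)."""
    def __init__(self):
        self.last_heartbeat = time.monotonic()

    def on_heartbeat(self):
        """Called whenever an AppendEntries message arrives from the leader."""
        self.last_heartbeat = time.monotonic()

    def should_start_election(self):
        """True once the leader has been silent for longer than the timeout."""
        return time.monotonic() - self.last_heartbeat > ELECTION_TIMEOUT

clock = FollowerClock()
print(clock.should_start_election())   # prints: False (heartbeat just recorded)
```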
Real-World Applications
Distributed Database
A write request arrives at the leader. The leader appends it to its log and sends AppendEntries to followers. When a majority acknowledges, the write is committed, and all replicas stay synchronized.
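The commit rule in that write path can be sketched directly. This toy version takes the follower acknowledgements as an argument to stay self-contained; a real leader would send AppendEntries over the network and collect replies asynchronously.

```python
class Leader:
    """Leader-side write path: append, replicate, commit on majority (sketch)."""
    def __init__(self, follower_names):
        self.log = []
        self.followers = follower_names

    def handle_write(self, command, acks_from):
        self.log.append(command)
        # The leader's own durable write counts as one acknowledgement.
        acks = 1 + len(acks_from)
        cluster_size = 1 + len(self.followers)
        # Commit requires a strict majority of the whole cluster.
        return "committed" if acks > cluster_size // 2 else "pending"

leader = Leader(["f1", "f2", "f3", "f4"])   # a 5-node cluster
print(leader.handle_write("SET x=1", acks_from=["f1", "f2"]))  # prints: committed
```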
Container Orchestration
Kubernetes stores cluster state in etcd, which is built on Raft. The etcd leader serializes state updates, followers mirror the state, a new election runs on leader failure, and the cluster keeps operating.
Configuration Management
Consul’s service discovery tracks healthy services. The leader updates the service registry, changes propagate to all nodes, and the cluster fails over to a new leader if needed.
The Log Replication Dance
The Expedition Journal
Leader maintains an ordered log of commands:
- “Take left fork at mile 5”
- “Camp at sandy beach”
- “Scout rapids before running”
Replicating Entries
The leader sends AppendEntries to followers. Followers write the entries to their logs and acknowledge. When a majority has written an entry, it commits, and every server applies it to its state machine.
Handling Inconsistencies
After a leader crash, followers may hold divergent logs. The new leader’s log wins: it finds the last entry where a follower’s log matches its own, has the follower delete everything after that point, and sends the correct entries.
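That repair procedure can be sketched as a pure function over two logs. Entries here are (term, command) pairs, and the expedition-flavored commands are illustrative data.

```python
def reconcile(leader_log, follower_log):
    """Walk to the last matching entry, then replace the follower's diverged suffix."""
    match = 0
    for leader_entry, follower_entry in zip(leader_log, follower_log):
        if leader_entry != follower_entry:
            break
        match += 1
    # Truncate the follower's conflicting tail, append the leader's entries.
    return follower_log[:match] + leader_log[match:]

leader_log   = [(1, "left fork"), (1, "camp"), (2, "scout rapids")]
follower_log = [(1, "left fork"), (1, "camp"), (1, "right fork")]
print(reconcile(leader_log, follower_log))
# the leader's term-2 entry replaces the follower's conflicting term-1 entry
```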
Advanced Features
Configuration Changes
Joint consensus allows adding or removing servers safely: during an overlap period, both the old and the new majority must agree. Single-server changes are safe on their own, because any old and new majority necessarily overlap.
Log Compaction
Logs grow indefinitely. Snapshot current state, delete old log entries. Lagging followers receive snapshots to catch up.
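A sketch of compaction for a key-value state machine: fold applied entries into a snapshot, then discard them from the log. The class and entry format are invented for illustration.

```python
class Snapshotter:
    """Log compaction sketch: fold old entries into a snapshot, then drop them."""
    def __init__(self):
        self.state = {}            # the state machine (a key-value store here)
        self.log = []              # (index, key, value) entries
        self.snapshot_index = 0    # last log index covered by the snapshot

    def append(self, index, key, value):
        self.log.append((index, key, value))

    def compact(self, up_to):
        """Apply entries up to `up_to` into the snapshot, then discard them."""
        for index, key, value in self.log:
            if index <= up_to:
                self.state[key] = value
        self.log = [entry for entry in self.log if entry[0] > up_to]
        self.snapshot_index = up_to

s = Snapshotter()
s.append(1, "route", "left fork")
s.append(2, "camp", "sandy beach")
s.append(3, "route", "scout first")
s.compact(up_to=2)
print(s.state, len(s.log))   # snapshot holds two keys; one log entry remains
```

A lagging follower whose missing entries have already been compacted away receives the snapshot itself instead of individual entries.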
Read Consistency
Reading from the leader ensures fresh reads. Lease-based reads trade some safety (they depend on bounded clock drift) for lower latency. ReadIndex confirms leadership with a majority before serving a read.
Pre-Vote Optimization
A would-be candidate first checks whether it could win before actually starting an election. This reduces disruption during network partitions, since an isolated node cannot keep bumping the term.
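The pre-vote decision reduces to the same majority check as a real election, but without incrementing the term. A sketch, with the function name and inputs invented for illustration:

```python
def prevote_should_run(peer_responses, cluster_size):
    """Pre-vote: only start a real election if a majority would grant the vote.

    peer_responses is a list of booleans, one per reachable peer, saying
    whether that peer would vote for us. The candidate counts itself."""
    would_grant = 1 + sum(peer_responses)
    return would_grant > cluster_size // 2

# A partitioned node that can reach only one agreeable peer of five stays quiet:
print(prevote_should_run([True], 5))         # prints: False
print(prevote_should_run([True, True], 5))   # prints: True
```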
Common Challenges
Split Vote
Multiple candidates split the vote, so no one wins a majority. Randomized election timeouts break the tie: the candidate whose timeout fires first in the next round typically wins.
Slow Follower
Network issues cause a follower to lag. Commitment only needs a majority, so one slow follower does not stall writes, but the leader must keep retrying AppendEntries, and a badly lagging follower may need a snapshot to catch up.
Network Partition
The cluster splits. A minority partition cannot elect a leader or commit writes; at best it serves stale data. When the partition heals, the side with the higher term wins.
Log Divergence
A leader crashes after appending entries it never committed. The new leader may hold a different log, leaving followers with conflicting histories. The new leader’s log wins.
Implementation Components
Core State
Persistent (must survive restarts): current term, voted-for candidate, log entries. Volatile (rebuilt on restart): commit index, last-applied index.
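That state split can be sketched as two dataclasses; field names follow the Raft paper, and the (term, command) entry format is illustrative:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class PersistentState:
    """Must survive restarts: written to stable storage before answering RPCs."""
    current_term: int = 0
    voted_for: Optional[str] = None
    log: List[Tuple[int, str]] = field(default_factory=list)  # (term, command)

@dataclass
class VolatileState:
    """Safe to lose on restart: reconstructed by replaying the persisted log."""
    commit_index: int = 0   # highest log index known to be committed
    last_applied: int = 0   # highest log index applied to the state machine

p, v = PersistentState(), VolatileState()
print(p.current_term, v.commit_index)   # prints: 0 0
```

Persisting the first group before replying to any RPC is what makes votes and log entries durable promises rather than best-effort hints.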
The Three RPCs
- RequestVote: request votes during an election
- AppendEntries: replicate log entries; doubles as the heartbeat
- InstallSnapshot: transfer a snapshot to lagging followers
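The shape of the three RPCs can be sketched as plain dataclasses. Field names follow the Raft paper; any real implementation adds more detail.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class RequestVote:
    term: int
    candidate_id: str
    last_log_index: int   # voters reject candidates with stale logs
    last_log_term: int

@dataclass
class AppendEntries:
    term: int
    leader_id: str
    prev_log_index: int   # consistency-check point in the follower's log
    prev_log_term: int
    entries: List[Tuple[int, str]]   # an empty list makes this a heartbeat
    leader_commit: int

@dataclass
class InstallSnapshot:
    term: int
    leader_id: str
    last_included_index: int   # the snapshot replaces the log up to here
    last_included_term: int
    data: bytes

hb = AppendEntries(term=3, leader_id="n1", prev_log_index=7,
                   prev_log_term=3, entries=[], leader_commit=7)
print(len(hb.entries) == 0)   # prints: True (an empty AppendEntries is a heartbeat)
```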
Testing
Raft implementations must be tested under network delays, message loss, and clock skew. Deterministic simulation with controlled randomness is essential.
When Raft Works
Raft fits:
- Systems needing consistent state across replicas
- Teams wanting understandable consensus
- Fault-tolerant distributed systems
Raft struggles with:
- Geo-distributed deployments with high latency
- Write-heavy workloads (consensus overhead)
- Very large clusters (consensus scales poorly beyond ~7 nodes)
Decision Rules
Use Raft when:
- You need strong consistency
- Correctness matters more than raw performance
- You want implementable consensus
- Cluster size stays small
Consider alternatives:
- Paxos for theoretical foundations or formal verification
- Chain-based replication for write-heavy workloads
- Eventually consistent systems when strict consistency isn’t needed
The river flows. The expedition needs leadership. The vote proceeds. Consensus emerges.