How a Cluster Agrees on Who's in Charge
Leader Election
Why consensus is needed, leader election the Raft way — terms, votes, and log replication — and how quorum prevents the dreaded split-brain.
What you'll learn
- Explain why distributed systems need a single elected leader
- Describe Raft's terms, voting, and log replication at a usable altitude
- Show how quorum prevents split-brain
The failover lesson ended on a question: when the primary dies, who decides which replica is promoted? If two replicas both decide “it’s me,” you have two primaries accepting conflicting writes — split-brain — and silent data corruption. Avoiding that requires the nodes to agree, reliably, even while messages are lost and machines crash. That agreement problem is consensus, and its most common job is electing a single leader.
Why you need a leader at all
Many coordination tasks are dramatically simpler with exactly one node in charge:
- A single writer avoids write conflicts — one node orders all writes.
- A single coordinator assigns work, holds locks, or owns a shard.
- A single source of truth for “what’s the current configuration?”
The catch: that one leader is a single point of failure (SPOF) unless the cluster can automatically elect a new one when it dies. So you need two things: a way to pick a leader, and a way to pick a new one safely when the old one vanishes. That’s exactly what consensus algorithms like Raft and Paxos provide. We’ll use Raft, because it was explicitly designed to be understandable.
Raft at a usable altitude
Every node is in one of three roles: follower, candidate, or leader. The cluster moves through numbered terms — think of each term as one election cycle with a monotonically increasing number.
The mechanics:
- Heartbeats. The leader sends periodic heartbeats. As long as followers hear them, everyone stays calm.
- Election timeout. If a follower hears nothing for a (randomized) timeout, it assumes the leader is dead, increments the term, becomes a candidate, votes for itself, and asks everyone else for their vote.
- Majority wins. A candidate that collects votes from a majority of the cluster becomes leader. Each node votes for at most one candidate per term, and any two majorities share at least one node, so at most one candidate can win a given term.
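- Randomized timeouts. Each node’s election timeout is chosen at random, so nodes time out at different moments and usually one candidate gets a head start and wins before the others even wake up. That makes split votes (several candidates dividing the votes so that none reaches a majority) rare, and when one does happen, the candidates time out again and retry in a new term.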
That last point is the elegant part — the randomness is what keeps two candidates from constantly tying.
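To make the voting rules and the role of randomness concrete, here is a toy Python simulation of a single election round. It is a sketch under simplifying assumptions, not Raft itself: all nodes start as followers in term 0, each draws an election timeout uniformly from `timeout_ms` (150 to 300 ms here, an illustrative choice), every vote request takes a random delay of up to `max_delay_ms`, and a node whose timer fires before any request reaches it becomes a candidate too. The names `Node`, `request_vote`, and `run_election` are hypothetical, and the log up-to-date check that real Raft performs before granting a vote is deliberately left out.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    node_id: int
    current_term: int = 0
    voted_for: Optional[int] = None

    def request_vote(self, term: int, candidate_id: int) -> bool:
        """Grant at most one vote per term (Raft's log up-to-date check is omitted)."""
        if term < self.current_term:
            return False  # candidate is from an older term
        if term > self.current_term:
            # A newer term starts a fresh election, so any earlier vote no longer counts.
            self.current_term = term
            self.voted_for = None
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


def run_election(n: int, rng: random.Random,
                 timeout_ms: tuple = (150.0, 300.0),
                 max_delay_ms: float = 10.0) -> Optional[int]:
    """Simulate one election among n fresh followers.

    Returns the winning node's id, or None if the vote splits.
    """
    nodes = [Node(i) for i in range(n)]
    timeout = {i: rng.uniform(*timeout_ms) for i in range(n)}
    delay = {(c, v): rng.uniform(0.0, max_delay_ms)
             for c in range(n) for v in range(n) if c != v}

    # Walk the timers in firing order: a node becomes a candidate if its timer
    # fires before any existing candidate's vote request reaches it.
    candidates = []
    for v in sorted(range(n), key=timeout.get):
        first_request = min((timeout[c] + delay[c, v] for c in candidates),
                            default=float("inf"))
        if timeout[v] < first_request:
            candidates.append(v)

    term = 1  # every candidate increments term 0 -> 1
    votes = {c: 0 for c in candidates}
    for c in candidates:
        if nodes[c].request_vote(term, c):  # a candidate votes for itself
            votes[c] += 1
    for v in range(n):
        if v in votes:
            continue  # candidates already spent their vote on themselves
        # A follower handles requests in arrival order and grants only the first.
        for cand in sorted(candidates, key=lambda cand: timeout[cand] + delay[cand, v]):
            if nodes[v].request_vote(term, cand):
                votes[cand] += 1

    winners = [c for c, count in votes.items() if count > n // 2]
    return winners[0] if winners else None


rng = random.Random(7)
rounds = [run_election(5, rng) for _ in range(1000)]
print("split votes:", rounds.count(None), "of", len(rounds))
```

In this toy model, the wide 150 to 300 ms range produces a split vote only rarely. Squeezing every timeout into a 1 ms window (for example `timeout_ms=(150.0, 151.0)`) makes nearly every node a candidate that votes for itself, so most rounds end with no one holding a majority.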
Why a majority? Split-brain and quorum
Requiring a majority (a quorum — more than half) is the whole trick that prevents split-brain. Suppose a 5-node cluster splits into a group of 3 and a group of 2:
- The group of 3 is a majority → it can elect a leader and keep operating.
- The group of 2 is not a majority → it can’t elect a leader, so it refuses to act rather than risk being a second brain.
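Because both sides can’t simultaneously hold a majority, you can never end up with two leaders that can both make progress. An old leader stranded in the minority may briefly go on believing it’s in charge, but it can’t commit anything without a majority, and it steps down as soon as it sees a higher term. The minority partition sacrifices availability to preserve correctness, a direct echo of the CAP tradeoff.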
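The quorum rule itself is one line of arithmetic. This sketch uses a hypothetical helper, `has_quorum`, checks the 3 / 2 split from the example, and runs an exhaustive loop showing that no two-way partition can leave both sides with a majority.

```python
def has_quorum(group_size: int, cluster_size: int) -> bool:
    """True if the group holds a strict majority (more than half) of the cluster."""
    return 2 * group_size > cluster_size


# The 5-node cluster split 3 / 2 from the text:
print(has_quorum(3, 5), has_quorum(2, 5))  # True False

# An even 2 / 2 split of a 4-node cluster leaves neither side able to elect a leader.
print(has_quorum(2, 4))  # False

# However a cluster of n nodes is cut in two, at most one side keeps a quorum.
for n in range(1, 10):
    for k in range(n + 1):
        assert not (has_quorum(k, n) and has_quorum(n - k, n))
```

The 2 / 2 case hints at why consensus clusters are usually given an odd number of nodes: a 4-node cluster needs 3 nodes to agree, so like a 3-node cluster it tolerates only one failure, and it adds the possibility of an even split where nobody can lead.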
Log replication: the actual job
Electing a leader is the setup; the point is keeping a consistent replicated log across the cluster. Once elected, the leader is the sole entry point for writes:
- A client sends a write to the leader, which appends it to its log as uncommitted.
- The leader replicates the entry to followers.
- Once a majority (counting the leader itself) have stored it, the leader marks it committed and applies it to its state machine. Followers learn the new commit point from the leader’s next message and apply the entry too.
- Only committed entries are acknowledged to the client.
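The commit step can be expressed as a small calculation on the leader. This is a sketch, assuming the leader tracks, for every node including itself, the highest log index that node has stored (Raft calls the per-follower value `matchIndex`); the function names and the 1-based log layout are hypothetical. It also encodes a refinement from Raft’s design that the bullets above gloss over: a leader only counts replicas to commit entries from its own current term, and earlier entries become committed along with them.

```python
from typing import List


def majority_index(match_index: List[int]) -> int:
    """Highest log index that a strict majority of nodes have stored.

    match_index has one entry per node, including the leader itself.
    """
    ordered = sorted(match_index, reverse=True)
    # The value at position n // 2 is held (or exceeded) by at least n // 2 + 1
    # nodes, which is the smallest strict majority.
    return ordered[len(ordered) // 2]


def advance_commit_index(commit_index: int, match_index: List[int],
                         log_terms: List[int], current_term: int) -> int:
    """Return the leader's new commit index; log indexes are 1-based."""
    candidate = majority_index(match_index)
    # Raft only commits by counting replicas for entries from the leader's own
    # term; older entries become committed along with them.
    if candidate > commit_index and log_terms[candidate - 1] == current_term:
        return candidate
    return commit_index


# Leader (first entry) holds entries 1..5; four followers have stored up to 5, 3, 2, 2.
match = [5, 5, 3, 2, 2]
terms = [1, 1, 2, 2, 2]  # terms of log entries 1..5
print(advance_commit_index(0, match, terms, current_term=2))  # 3
```

Entries 4 and 5 stay uncommitted here: only two of the five nodes have stored them, so the leader can’t yet acknowledge those writes to clients.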
Because an entry isn’t committed until a majority hold it, any future leader is guaranteed to already have every committed entry: it needed a majority’s votes, that majority overlaps the one that stored the entry, and Raft’s voting rule has each node refuse a candidate whose log is less up to date than its own. That overlap, the same quorum-intersection idea from the fault-tolerance lesson, is what makes the log durable across leader changes.
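The overlap argument can be checked mechanically. The sketch below, assuming a 5-node cluster with nodes numbered 0 to 4 and using hypothetical helper names, first confirms by brute force that any two majorities share a node. It then shows the voting rule that makes the shared node matter in Raft: a voter grants its vote only if the candidate’s log is at least as up to date as its own, comparing the term of the last entry first and the log length second.

```python
from itertools import combinations


def majorities(n: int) -> list:
    """Every set of nodes (numbered 0..n-1) that forms a strict majority."""
    return [set(group)
            for size in range(n // 2 + 1, n + 1)
            for group in combinations(range(n), size)]


def log_is_up_to_date(cand_last_term: int, cand_last_index: int,
                      my_last_term: int, my_last_index: int) -> bool:
    """Raft's voting restriction: is the candidate's log at least as up to date as mine?"""
    if cand_last_term != my_last_term:
        return cand_last_term > my_last_term  # a later last term wins
    return cand_last_index >= my_last_index   # same term: the longer log wins


# Any two majorities of a 5-node cluster share at least one node.
ms = majorities(5)
assert all(a & b for a in ms for b in ms)

# A committed entry (term 2, index 3) is stored on nodes 0, 1, 2, and every
# majority a candidate could gather includes at least one of them.
stored = {0, 1, 2}
assert all(m & stored for m in ms)

# Each of those nodes refuses a candidate whose log ends at (term 1, index 2).
print(log_is_up_to_date(1, 2, 2, 3))  # False
```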
In practice you rarely implement Raft yourself: when you need “exactly one leader” in your own application, you typically ask a coordination service built on consensus, such as etcd, ZooKeeper, or Consul. A common, lighter-weight cousin of that need, “exactly one worker holds this lock right now,” has its own set of techniques and pitfalls. That’s distributed locks, next.