Crash Consensus + Byzantine Agreement (OM(m), Phase King) + FLP
Intuition
Consensus = a group of nodes agree on a single value despite some being faulty. The difficulty depends on the failure model. Throughout, $n$ is the number of processes and $f$ the maximum number of faulty ones. Crash-failure (silent halts, no lies): solvable in $f+1$ rounds with $n > f$ processes. Byzantine (arbitrary lying, conflicting messages): requires $n > 3f$ and at least $f+1$ rounds, and has no deterministic solution in asynchronous systems (FLP). Two flagship Byzantine algorithms: OM(m) (exponential messages, recursive) and Phase King (polynomial, needs more processes).
Explanation
Why consensus matters. Used for: committing distributed transactions (2PC/3PC), leader election (Raft/Paxos), agreeing on the current value of a replicated variable, ordering of operations (state-machine replication), atomic broadcast.
Failure models studied (synchronous). Crash failure — node may crash but never lies; once crashed, stays silent. Byzantine failure — node behaves arbitrarily/maliciously: may send wrong values, conflicting values to different peers, stay silent, lie about messages received.
Three problem variants. Consensus: each process broadcasts its own initial value; all non-faulty processes agree on a single value; if all non-faulty processes start with $v$, they decide $v$. Interactive Consistency: agree on a *vector* of values $(v_1, \dots, v_n)$; if $P_i$ is non-faulty with value $v_i$, the $i$-th vector slot must equal $v_i$. Byzantine Agreement: a designated source broadcasts; all non-faulty processes agree on one value, and if the source is non-faulty that value is the source's. The three problems are inter-reducible, so a BA algorithm solves all three.
Crash-failure synchronous consensus algorithm. $n$ processes, up to $f$ may crash. Each process holds a local value $x$, initially its input. Run $f+1$ rounds, each: send $x$ to all (if this value has not been sent already), wait for all msgs this round, set $x = \min(x, \text{values received})$. After the final round: decide $x$.
Correctness of crash algorithm. Termination: $f+1$ rounds, fixed. Validity: only inputs ever circulate; if all start with the same $v$, that's the only value seen. Agreement: if $P_i$ decided a value smaller than $P_j$'s, the chain of crash-faulty nodes hiding that value from $P_j$ would need length $f+1$; only $f$ can fail. Contradiction.
Crash-failure performance. $f+1$ rounds. Each round $O(n^2)$ messages. Total: $O((f+1) \cdot n^2)$ messages.
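The round structure above can be sketched as a small simulation. This is a minimal sketch, not a reference implementation: the function name `crash_consensus` and the `crash_plan` format (process id mapped to the round in which it crashes and the recipients it still reaches) are assumptions made for illustration.

```python
def crash_consensus(inputs, f, crash_plan=None):
    """Simulate f+1 synchronous rounds of min-based crash consensus.

    inputs:     dict pid -> initial value
    crash_plan: dict pid -> (round, recipients reached before crashing)
                (hypothetical format, used only to script failures)
    Returns the decisions of the processes still alive at the end.
    """
    crash_plan = crash_plan or {}
    x = dict(inputs)                      # current local value of each process
    alive = set(inputs)
    last_sent = {p: None for p in inputs}
    for rnd in range(1, f + 2):           # rounds 1 .. f+1
        inbox = {p: [] for p in inputs}
        crashed_now = set()
        for p in sorted(alive):
            if p in crash_plan and crash_plan[p][0] == rnd:
                recipients = crash_plan[p][1]   # partial broadcast, then crash
                crashed_now.add(p)
            else:
                recipients = alive - {p}
            if x[p] != last_sent[p]:      # send only values not sent before
                for q in recipients:
                    inbox[q].append(x[p])
                last_sent[p] = x[p]
        alive -= crashed_now
        for p in alive:
            x[p] = min([x[p]] + inbox[p])
    return {p: x[p] for p in alive}


inputs = {1: 5, 2: 3, 3: 7, 4: 2}
# Process 4 crashes in round 1 after reaching only process 1.
print(crash_consensus(inputs, f=1, crash_plan={4: (1, {1})}))  # all survivors hold 2
print(crash_consensus(inputs, f=0, crash_plan={4: (1, {1})}))  # one round: 2 vs 3
```

Running a single round against one crash leaves the survivors split, which is the reason the algorithm budgets $f+1$ rounds for $f$ crashes.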
Byzantine impossibility: three results. (i) **No solution exists if $n \le 3f$**, where $f$ = max Byzantine faulty. Need $n \ge 3f+1$ (at least $2f+1$ non-faulty, i.e. more than two-thirds). (ii) No deterministic solution in asynchronous systems even with one crash failure: FLP (Fischer-Lynch-Paterson, 1985). (iii) Lower bound: at least $f+1$ rounds required.
**Why $n = 3$, $f = 1$ Byzantine fails.** Source $S$ + lieutenants $T, U$, exactly one may be faulty. *Scenario 1*: $S$ sends $0$ to both; faulty $U$ lies, tells $T$ that $S$ said $1$. Loyal $T$ sees 'S said 0, U said S said 1' and must decide $0$ (agree with S since S is loyal). *Scenario 2*: $S$ faulty, sends $0$ to $T$ and $1$ to $U$. $T$ sees 'S said 0, U said S said 1', the same view as in Scenario 1, so it must decide $0$. $U$ sees 'S said 1, T said S said 0', the same view it would have in the mirror image of Scenario 1 (loyal $S$ sends $1$, faulty $T$ claims $S$ said $0$), so it must decide $1$. $T$ and $U$ disagree though both are loyal, which contradicts agreement.
Lamport-Shostak-Pease Oral Messages: OM(m). Recursion with resilience parameter $m$. Base case OM(0): general sends value to all lieutenants; each lieutenant decides on the value received (or 'undef' if no msg). **Recursive case OM(t), $t > 0$**: general sends value to all lieutenants. Each lieutenant $L_i$, on receiving $v_i$ from the general, acts as the new 'general' running OM($t-1$) to broadcast $v_i$ to the other $n-2$ lieutenants. After $t+1$ rounds, every $L_i$ takes the majority of the value it received directly and the values it computed from each of these sub-broadcasts as its decision.
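The indistinguishability argument can be checked mechanically by writing down what each lieutenant sees. In this sketch a view is the pair (value received from the source, value the peer claims the source sent); the helper `views` and its parameters are hypothetical names introduced only for this illustration.

```python
def views(source_to_t, source_to_u, t_relays=None, u_relays=None):
    """Return (T's view, U's view).

    A loyal lieutenant relays what it received from the source;
    passing t_relays / u_relays overrides that to model a lying lieutenant.
    """
    t_says = source_to_t if t_relays is None else t_relays
    u_says = source_to_u if u_relays is None else u_relays
    return (source_to_t, u_says), (source_to_u, t_says)


# Scenario 1: S loyal (sends 0 to both), U lies and claims S said 1.
s1_t, _ = views(0, 0, u_relays=1)
# Scenario 2: S faulty (0 to T, 1 to U), both lieutenants loyal.
s2_t, s2_u = views(0, 1)
# Mirror of scenario 1: S loyal (sends 1 to both), T lies and claims S said 0.
_, mirror_u = views(1, 1, t_relays=0)

print(s1_t == s2_t)      # T cannot tell scenario 1 from scenario 2
print(s2_u == mirror_u)  # U cannot tell scenario 2 from the mirror scenario
```

Since T must output 0 in scenario 1 and U must output 1 in the mirror scenario, both are forced to those outputs in scenario 2 as well, where they are both loyal.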
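The recursion can be sketched in a few lines. This is an illustrative simulation under stated assumptions, not the authors' code: the `send(src, dst, v)` callback is a hypothetical hook that returns what `dst` actually receives (so a faulty sender is modelled by a callback that alters `v`), and `DEFAULT` is an assumed fallback when no strict majority exists.

```python
from collections import Counter

DEFAULT = 0  # assumed fallback value when there is no strict majority


def majority(values):
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else DEFAULT


def om(m, commander, lieutenants, value, send):
    """Return {lieutenant: decided value} for OM(m)."""
    # Step 1: the commander sends its value to every lieutenant.
    received = {l: send(commander, l, value) for l in lieutenants}
    if m == 0:
        return received
    # Step 2: each lieutenant j relays what it received via OM(m-1).
    relayed = {l: [] for l in lieutenants}
    for j in lieutenants:
        others = [l for l in lieutenants if l != j]
        sub = om(m - 1, j, others, received[j], send)
        for l in others:
            relayed[l].append(sub[l])
    # Step 3: majority over the direct value and the relayed values.
    return {l: majority([received[l]] + relayed[l]) for l in lieutenants}


def faulty_general(src, dst, v):
    """G is Byzantine: tells L3 a different value than L1 and L2."""
    if src == "G":
        return 1 if dst == "L3" else 0
    return v


def faulty_l3(src, dst, v):
    """L3 is Byzantine: always relays 1."""
    return 1 if src == "L3" else v


lts = ["L1", "L2", "L3"]
print(om(1, "G", lts, 0, faulty_general))  # all three lieutenants decide 0
print(om(1, "G", lts, 0, faulty_l3))       # loyal L1 and L2 decide G's value 0
```

The returned dictionary also contains an entry for a faulty lieutenant; only the entries of loyal lieutenants are meaningful.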
OM assumptions. Every message is delivered correctly. Receiver always knows the sender's identity. Absence of a message can be detected (synchronous). Messages can be content-altered by faulty senders but NOT forged in transit.
OM tolerates up to $f$ faulty if $n > 3f$. Needs $f+1$ rounds. Message complexity: $(n-1) + (n-1)(n-2) + \dots + (n-1)(n-2)\cdots(n-f-1) = O(n^{f+1})$, exponential in $f$. This is OM's main drawback.
Phase King algorithm. Byzantine consensus in which each process starts with its own initial value (no designated source). Requires $n > 4f$ (more processes than OM, but simpler protocol). **$f+1$ phases × 2 rounds each. Polynomial message complexity.**
Phase King: each phase. Round 1: every process broadcasts its current value to all. Each process computes the majority of values received (or a tie-breaker default if no strict majority) and the multiplicity (count of the majority value). Round 2: the phase king ($P_k$ for phase $k$) broadcasts its value to everyone. Each $P_i$ updates: if its majority's multiplicity $> n/2 + f$ → keep majority; else → adopt the king's value.
Phase King: correctness intuition. Run $f+1$ phases, each with a different king. At most $f$ processes are Byzantine, so at least one king is non-malicious. After that king's phase, all non-faulty processes agree on a common value; the agreement persists in subsequent phases.
**Why Phase King needs $n > 4f$.** The decision rule requires multiplicity $> n/2 + f$ to override the king's value. Once all $n - f$ honest processes hold the same value, each sees multiplicity at least $n - f$; for that to clear the threshold regardless of what the Byzantine processes send, we need $n - f > n/2 + f$, i.e., $n > 4f$.
OM(m) vs Phase King. Bounds: OM $n > 3f$, PK $n > 4f$ (PK needs more processes). Rounds: OM $f+1$, PK $2(f+1)$. Messages: OM exponential, PK polynomial. Simplicity: OM is complex recursion; PK is simple two-round phases. PK is the practical choice for large $f$.
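The two-round phase can be sketched as a simulation over binary values. Assumptions in this sketch: process ids are 0-based so the king of phase `k` is process `k`, the keep-majority threshold is multiplicity $> n/2 + f$, the king broadcasts the majority it computed in round 1 (one common formulation), and Byzantine behaviour is supplied through a hypothetical callback `byzantine[pid](phase, rnd, dst)`.

```python
from collections import Counter

DEFAULT = 0  # assumed tie-breaker when no value has a strict majority


def phase_king(initial, f, byzantine=None):
    """Return {pid: final value} for the non-faulty processes."""
    byzantine = byzantine or {}
    n = len(initial)
    if n <= 4 * f:
        raise ValueError("Phase King requires n > 4f")
    v = list(initial)
    honest = [i for i in range(n) if i not in byzantine]
    for phase in range(f + 1):
        king = phase
        maj, mult = {}, {}
        # Round 1: all-to-all broadcast, then majority and multiplicity.
        for i in honest:
            got = [byzantine[j](phase, 1, i) if j in byzantine else v[j]
                   for j in range(n)]
            value, count = Counter(got).most_common(1)[0]
            if count <= n / 2:            # no strict majority: use default
                value, count = DEFAULT, got.count(DEFAULT)
            maj[i], mult[i] = value, count
        # Round 2: the king broadcasts; weak majorities defer to the king.
        for i in honest:
            king_value = (byzantine[king](phase, 2, i) if king in byzantine
                          else maj[king])
            v[i] = maj[i] if mult[i] > n / 2 + f else king_value
    return {i: v[i] for i in honest}


# n = 5, f = 1; process 0 (king of phase 0) is Byzantine and sends dst % 2.
liar = {0: lambda phase, rnd, dst: dst % 2}
print(phase_king([0, 1, 0, 1, 1], f=1, byzantine=liar))  # honest processes all end on 1
```

In this run the Byzantine king splits the honest processes in phase 0, and the honest king of phase 1 pulls them back to a single value.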
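To make the trade-off concrete, the sketch below counts messages at the minimum cluster size each algorithm needs for a given $f$. The counting conventions are assumptions: OM(m) is counted by its recursion (the general sends $n-1$ messages, then each lieutenant runs OM($m-1$) among $n-1$ processes), and a Phase King phase is counted as an all-to-all broadcast plus one broadcast by the king.

```python
def om_messages(n, m):
    """Messages sent by OM(m) among n processes (1 general + n-1 lieutenants)."""
    if m == 0:
        return n - 1
    return (n - 1) + (n - 1) * om_messages(n - 1, m - 1)


def phase_king_messages(n, f):
    """f+1 phases, each n(n-1) all-to-all messages plus n-1 from the king."""
    return (f + 1) * (n * (n - 1) + (n - 1))


def compare(f):
    n_om, n_pk = 3 * f + 1, 4 * f + 1      # smallest n satisfying each bound
    return {
        "OM": {"n": n_om, "rounds": f + 1,
               "messages": om_messages(n_om, f)},
        "PK": {"n": n_pk, "rounds": 2 * (f + 1),
               "messages": phase_king_messages(n_pk, f)},
    }


for f in (1, 2, 3):
    print(f, compare(f))
# f=1: OM 9 msgs (n=4),    PK 48 msgs (n=5)
# f=2: OM 156 msgs (n=7),  PK 240 msgs (n=9)
# f=3: OM 3609 msgs (n=10), PK 672 msgs (n=13)
```

Under these counting assumptions OM is cheaper for very small $f$, and Phase King overtakes it once $f$ reaches 3.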
Definitions
- Crash failure: Process halts (crashes) and stops sending msgs; never lies. Once crashed, stays crashed.
- Byzantine failure: Process behaves arbitrarily; it may lie, send conflicting msgs to different peers, stay silent, replay old msgs, etc.
- Consensus: Each process has own initial value; all non-faulty agree on a single value; if all start with $v$, decide $v$.
- Interactive consistency: Agree on a vector $(v_1, \dots, v_n)$ such that if $P_i$ is non-faulty with value $v_i$, the $i$-th slot is $v_i$.
- Byzantine agreement (BA): Designated source broadcasts; all non-faulty agree on the source's value (or on a default if source faulty). Solves consensus + interactive consistency.
- FLP impossibility (Fischer-Lynch-Paterson, 1985): No deterministic algorithm solves consensus in an asynchronous system, even with just ONE crash failure. Motivates randomised / partially-synchronous algorithms.
- Lamport-Shostak-Pease OM(m): Recursive oral-messages algorithm for Byzantine agreement. OM(0): direct. OM(t): each lieutenant becomes general for OM(t-1); majority at end. $f+1$ rounds; $n > 3f$; $O(n^{f+1})$ messages.
- Phase King: Polynomial Byzantine consensus algorithm. $n > 4f$; $f+1$ phases × 2 rounds; adopt majority if multiplicity $> n/2 + f$, else trust king. At least one honest king guarantees convergence.
Formulas
Derivations
Crash consensus correctness. Each round, every process sends its value to all and takes min. If process $P_i$ decides a value smaller than $P_j$'s, then somewhere $P_i$ saw a value $v$ that $P_j$ didn't. Tracing: $P_j$ hasn't seen $v$ means every chain delivering $v$ to $P_j$ crashed before completing. Over $f+1$ rounds, a chain of length $f+1$ requires $f+1$ crashes, which contradicts the bound of $f$.
**Why $n > 3f$ for BA.** Indistinguishability argument (Lamport-Shostak-Pease 1982): with $n \le 3f$, a loyal node can see exactly the same view in two scenarios, one with a faulty source and one with a faulty peer, yet must decide differently. Only when more than two-thirds of the nodes are honest ($n - f > 2f$) can the loyal nodes break ties via majority.
**$f+1$ rounds lower bound.** Inductive argument: with only $f$ rounds, an adversary can prevent agreement by hiding information along a chain of $f$ processes, one failing per round. One more round is needed to force the chain to terminate at a non-faulty process.
Phase King: at least one honest king ensures agreement. Run $f+1$ phases. At most $f$ kings are Byzantine. So $\ge 1$ honest king. In that king's phase, a process keeps its own majority only if its multiplicity exceeds $n/2 + f$; then more than $n/2$ honest processes broadcast that value, so the honest king computed the same majority and broadcasts the same value. Every other process adopts the king's value, so all honest processes end the phase with one value. From that phase on, agreement is invariant.
Examples
- **Crash consensus for $n=4$, $f=1$.** Inputs: $P_1=5$, $P_2=3$, $P_3=7$, $P_4=2$. Round 1: each sends to all; suppose $P_4$ crashes after sending to only some processes. After round 1, each survivor holds the min of what it received: 2 if $P_4$'s message reached it, otherwise 3. Round 2: survivors re-broadcast their min; if any survivor received the 2, all converge to 2 and decide 2 (if none did, all decide 3). **Two rounds = $f+1$ for $f=1$.**
- **OM(1) for $n=4$, $f=1$: G faulty case.** $G$ sends $0$ to $L_1, L_2$ and $1$ to $L_3$. Round 2 (each acts as general for the others): $L_1$ relays 'G said 0' to $L_2, L_3$. $L_2$ relays 'G said 0'. $L_3$ relays 'G said 1'. $L_1$ sees $\{0, 0, 1\}$ → majority 0. $L_2$ sees $\{0, 0, 1\}$ → 0. $L_3$ sees $\{1, 0, 0\}$ → 0. All loyal agree on 0. **$f+1 = 2$ rounds; $n = 4$ minimum.**
- **OM(1) for $n=4$: L3 faulty case.** $G$ loyal, sends 0 to all. $L_3$ faulty, relays 1 to others. $L_1$ sees $\{0, 0, 1\}$ → majority 0. $L_2$ sees $\{0, 0, 1\}$ → 0. Both loyal lieutenants agree on G's value 0. ✓
- **Phase King with $f=1$.** $n > 4f = 4$, say $n = 5$. Phases: $f+1 = 2$. Round 1 of phase 1: everyone broadcasts; tally majority + multiplicity. Round 2: $P_1$ (king of phase 1) broadcasts. Each process keeps its majority if multiplicity $> n/2 + f = 3.5$ (i.e. at least 4); if not, it adopts $P_1$'s value. Phase 2 repeats both rounds with $P_2$ as king. After phase 2 (at least one king honest), all agree.
Diagrams
- Crash consensus timeline: $n=4$, $f=1$, two rounds; each round sends to all + takes min.
- Indistinguishability scenario: $n=3$, $f=1$ with faulty source vs faulty peer; loyal nodes see identical views but must decide differently → impossibility.
- OM(2) recursion tree for $n=7$: General at root; each lieutenant becomes general for OM(1); final majority at leaves.
- Phase King flow: phase 1 round 1 (broadcast) + round 2 (king); phase 2 same; at least one honest king ⇒ agreement.
- OM vs Phase King comparison table: bounds, rounds, messages, simplicity.
Edge cases
- FLP impossibility in async — no deterministic Byzantine solution; production systems use randomised algorithms (Ben-Or, Honey Badger BFT) or partial synchrony assumptions.
- Synchronous assumption matters — drop synchrony and bounds change drastically.
- OM exponential blowup: $O(n^{f+1})$ messages; impractical for large $f$ in large clusters.
- **Phase King's $n > 4f$ vs OM's $n > 3f$**: PK needs MORE processes to tolerate the same $f$.
- Authenticated Byzantine (signed messages) reduces the bound to $n > 2f$ for consensus, but assumes cryptographic signing infrastructure.
Common mistakes
- **Saying 'Byzantine needs $n > 2f$'.** That's for authenticated (signed-message) Byzantine. Unauthenticated needs $n > 3f$.
- Saying 'OM is polynomial'. No — OM has exponential messages. Phase King is the polynomial alternative.
- Confusing 'consensus' with 'Byzantine agreement'. Consensus: each has own input. BA: source broadcasts. Interactive Consistency: agree on a vector.
- Saying 'FLP shows distributed consensus is always impossible'. No — FLP shows DETERMINISTIC consensus in async systems is impossible. Randomised algorithms work; synchronous algorithms work.
- **Saying 'Phase King needs $n > 3f$'.** No, Phase King needs $n > 4f$. Polynomial cost vs OM, but more processes.
Shortcuts
- **Crash: $f+1$ rounds, $O((f+1) \cdot n^2)$ msgs.**
- **Byzantine: $n > 3f$, $f+1$ rounds, FLP in async.**
- **OM: recursive, exponential msgs, $n > 3f$, $f+1$ rounds.**
- **Phase King: $n > 4f$, $2(f+1)$ rounds, polynomial msgs.**
- N=3, f=1 impossible. Faulty source flips T and U into indistinguishable views.
- Three variants: BA solves all.
Proofs / Algorithms
**$n=3$, $f=1$ Byzantine impossibility.** Source $S$ + lieutenants $T, U$. Scenario A: loyal $S$ sends 0 to both; faulty $U$ tells $T$ that $S$ said 1. $T$ sees (S:0, U:1) and must decide 0 (agree with S). Scenario A': loyal $S$ sends 1 to both; faulty $T$ tells $U$ that $S$ said 0. $U$ sees (S:1, T:0) and must decide 1. Scenario B: $S$ faulty, sends 0 to $T$, 1 to $U$. $T$ sees (S:0, U:1), INDISTINGUISHABLE from A, so it must decide 0. $U$ sees (S:1, T:0), indistinguishable from A', so it must decide 1. $T$ and $U$ disagree though both loyal. Contradicts agreement. Hence impossible.
**$f+1$ rounds lower bound (crash and Byzantine).** Adversary strategy: in each round, crash one node mid-broadcast so its value reaches a specific subset only. After $f$ rounds, $f$ nodes have crashed and some loyal node may still have incomplete information. One more round, in which no further node can fail, is needed to disseminate everywhere. Hence $f+1$ rounds.
Phase King agreement (at least one honest king). $f+1$ phases; at most $f$ kings can be Byzantine. So $\ge 1$ honest king. In that king's phase, the king broadcasts a value $v$. A process keeps a majority value $w$ instead of $v$ only if its multiplicity exceeds $n/2 + f$, which means more than $n/2$ honest processes broadcast $w$; the honest king then saw the same strict majority, so $w = v$. All other processes adopt $v$. From that phase on, all loyal processes agree, and with $n > 4f$ their common value clears the multiplicity threshold in every later phase.