Crash Consensus + Byzantine Agreement (OM(m), Phase King) + FLP
The Hard Problem
Consensus is the simplest-looking distributed problem and the hardest to actually solve: a group of nodes need to agree on a single value, despite some of them being faulty or malicious. The difficulty depends entirely on what kinds of faults you tolerate.
Three Variants
Consensus: every process has its own input value. All non-faulty processes must agree on the same single value. If all non-faulty processes start with the same value $v$, they must decide $v$.
Interactive Consistency: all non-faulty processes agree on a *vector* $(v_1, \dots, v_N)$. If $P_i$ is non-faulty with value $v_i$, the vector's $i$-th slot must equal $v_i$.
Byzantine Agreement (BA): a designated source broadcasts. All non-faulty processes agree on the source's value (if source non-faulty, on the source's actual value).
BA solves all three by suitable invocation. So most theoretical work focuses on BA.
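One way to read "suitable invocation" is sketched below: run BA once per process (each process acting as source in turn) to obtain the interactive-consistency vector, then apply a deterministic function to that vector to get a single consensus value. The `byzantine_agreement` function is a hypothetical primitive introduced only for illustration, and the stub models a fault-free run.

```python
from collections import Counter

def byzantine_agreement(source: int, value: int) -> int:
    """Hypothetical BA primitive (an assumption, not a real library call).

    Returns the value all non-faulty processes agree on for this source.
    This stub models a fault-free run, where that is the source's own value.
    """
    return value

def interactive_consistency(inputs: list[int]) -> list[int]:
    # One BA instance per process, each with a different source.
    return [byzantine_agreement(i, v) for i, v in enumerate(inputs)]

def consensus(inputs: list[int]) -> int:
    # All non-faulty processes hold the same vector, so any deterministic
    # function of it gives the same decision everywhere. Majority (ties
    # broken toward the smaller value) also preserves validity.
    vector = interactive_consistency(inputs)
    counts = Counter(vector)
    top = max(counts.values())
    return min(v for v, c in counts.items() if c == top)
```

For example, `consensus([1, 1, 0, 1])` picks the majority slot value of the agreed vector. In a real run the stub would be replaced by an actual BA protocol such as OM(m) or Phase King.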
Two Failure Models
Crash failure: a process may halt and stop sending messages. Never lies. Once crashed, silent forever.
Byzantine failure: a process behaves arbitrarily — sends wrong values, conflicting values to different peers, stays silent, lies about what it received.
The crash case is *much* easier.
Crash-Failure Consensus
For synchronous systems with $N$ processes and at most $f$ crashes ($f < N$):
```
v := my_input
for round = 1 to f+1:
    send v to all (if this value of v has not been sent before)
    wait for this round's messages
    v := min({v} ∪ all_received_values)
decide v
```
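Termination: $f + 1$ rounds, fixed.
Validity: only inputs ever circulated; if all start with the same $v$, decide $v$.
Agreement: if $P_i$ decided a value $v$ smaller than $P_j$'s decision, the chain of crash-faulty nodes hiding $v$ from $P_j$ would need length $f + 1$ (one fresh crash in each of the $f + 1$ rounds), but only $f$ can fail. Contradiction.
Messages: $O((f + 1) \cdot N^2)$ (each round, every non-crashed process sends to all). Polynomial.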
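The round structure above can be exercised with a small simulation. This is a sketch under stated assumptions: the `crashes` schedule format (process id mapped to the round it crashes in and the recipients it still reaches in that round) is invented for illustration, and every live process resends its value each round rather than only when it changes.

```python
def crash_consensus(inputs, f, crashes=None):
    """Simulate f+1 synchronous rounds of min-flooding.

    inputs:  list of initial values, one per process.
    crashes: {pid: (round, recipients_reached)} (hypothetical format).
             The process sends only to `recipients_reached` in that
             round and is silent afterwards.
    Returns {pid: decision} for the processes that never crashed.
    """
    crashes = crashes or {}
    n = len(inputs)
    v = list(inputs)
    alive = set(range(n))
    for rnd in range(1, f + 2):
        inbox = {p: [] for p in range(n)}
        for p in sorted(alive):
            if p in crashes and crashes[p][0] == rnd:
                targets = crashes[p][1]      # partial send, then crash
            else:
                targets = range(n)
            for q in targets:
                inbox[q].append(v[p])
        alive -= {p for p, (r, _) in crashes.items() if r == rnd}
        for p in alive:
            v[p] = min([v[p]] + inbox[p])
    return {p: v[p] for p in alive}

# P0 holds the minimum, reaches only P1, then crashes in round 1.
schedule = {0: (1, {1})}
print(crash_consensus([0, 5, 5, 5], f=1, crashes=schedule))  # agreement
print(crash_consensus([0, 5, 5, 5], f=0, crashes=schedule))  # one round is too few
```

Running with one round fewer than $f + 1$ leaves P1 holding 0 while the others hold 5, which is the "hiding chain" from the agreement argument in miniature.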
Byzantine — The Hard Case
Now nodes can lie. Three impossibility results frame everything:
1. **$N \ge 3f + 1$ is necessary** (more than two-thirds of the processes honest).
2. **At least $f + 1$ rounds are required.**
3. **No deterministic solution exists in asynchronous systems**, even with ONE crash failure (Fischer-Lynch-Paterson 1985, the FLP result).
Why $N = 3, f = 1$ Is Impossible
Three players: source $S$ + lieutenants $T$ and $U$. At most one of them is faulty.
Scenario A: $S$ is loyal, sends 0 to both. $U$ is faulty and tells $T$ that $S$ said 1. $T$'s view: "S said 0, U said S said 1". Since $S$ is loyal and $T$ is loyal in this scenario, $T$ must decide 0.
Scenario B: $S$ is faulty, sends 0 to $T$ and 1 to $U$. $T$'s view: "S said 0, U said S said 1", IDENTICAL to Scenario A's view. So $T$ must decide 0 (same algorithm, same input). But $U$'s view in B is "S said 1, T said S said 0", which is identical to $U$'s view in the mirror image of Scenario A (loyal $S$ sends 1 to both, faulty $T$ claims $S$ said 0). There $U$ must decide 1, so $U$ must decide 1 in B as well.
Both $T$ and $U$ are loyal in Scenario B but they disagree. Contradicts agreement. Hence $N = 3, f = 1$ is impossible.
Generalising: $N \ge 3f + 1$ is necessary.
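The indistinguishability argument can be checked by brute force for the simplest protocol shape. The sketch below assumes a lieutenant's view is just the pair (value heard from S, what the peer claims S said), i.e. a single relay round; the full impossibility proof covers arbitrary protocols, which this check does not. It enumerates every deterministic decision rule for T and for U and confirms that none satisfies validity in both one-liar scenarios and agreement in the faulty-source scenario.

```python
from itertools import product

# A view is (value heard from S, value the peer claims S sent).
VIEWS = list(product((0, 1), repeat=2))

def all_decision_rules():
    # Every deterministic map from a view to a decision in {0, 1}.
    for outputs in product((0, 1), repeat=len(VIEWS)):
        yield dict(zip(VIEWS, outputs))

def surviving_rule_pairs():
    """Rule pairs (for T, for U) that meet all three requirements."""
    ok = []
    for d_t, d_u in product(list(all_decision_rules()), repeat=2):
        # Scenario A: S loyal with 0, U lies "S said 1"; T sees (0, 1).
        valid_a = d_t[(0, 1)] == 0
        # Mirror of A: S loyal with 1, T lies "S said 0"; U sees (1, 0).
        valid_mirror = d_u[(1, 0)] == 1
        # Scenario B: S faulty (0 to T, 1 to U), T and U relay honestly.
        agree_b = d_t[(0, 1)] == d_u[(1, 0)]
        if valid_a and valid_mirror and agree_b:
            ok.append((d_t, d_u))
    return ok

print(len(surviving_rule_pairs()))  # no pair of rules survives
```

The three constraints are mutually contradictory: the first two force T to output 0 and U to output 1 on exactly the views they hold in Scenario B.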
Lamport-Shostak-Pease OM(m) Algorithm
The classical Byzantine agreement algorithm. Recursive with resilience parameter $m$, the number of Byzantine faults tolerated (written $f$ elsewhere in these notes).
Base case OM(0): general sends value to all lieutenants. Each lieutenant decides on the value received (or 'undef' if no msg).
**Recursive case OM(t), $t > 0$**: the general sends its value to all lieutenants. Each lieutenant $L_i$, on receiving $v_i$ from the general, acts as the new 'general' running OM(t-1) to broadcast $v_i$ to the other lieutenants. After $t + 1$ rounds, every $L_i$ computes the majority of $v_i$ and the values obtained from each of these sub-broadcasts as its decision.
**Tolerates $m$ faulty if $N \ge 3m + 1$. Needs $m + 1$ rounds.**
Cost: $O(N^{m+1})$ messages, exponential in $m$. This is OM's main drawback.
Assumptions: every message delivered correctly; receiver knows sender; absence detectable (synchronous); content can be altered by faulty senders but not forged in transit.
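The recursion can be sketched as a direct simulation. Assumptions made for illustration: values are bits, a faulty sender always sends `receiver_id % 2` (one arbitrary Byzantine strategy among many), and `DEFAULT` is the value chosen when no strict majority exists. None of these details come from the original algorithm statement.

```python
from collections import Counter

DEFAULT = 0  # assumed fallback when there is no strict majority

def majority(values):
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else DEFAULT

def om(commander, lieutenants, value, m, faulty):
    """Run OM(m); return {lieutenant: value it settles on}.

    faulty: set of process ids that behave Byzantine.
    """
    def sent(to):
        # Illustrative Byzantine behaviour: conflicting bits per receiver.
        return to % 2 if commander in faulty else value

    received = {l: sent(l) for l in lieutenants}
    if m == 0:
        return received

    # Each lieutenant j relays what it received by acting as the
    # commander of OM(m-1) towards the remaining lieutenants.
    relayed = {}
    for j in lieutenants:
        others = [l for l in lieutenants if l != j]
        relayed[j] = om(j, others, received[j], m - 1, faulty)

    # Lieutenant i combines its own value with what each sub-broadcast
    # delivered to it.
    decisions = {}
    for i in lieutenants:
        vals = [received[i]] + [relayed[j][i] for j in lieutenants if j != i]
        decisions[i] = majority(vals)
    return decisions

# N = 4, m = 1, faulty commander: the loyal lieutenants still agree.
print(om(0, [1, 2, 3], value=1, m=1, faulty={0}))
```

Entries for faulty lieutenants in the returned dictionary are meaningless and should be ignored; only the loyal ones are bound by the agreement and validity guarantees.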
OM(1) For N = 4, m = 1 — Faulty Source Case
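$S$ faulty, sends 0 to $L_1, L_2$ and 1 to $L_3$.
Round 2: each $L_i$ relays what it received from $S$.
- $L_1$ sees $\{0, 0, 1\}$ (own value, from $L_2$, from $L_3$) → majority 0.
- $L_2$ sees $\{0, 0, 1\}$ → 0.
- $L_3$ sees $\{1, 0, 0\}$ → 0.
All loyal lieutenants agree on 0. ✓ ($m + 1 = 2$ rounds; minimum.)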
Phase King — Polynomial Alternative
OM's exponential messages are impractical for large $f$. Phase King trades MORE PROCESSES for FEWER MESSAGES.
Bounds: $N \ge 4f + 1$ processes (more than OM's $3f + 1$), $f + 1$ phases × 2 rounds each (so $2(f + 1)$ total rounds), polynomial messages.
Each phase:
- Round 1: every process broadcasts its current value to all. Compute the majority of values received + the multiplicity (count of majority value).
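- Round 2: the phase king ($P_k$ for phase $k$) broadcasts its round-1 majority value to everyone. Each process updates: if its own majority's multiplicity $> N/2 + f$ → keep majority; else → adopt the king's value.
Correctness intuition: $f + 1$ phases, at most $f$ kings Byzantine ⇒ at least one honest king. After that king's phase, all loyal processes agree, and the agreement persists.
**Why $N > 4f$?** The decision rule requires multiplicity $> N/2 + f$ to override the king. Once all loyal processes hold the same value, each sees at least $N - f$ copies of it, and for that to clear the threshold despite $f$ Byzantine votes we need $N - f > N/2 + f$, i.e. $N > 4f$.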
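The two-round phase structure can be sketched as a simulation over binary values. Assumptions made for illustration: process `k` is king of phase `k` (0-indexed), a Byzantine process sends `receiver % 2` to each receiver, and a tied round-1 count defaults to 0. The parameter name `byz_msg` is hypothetical.

```python
def phase_king(inputs, f, faulty, byz_msg=lambda sender, receiver, phase: receiver % 2):
    """Simulate Phase King on bits; return decisions of the honest processes."""
    n = len(inputs)
    assert n > 4 * f, "Phase King needs N > 4f"
    v = list(inputs)
    for phase in range(f + 1):
        king = phase
        # Round 1: everyone broadcasts; each process tallies what it hears.
        tallies = {}
        for p in range(n):
            heard = [byz_msg(q, p, phase) if q in faulty else v[q]
                     for q in range(n)]
            ones = sum(heard)
            maj = 1 if 2 * ones > n else 0
            mult = ones if maj == 1 else n - ones
            tallies[p] = (maj, mult)
        # Round 2: the king broadcasts its majority; weak majorities defer.
        for p in range(n):
            maj, mult = tallies[p]
            if king in faulty:
                king_value = byz_msg(king, p, phase)
            else:
                king_value = tallies[king][0]
            v[p] = maj if mult > n / 2 + f else king_value
    return {p: v[p] for p in range(n) if p not in faulty}

# N = 5, f = 1. The first king (process 0) is Byzantine, the second is honest.
print(phase_king([0, 1, 0, 1, 1], f=1, faulty={0}))
```

In this run the Byzantine first king leaves the honest processes split, and the honest king of the second phase pulls them all to the same value, matching the correctness intuition above.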
OM(m) vs Phase King — Choose Wisely
| | OM(m) | Phase King |
|---|---|---|
| Process bound | $N \ge 3f + 1$ | $N \ge 4f + 1$ |
| Rounds | $f + 1$ | $2(f + 1)$ |
| Messages | exponential | polynomial |
| Simplicity | Complex recursion | Simple two-round phases |

For small $f$ (1, 2), OM is fine. For large $f$, Phase King dominates.
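To put rough numbers on the trade-off, the sketch below counts point-to-point messages under simple assumptions: OM(m) sends $(N-1)(N-2)\cdots(N-k)$ messages in round $k$, and each Phase King phase costs an all-to-all broadcast plus one broadcast by the king, with self-sends not counted. The function names are invented for illustration.

```python
def om_messages(n, m):
    """Messages in OM(m): sum over rounds k = 1..m+1 of (n-1)(n-2)...(n-k)."""
    total, term = 0, 1
    for k in range(1, m + 2):
        term *= n - k
        total += term
    return total

def phase_king_messages(n, f):
    """Messages in Phase King: f+1 phases of (all-to-all + king broadcast)."""
    return (f + 1) * (n * (n - 1) + (n - 1))

# Each protocol at its own minimum process count for the given f.
for f in (1, 2, 4, 8):
    print(f, om_messages(3 * f + 1, f), phase_king_messages(4 * f + 1, f))
```

For $f = 1$ OM is cheaper even in absolute terms (9 messages with 4 processes against 48 with 5), while the OM count grows by roughly a factor of $N$ per extra fault and quickly overtakes Phase King.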
What FLP Says (And Doesn't)
FLP (1985): no deterministic algorithm solves consensus in an asynchronous system, even with just ONE crash failure.
What FLP DOESN'T say: that consensus is impossible. Production systems work around FLP by:
- Assuming partial synchrony (eventually messages arrive in bounded time) → Paxos, Raft.
- Using randomisation (e.g., Ben-Or, Honey Badger BFT).
- Accepting eventual consistency instead of strong agreement.
Bitcoin sidesteps FLP via probabilistic finality (proof-of-work).
What You Walk In Carrying
Three variants (Consensus, IC, BA) + BA solves all. Crash failure ($f + 1$ rounds, $O((f + 1) \cdot N^2)$ msgs). Byzantine bounds: $N \ge 3f + 1$, $f + 1$ rounds, FLP for async. Why $N = 3, f = 1$ Byzantine fails (indistinguishability). OM(m) recursive algorithm + $m + 1$ rounds + exponential messages. Phase King with $N \ge 4f + 1$ + $2(f + 1)$ rounds + polynomial. OM vs Phase King table. FLP statement and what it doesn't preclude.