
Crash Consensus + Byzantine Agreement (OM(m), Phase King) + FLP


Unit 7 — Consensus & Byzantine Agreement

The Hard Problem

Consensus is the simplest-looking distributed problem and one of the hardest to actually solve: a group of nodes needs to agree on a single value even though some of them may be faulty or malicious. The difficulty depends on which kinds of faults you must tolerate and on the timing model (synchronous or asynchronous).

Three Variants

Consensus: every process has its own input value. All non-faulty processes must agree on the same single value. If all non-faulty processes start with $v$, they must decide $v$.

Interactive Consistency: all non-faulty processes agree on a *vector* $(v_{1}, \dots, v_{n})$. If $P_{k}$ is non-faulty with value $v_{k}$, the vector's $k$-th slot must equal $v_{k}$.

Byzantine Agreement (BA): a designated source broadcasts a value. All non-faulty processes must agree on the same value, and if the source is non-faulty, that value must be the source's actual value.

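A BA solution can be used to solve the other two: running $n$ BA instances, one with each $P_{k}$ as source, gives IC, and applying the same deterministic function (for example, majority) to the agreed IC vector gives consensus. So most theoretical work focuses on BA.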
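The two reductions fit in a few lines. This is a minimal illustration, not a real BA protocol: `ba(k)` is a hypothetical oracle standing in for any correct BA algorithm run with $P_{k}$ as source, and `consensus_from_ic` uses "most frequent value, smallest on ties" as its deterministic function, which is an assumption of this sketch.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical interface: ba(k) returns the value all non-faulty processes
# agree on in a BA instance whose source is P_k. Any correct BA algorithm
# (e.g. OM(m) or Phase King) could sit behind it.
BA = Callable[[int], int]

def interactive_consistency(n: int, ba: BA) -> List[int]:
    """IC from BA: run one BA instance per process; slot k is instance k's outcome."""
    return [ba(k) for k in range(n)]

def consensus_from_ic(vector: List[int]) -> int:
    """Consensus from IC: every process applies the same deterministic function
    to the agreed vector. Here: most frequent value, ties broken to the smallest."""
    counts = Counter(vector)
    top = max(counts.values())
    return min(value for value, c in counts.items() if c == top)

# Example: 4 processes, P_2 is faulty; the non-faulty ones all start with 1.
inputs = [1, 1, 0, 1]
faulty = {2}

def ideal_ba(k: int) -> int:
    # Loyal source: BA validity forces its real input.
    # Faulty source: BA only guarantees *some* common value (7 is arbitrary).
    return 7 if k in faulty else inputs[k]

vector = interactive_consistency(len(inputs), ideal_ba)
print(vector)                     # [1, 1, 7, 1]
print(consensus_from_ic(vector))  # 1
```

Because $n \geq 3f + 1$ in the Byzantine setting, the non-faulty processes form a majority of the vector, so if they all start with $v$ the majority rule returns $v$.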

Two Failure Models

Crash failure: a process may halt and stop sending messages. Never lies. Once crashed, silent forever.

Byzantine failure: a process behaves arbitrarily — sends wrong values, conflicting values to different peers, stays silent, lies about what it received.

The crash case is *much* easier.

Crash-Failure Consensus

For synchronous systems with $n$ processes and at most $f$ crashes ($f < n$):

```
v := my_input
for round = 1 to f+1:
    if v has not been broadcast before:
        send v to all
    receive the values (if any) sent to me this round
    v := min(v, all values received this round)   # smallest value seen so far
decide v
```

Termination: $f + 1$ rounds, fixed.

Validity: only input values ever circulate, so every decision is some process's input; if all processes start with the same $v$, they decide $v$.

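Agreement: suppose non-faulty $P_{j}$ ends with a value $y$ smaller than non-faulty $P_{i}$'s. $P_{j}$ cannot have learned $y$ before round $f + 1$, or it would have broadcast $y$ in the next round and $P_{i}$ would have received it. So $y$ reached $P_{j}$ in round $f + 1$ from a process that crashed during that round; that process learned $y$ in round $f$ from a process that crashed in round $f$, and so on back to round 1. Hiding $y$ from $P_{i}$ therefore needs a chain of $f + 1$ crashed processes, but only $f$ can fail. Contradiction.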

Messages: at most $(f + 1) \cdot n^{2}$ (in each round, every non-crashed process sends to all). Polynomial.
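A small round-by-round simulation makes these properties concrete. It is a sketch under stated assumptions: the crash-schedule format (`crashes` maps a process to the round it crashes in and the subset that still receives its last message), the optional `rounds` override, and the rule that a process re-broadcasts only when its value has changed are all choices made for this illustration. The example builds the chain of crashes from the agreement argument.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical crash schedule: pid -> (round it crashes in, processes that
# still receive its message in that round). A crashed process is silent afterwards.
Crashes = Dict[int, Tuple[int, Set[int]]]

def crash_consensus(inputs: List[int], f: int, crashes: Crashes,
                    rounds: Optional[int] = None) -> Tuple[Dict[int, int], int]:
    """Simulate synchronous min-based crash consensus.

    Runs f+1 rounds unless `rounds` overrides it. Returns the decisions of the
    processes that never crash and the number of point-to-point messages sent.
    """
    n = len(inputs)
    rounds = f + 1 if rounds is None else rounds
    v = list(inputs)
    last_sent: List[Optional[int]] = [None] * n  # value each process last broadcast
    alive = set(range(n))
    messages = 0
    for rnd in range(1, rounds + 1):
        inbox: List[List[int]] = [[] for _ in range(n)]
        crashing = {p for p in alive if p in crashes and crashes[p][0] == rnd}
        for p in sorted(alive):
            if v[p] == last_sent[p]:
                continue  # value unchanged since last broadcast: stay quiet
            everyone_else = set(range(n)) - {p}
            # A process crashing this round reaches only part of its targets.
            targets = crashes[p][1] - {p} if p in crashing else everyone_else
            for q in targets:
                inbox[q].append(v[p])
            messages += len(targets)
            last_sent[p] = v[p]
        alive -= crashing
        for p in alive:
            v[p] = min([v[p]] + inbox[p])  # keep the smallest value seen so far
    return {p: v[p] for p in sorted(alive)}, messages

# Chain of f = 2 crashes hiding the minimum: P0 (input 0) crashes in round 1
# after reaching only P1; P1 crashes in round 2 after reaching only P2.
chain = {0: (1, {1}), 1: (2, {2})}
print(crash_consensus([0, 5, 5, 5], f=2, crashes=chain))
# ({2: 0, 3: 0}, 14): P2 learns 0 in round 2 and relays it to P3 in round 3
print(crash_consensus([0, 5, 5, 5], f=2, crashes=chain, rounds=2))
# ({2: 0, 3: 5}, 11): with only f rounds, P3 never hears 0
```

With $f + 1 = 3$ rounds the surviving processes agree; cutting the run to $f = 2$ rounds lets the chain hide the minimum from $P_{3}$, which is why the extra round is needed. Messages sent to processes that have already crashed are still counted here.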

Byzantine — The Hard Case

Now nodes can lie. Three impossibility results frame everything:

1. **$N \geq 3f + 1$ is necessary** (more than 2/3 of the processes must be honest).
2. **At least $f + 1$ rounds are required.**
3. **No deterministic solution exists in asynchronous systems**, even with ONE crash failure (Fischer-Lynch-Paterson 1985, "FLP").

Why $N = 3, f = 1$ Is Impossible

Three players: source $S$ and lieutenants $T, U$. At most one of them is faulty.

Scenario A: $S$ is loyal and sends 0 to both. $U$ is faulty and tells $T$ that $S$ said 1. $T$'s view: "S said 0, U said S said 1". Since $S$ and $T$ are both loyal in this scenario, $T$ must decide 0.

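Scenario B: $S$ is faulty and sends 0 to $T$ and 1 to $U$; both lieutenants relay honestly. $T$'s view: "S said 0, U said S said 1", IDENTICAL to its view in Scenario A, so $T$ must decide 0 (same algorithm, same view). $U$'s view in B is "S said 1, T said S said 0", identical to its view in the mirror scenario where $S$ is loyal and sends 1 while $T$ is faulty and claims $S$ said 0. There $U$ must decide 1, so it decides 1 in B as well.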

Both $T$ and $U$ are loyal in Scenario B, yet they disagree, which violates agreement. Hence $N = 3, f = 1$ is impossible.

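Generalising: if some algorithm tolerated $f$ faults with $N \leq 3f$ processes, three players could each simulate a group of at most $f$ of those processes and so solve the $N = 3, f = 1$ case, which is impossible. Hence $N \geq 3f + 1$ is necessary.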
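The indistinguishability argument can be checked mechanically for the simple exchange above, where each lieutenant decides from two values: what $S$ sent it and what the other lieutenant relayed. The sketch below is a hypothetical model restricted to that one-relay exchange. It builds each lieutenant's view in Scenarios A and B and in the mirror scenario (called C here), then tries every pair of deterministic decision rules for $T$ and $U$.

```python
from itertools import product
from typing import Dict, Tuple

View = Tuple[int, int]  # (value S sent me, value the other lieutenant relayed)
Rule = Dict[View, int]  # deterministic decision rule: view -> decision

def views(s_sends: Dict[str, int], lies: Dict[str, int]) -> Dict[str, View]:
    """Each lieutenant's view; a faulty lieutenant relays lies[x] instead of the truth."""
    relayed = {x: lies.get(x, s_sends[x]) for x in ("T", "U")}
    return {"T": (s_sends["T"], relayed["U"]), "U": (s_sends["U"], relayed["T"])}

A = views({"T": 0, "U": 0}, {"U": 1})  # S loyal (sends 0), U faulty
B = views({"T": 0, "U": 1}, {})        # S faulty, T and U loyal
C = views({"T": 1, "U": 1}, {"T": 0})  # mirror of A: S loyal (sends 1), T faulty

def satisfies_ba(rule_t: Rule, rule_u: Rule) -> bool:
    return (rule_t[A["T"]] == 0                    # validity in A
            and rule_u[C["U"]] == 1                # validity in C
            and rule_t[B["T"]] == rule_u[B["U"]])  # agreement in B

all_views = list(product([0, 1], repeat=2))
rules = [dict(zip(all_views, outs)) for outs in product([0, 1], repeat=len(all_views))]
working = [(rt, ru) for rt in rules for ru in rules if satisfies_ba(rt, ru)]
print(A["T"] == B["T"], C["U"] == B["U"], len(working))  # True True 0
```

None of the 16 × 16 rule pairs works: $T$ is forced to 0 by Scenario A and $U$ to 1 by Scenario C, so they cannot agree in Scenario B.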

Lamport-Shostak-Pease OM(m) Algorithm

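The classical Byzantine agreement algorithm for oral (unsigned) messages. It is recursive: OM(m) tolerates up to $m$ traitors and calls OM(m-1). Below, $t$ is the parameter of the current recursive call, starting at $t = m$.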

Base case OM(0): the general sends its value $x_{g}$ to every lieutenant. Each lieutenant decides on the value it received (or a default such as 'undef' if no message arrives).

**Recursive case OM(t), $t > 0$:** the general sends its value to every lieutenant. Each lieutenant $L_{i}$, on receiving $v_{i}$ from the general, acts as the new 'general' and runs OM(t-1) to send $v_{i}$ to the other $N - 2$ lieutenants ($N$ counts the generals in the current call). After $t + 1$ rounds, each $L_{i}$ decides on the majority of $v_{i}$ together with the values $v_{j}$ it obtained from every other lieutenant $L_{j}$'s OM(t-1) sub-broadcast.

**Tolerates $m$ faulty processes if $N \geq 3m + 1$. Needs $m + 1$ rounds.**

Cost: $O(N^{m+1})$ messages, exponential in $m$. This is OM's main drawback.

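Assumptions (oral messages): every message sent is delivered correctly, with no alteration in transit; the receiver knows who sent each message; the absence of a message can be detected (synchronous system). Messages are unsigned, so a faulty sender can send arbitrary content, including lying about what it received.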
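The recursion can be simulated directly. This is a minimal sketch that assumes binary values, a default of 0 when no strict majority exists, and a hypothetical traitor model `lie(sender, receiver, value)` that decides what a traitor actually sends. Real traitors may behave more adaptively, and OM(m)'s guarantees cover that too. The first call reproduces the faulty-source case for $N = 4$, $m = 1$.

```python
from collections import Counter
from typing import Callable, Dict, List, Set

# Hypothetical traitor model: lie(sender, receiver, honest_value) -> value sent.
Lie = Callable[[int, int, int], int]
DEFAULT = 0  # used when there is no strict majority

def majority(values: List[int]) -> int:
    """Return the strict-majority value, or DEFAULT if none exists."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) // 2 else DEFAULT

def om(m: int, commander: int, lieutenants: List[int], value: int,
       traitors: Set[int], lie: Lie) -> Dict[int, int]:
    """Simulate OM(m); return the value each lieutenant ends up using."""
    # Step 1: the commander sends its value; a traitor may send anything.
    received = {p: lie(commander, p, value) if commander in traitors else value
                for p in lieutenants}
    if m == 0:
        return received  # OM(0): use the value received directly
    # Step 2: every lieutenant j re-broadcasts received[j] via OM(m-1).
    relayed = {j: om(m - 1, j, [p for p in lieutenants if p != j],
                     received[j], traitors, lie)
               for j in lieutenants}
    # Step 3: lieutenant i takes the majority of its own value and what it
    # obtained from each other lieutenant's OM(m-1) sub-broadcast.
    return {i: majority([received[i]] + [relayed[j][i] for j in lieutenants if j != i])
            for i in lieutenants}

def split(sender: int, receiver: int, value: int) -> int:
    """Faulty general of the worked example: 1 to L3, 0 to everyone else."""
    return 1 if receiver == 3 else 0

def flip(sender: int, receiver: int, value: int) -> int:
    """A traitor that always relays the opposite bit."""
    return 1 - value

# N = 4, m = 1, faulty general 0.
print(om(1, 0, [1, 2, 3], 0, {0}, split))  # {1: 0, 2: 0, 3: 0}
# N = 4, m = 1, loyal general sends 1, lieutenant 3 is a traitor.
result = om(1, 0, [1, 2, 3], 1, {3}, flip)
print(result[1], result[2])  # 1 1
```

The recursion re-runs OM(m-1) once per lieutenant at every level, which is where the $O(N^{m+1})$ message cost comes from.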

OM(1) For N = 4, m = 1 — Faulty Source Case

$G$ faulty, sends $0$ to $L_{1}, L_{2}$ and $1$ to $L_{3}$ .

Round 2: each $L_{i}$ relays what it received from $G$ .

$L_{1}$ sees $\{G: 0, L_{2}: 0, L_{3}: 1\}$ → majority 0.
$L_{2}$ sees $\{G: 0, L_{1}: 0, L_{3}: 1\}$ → majority 0.
$L_{3}$ sees $\{G: 1, L_{1}: 0, L_{2}: 0\}$ → majority 0.

All loyal lieutenants agree on 0. ✓ ($m + 1 = 2$ rounds; $N = 3m + 1 = 4$ is the minimum.)

Phase King — Polynomial Alternative

OM's exponential message count is impractical for large $f$. Phase King trades MORE PROCESSES for FEWER MESSAGES.

Bounds: $N \geq 4f + 1$ processes (more than OM's $3f + 1$), $f + 1$ phases × 2 rounds each (so $2(f + 1)$ rounds in total), and $O(N^{2} \cdot f)$ messages (polynomial).

Each phase:

Round 1: every process broadcasts its current value to all. Each process computes the majority value among the $N$ values it receives (its own included) and that value's multiplicity (how many times it occurs).
Round 2: the phase king ($P_{k}$ for phase $k$) broadcasts its round-1 majority value, the tiebreaker, to everyone. Each $P_{i}$ updates: if its multiplicity $> N/2 + f$ → keep its own majority value; else → adopt the king's tiebreaker.

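Correctness intuition: there are $f + 1$ phases and at most $f$ Byzantine processes, so at least one phase has an honest king. In that phase every loyal process ends with the same value: a loyal process keeps its own majority only when its multiplicity exceeds $N/2 + f$, which means more than $N/2$ loyal processes sent that value, so the honest king's majority (the tiebreaker) is that same value. Once all loyal processes hold the same value, each of them sees at least $N - f > N/2 + f$ copies of it in every later phase and keeps it whatever the king sends, so the agreement persists.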

**Why $N \geq 4f + 1$?** Persistence requires a loyal process to keep an agreed value no matter what a Byzantine king sends. It sees at least $N - f$ copies of that value from loyal processes, and these must clear the threshold: $N - f > N/2 + f$, i.e. $N > 4f$.
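Here is a simulation sketch of one run. It assumes binary values, kings $P_{0}, \dots, P_{f}$ (0-indexed), a default of 0 on a round-1 tie, and a hypothetical adversary `byz(phase, rnd, sender, receiver)` that picks what each Byzantine process sends to each receiver. In this sketch the king's tiebreaker is the majority value it computed in round 1, and a loyal process keeps its own majority only if its multiplicity exceeds $N/2 + f$.

```python
from collections import Counter
from typing import Callable, Dict, List, Set

# Hypothetical adversary: byz(phase, rnd, sender, receiver) -> bit actually sent.
Adversary = Callable[[int, int, int, int], int]

def phase_king(inputs: List[int], f: int, byzantine: Set[int],
               byz: Adversary) -> Dict[int, int]:
    """Binary Phase King; returns the decisions of the non-Byzantine processes."""
    n = len(inputs)
    v = list(inputs)
    honest = [p for p in range(n) if p not in byzantine]
    for phase in range(f + 1):
        king = phase  # kings are P_0, ..., P_f (0-indexed)
        maj: Dict[int, int] = {}
        mult: Dict[int, int] = {}
        # Round 1: all processes broadcast v; each honest process tallies the
        # n values it receives (its own included).
        for i in honest:
            got = [byz(phase, 1, j, i) if j in byzantine else v[j] for j in range(n)]
            counts = Counter(got)
            maj[i] = 1 if counts[1] > counts[0] else 0  # tie -> default 0
            mult[i] = counts[maj[i]]
        # Round 2: the king sends its round-1 majority as the tiebreaker.
        for i in honest:
            tiebreaker = byz(phase, 2, king, i) if king in byzantine else maj[king]
            v[i] = maj[i] if mult[i] > n / 2 + f else tiebreaker
    return {i: v[i] for i in honest}

def parity(phase: int, rnd: int, sender: int, receiver: int) -> int:
    """Byzantine processes tell odd-numbered receivers 1 and even ones 0."""
    return receiver % 2

# N = 5, f = 1: P0 is Byzantine and is also the first king.
print(phase_king([0, 1, 0, 1, 1], 1, {0}, parity))  # {1: 1, 2: 1, 3: 1, 4: 1}
```

In the example the Byzantine first king leaves the loyal processes split (1, 0, 1, 0); the second king, $P_{1}$, is honest, and after its phase every loyal process holds 1.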

OM(m) vs Phase King — Choose Wisely

| Property | OM(m) | Phase King |
|---|---|---|
| Process bound | $N \geq 3f + 1$ | $N \geq 4f + 1$ |
| Rounds | $f + 1$ | $2(f + 1)$ |
| Messages | $O(N^{f+1})$, exponential | $O(N^{2} f)$, polynomial |
| Simplicity | Complex recursion | Simple two-round phases |

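For small $f$ (1 or 2), OM's message count is manageable. For large $f$, Phase King's polynomial message count wins, at the cost of more processes and twice as many rounds.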
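To put numbers on the comparison, the sketch below counts point-to-point messages at each algorithm's minimum $N$. The counting conventions are assumptions of this sketch. OM(m) is counted by its recursion: the general sends $N - 1$ messages, then each of the $N - 1$ lieutenants runs OM(m-1) among $N - 1$ generals. Phase King is counted as $N(N - 1)$ messages in round 1 plus $N - 1$ from the king in round 2, per phase.

```python
def om_messages(n: int, m: int) -> int:
    """Messages sent by OM(m) with n generals (commander included)."""
    if m == 0:
        return n - 1  # OM(0): commander sends once to each lieutenant
    # Commander's n-1 messages, then n-1 recursive OM(m-1) runs on n-1 generals.
    return (n - 1) + (n - 1) * om_messages(n - 1, m - 1)

def phase_king_messages(n: int, f: int) -> int:
    """Assumed count: per phase, n(n-1) in round 1 plus n-1 from the king."""
    return (f + 1) * (n * (n - 1) + (n - 1))

# Compare each algorithm at its minimum process count for f faults.
for f in range(1, 5):
    print(f, om_messages(3 * f + 1, f), phase_king_messages(4 * f + 1, f))
# 1 9 48
# 2 156 240
# 3 3609 672
# 4 108384 1440
```

Under these conventions OM sends fewer messages for $f = 1, 2$, and Phase King sends fewer from $f = 3$ onwards, consistent with the rule of thumb above. The count for $N = 4$, $m = 1$ (9 messages: 3 from the general, 2 relays from each lieutenant) matches the worked example.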

What FLP Says (And Doesn't)

FLP (1985): no deterministic algorithm solves consensus in an asynchronous system, even with just ONE crash failure.

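What FLP DOESN'T say: that consensus is impossible in practice. It rules out only deterministic algorithms that are guaranteed to terminate in a fully asynchronous system. Production systems work around FLP by: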

Assuming partial synchrony (messages eventually arrive within bounded time) → Paxos, Raft. These stay safe under asynchrony and make progress once the system behaves synchronously.
Using randomisation (e.g., Ben-Or, Honey Badger BFT).
Accepting eventual consistency instead of strong agreement.

Bitcoin sidesteps FLP via probabilistic finality (proof-of-work).

What You Walk In Carrying

Three variants (Consensus, IC, BA); a BA solution can be used to solve the other two.
Crash failure: $f + 1$ rounds, at most $(f + 1) n^{2}$ messages.
Byzantine bounds: $n \geq 3f + 1$, at least $f + 1$ rounds, FLP for asynchronous systems.
Why $N = 3, f = 1$ Byzantine agreement fails (indistinguishability).
OM(m): recursive algorithm, $m + 1$ rounds, exponential messages.
Phase King: $N \geq 4f + 1$, $2(f + 1)$ rounds, polynomial messages.
The OM vs Phase King table.
The FLP statement and what it doesn't preclude.

Distributed Systems