ACID + 2PC + 3PC + Blocking & In-Doubt States

Intuition

A distributed transaction touches data at multiple sites. Atomicity, the A in ACID, requires that it either commits at ALL sites or aborts at ALL sites, never a mix of the two. 2PC is the textbook protocol: a coordinator gathers votes, then broadcasts the decision. Its weakness, and the entire reason 3PC exists, is the blocking problem: if the coordinator crashes after participants have voted READY but before they learn the decision, participants must hold all their locks and wait until the coordinator recovers.

Explanation

Distributed transaction. A transaction that invokes operations at several servers. Each server's data manager is a participant; the coordinator is the server where the client opened the transaction. Other servers join the transaction by message.

ACID properties. Atomicity — all-or-nothing. Consistency — never violates DB integrity constraints; moves from one consistent state to another. Isolation — concurrent transactions don't see each other's partial results. Durability — committed effects survive failures.

Failure modes unique to distributed transactions. Site failure (a participant crashes). Loss of messages (handled by TCP/IP). Communication link failure (handled by routing). Network partition — system splits into disconnected subsystems; indistinguishable from site failure to either side.

Why a commit protocol? To enforce atomicity across sites: a transaction must commit at ALL sites or abort at ALL sites. Not acceptable to have it committed at one site and aborted at another.

Roles in 2PC. Transaction Manager at each site — maintains log for recovery; coordinates local concurrent execution. Transaction Coordinator at originating site — starts execution, distributes sub-transactions, drives commit/abort decision.

2PC assumption. Fail-stop model — failed sites stop participating; never send incorrect messages; may recover later. Each site has a stable log (writes survive crashes).

2PC Phase 1 — Voting / Prepare. Coordinator $C_{i}$ writes $⟨ prepare T ⟩$ to its log and forces it to stable storage. $C_{i}$ then sends $PREPARE T$ to every participant. Each participant decides locally: if it can commit, it writes $⟨ ready T ⟩$ to its log, forces it to stable storage, and only then replies $READY T$. If it cannot, it writes $⟨ no T ⟩$ and replies $ABORT T$.

2PC Phase 2 — Decision / Commit. If $C_{i}$ received $READY T$ from ALL participants → write $⟨ commit T ⟩$ to stable log + send $COMMIT T$ to all. Else → write $⟨ abort T ⟩$ + send $ABORT T$. Each participant writes the decision locally + acts. **The $⟨ commit T ⟩$ at coordinator is the POINT OF NO RETURN** — once on stable storage, decision is irrevocable.
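The two phases above can be sketched as a small in-memory simulation. This is an illustrative sketch only, not a real implementation: the `Participant` class, the `two_phase_commit` function and the string-based log records are hypothetical names introduced here, a Python list stands in for the stable log, and failures and timeouts are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    name: str
    can_commit: bool
    log: list = field(default_factory=list)  # stands in for the stable log

    def on_prepare(self, t: str) -> str:
        # The vote record is logged BEFORE the reply is returned.
        if self.can_commit:
            self.log.append(f"<ready {t}>")
            return "READY"
        self.log.append(f"<no {t}>")
        return "ABORT"

    def on_decision(self, t: str, decision: str) -> None:
        # Record the coordinator's decision locally.
        self.log.append(f"<{decision} {t}>")


def two_phase_commit(t: str, participants: list, coord_log: list) -> str:
    # Phase 1: log <prepare T>, then collect one vote per participant.
    coord_log.append(f"<prepare {t}>")
    votes = [p.on_prepare(t) for p in participants]
    # Phase 2: commit only on unanimous READY; log the decision before sending.
    decision = "commit" if all(v == "READY" for v in votes) else "abort"
    coord_log.append(f"<{decision} {t}>")  # for commit: point of no return
    for p in participants:
        p.on_decision(t, decision)
    return decision


sites = [Participant("S1", True), Participant("S2", True)]
print(two_phase_commit("T", sites, []))  # commit
```

A single participant with `can_commit=False` is enough to turn the outcome into `abort` for every site.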

**2PC site recovery (participant $S_{k}$ recovers, examines log).** $⟨ commit T ⟩$ → redo(T). $⟨ abort T ⟩$ → undo(T). $⟨ ready T ⟩$ ONLY (no decision) → consult coordinator (this is the 'in-doubt' state — must hold locks). No control record → failed before reply; coordinator must have aborted T; execute undo(T).
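The four recovery cases map directly onto a lookup over the recovering site's log. The sketch below assumes a hypothetical log encoding, a list of `(kind, transaction)` pairs, and a hypothetical function name `recovery_action`; it only names the action to take and does not perform redo or undo.

```python
def recovery_action(log: list, t: str) -> str:
    """Return what a recovering participant should do for transaction t.

    log is a list of (kind, txn) pairs, e.g. ("ready", "T").
    """
    kinds = {kind for kind, txn in log if txn == t}
    if "commit" in kinds:
        return "redo"
    if "abort" in kinds:
        return "undo"
    if "ready" in kinds:
        # In-doubt: voted yes, outcome unknown. Locks must stay held.
        return "ask-coordinator"
    # No control record: the site failed before replying, so T was aborted.
    return "undo"


print(recovery_action([("ready", "T")], "T"))  # ask-coordinator
```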

2PC coordinator failure cases (survivors try to decide). Some active site has $⟨ commit T ⟩$ → commit. Some active site has $⟨ abort T ⟩$ → abort. Some active site has NO $⟨ ready T ⟩$ (never voted) → coordinator cannot have decided commit → abort. **All active sites have only $⟨ ready T ⟩$, with no $⟨ commit T ⟩$ or $⟨ abort T ⟩$ anywhere → BLOCK (wait for coordinator). This is the blocking problem of 2PC.**
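The survivors' case analysis can be written as one function that checks the rules in the order given above. This is a sketch under assumptions: `survivors_decide` is a hypothetical name, and each active site's log is reduced to the set of record kinds it holds for T.

```python
def survivors_decide(active_logs: dict) -> str:
    """active_logs maps site name -> set of record kinds held for T."""
    logs = list(active_logs.values())
    if any("commit" in recs for recs in logs):
        return "commit"
    if any("abort" in recs for recs in logs):
        return "abort"
    if any("ready" not in recs for recs in logs):
        # Someone never voted, so the coordinator cannot have committed.
        return "abort"
    # Everyone voted READY and nobody knows the outcome.
    return "block"


print(survivors_decide({"S1": {"ready"}, "S2": {"ready"}}))  # block
```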

2PC network partition handling. If coordinator and all participants in one partition → no effect. If coordinator separated from some participants → cut-off sites treat it as coordinator failure (may block); coordinator treats absent sites as failed and runs as usual. No incorrect outcome, but some sites may block waiting.

Concurrency control during recovery. For each in-doubt $T$ (has $⟨ ready T ⟩$ but no decision), recovering site must reacquire all locks $T$ held. Log record written as $⟨ ready T, L ⟩$ where $L$ = list of locks held. Read locks may be omitted. After reacquisition, recovery runs concurrently with new transactions.
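As a rough illustration of why the lock list $L$ is stored in the ready record, the sketch below scans a log and returns, for each in-doubt transaction, the locks to reacquire. The tuple encoding of records and the name `locks_to_reacquire` are assumptions made for this example; real lock managers and log formats differ.

```python
def locks_to_reacquire(log: list) -> dict:
    """Map each in-doubt transaction to the locks listed in its ready record.

    Records are tuples: ("ready", txn, [locks]), ("commit", txn), ("abort", txn).
    """
    decided = {rec[1] for rec in log if rec[0] in ("commit", "abort")}
    held = {}
    for rec in log:
        if rec[0] == "ready" and rec[1] not in decided:
            held[rec[1]] = list(rec[2])  # locks L from <ready T, L>
    return held


log = [("ready", "T1", ["x", "y"]), ("ready", "T2", ["z"]), ("commit", "T2")]
print(locks_to_reacquire(log))  # {'T1': ['x', 'y']}
```

T2 already has a decision record, so only T1's locks need to be taken back before new transactions are admitted.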

2PC disadvantages. Blocking — coordinator failure after participants vote READY but before broadcasting decision = participants must wait (holding locks!). Performance overhead from forced log writes. Coordinator = single point of failure.

3PC assumptions (strong assumptions = limited practical use). No network partitions. At any time, at least one site is up. At most $K$ sites may fail ( $K < N$ ).

3PC three phases. Phase 1 (PREPARE) — same as 2PC Phase 1. Phase 2 (PRE-COMMIT) — coordinator decides commit/abort from votes. If commit, sends PRE-COMMIT to all and waits for at least $K$ acknowledgements before proceeding. Pre-commit decision is thus **replicated at $K + 1$ sites** (the coordinator plus $K$ participants). Phase 3 (COMMIT/ABORT) — coordinator sends COMMIT (or ABORT); participants execute.

How 3PC avoids blocking. The pre-commit decision is known at $K + 1$ sites. If the coordinator fails, the surviving participants choose a new coordinator and recover the decision among themselves: if any active site has $⟨ pre-commit T ⟩$, the new coordinator behaves as if it had received READY from everyone, re-sends PRE-COMMIT, and commits. If no active site has pre-commit, it aborts safely. **Non-blocking as long as $\leq K$ sites fail.**
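The recovery rule for a new 3PC coordinator is short enough to state as code. The sketch below uses a hypothetical function name and reduces each active site's log to a set of record kinds; it assumes the 3PC preconditions hold (no partition, at most $K$ failures) and does not check them.

```python
def three_pc_new_coordinator(active_logs: dict) -> str:
    """Decide T's outcome from the logs of the sites that are still up."""
    if any("pre-commit" in recs for recs in active_logs.values()):
        # Some survivor saw PRE-COMMIT, so every vote must have been READY.
        return "commit"
    return "abort"


print(three_pc_new_coordinator({"S1": {"ready", "pre-commit"}, "S2": {"ready"}}))
# commit
```

Unlike the 2PC survivors' rule, this function has no "block" outcome: every combination of surviving logs leads to a decision.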

3PC drawbacks (why not used in practice). Extra round trip → higher message + log overhead. Assumption of no network partition is unrealistic. Many production systems instead use 2PC with the coordinator's state replicated through a consensus protocol (Paxos / Raft) → a replacement coordinator can take over and finish the protocol, which gives the benefit 3PC aims for without its assumptions.

Definitions

Distributed transaction — Transaction touching data at multiple servers. Coordinator at originating site; participants at others.
ACID — Atomicity (all-or-nothing), Consistency (preserves invariants), Isolation (no partial views), Durability (committed survives failure).
Fail-stop model — Failed sites stop participating; never send incorrect messages; may recover. Used by 2PC and Raft.
<prepare T> / <ready T> / <commit T> / <abort T> — 2PC log records. <prepare>: coord initiated. <ready>: participant voted yes (forced stable). <commit>/<abort>: final decision (forced stable; point of no return).
In-doubt state (2PC) — Participant has <ready T> but no decision record; must hold all T's locks until decision known. Blocks indefinitely if coord unreachable.
Blocking problem (2PC) — All participants in <ready> state AND coordinator crashed = none can decide unilaterally → all block holding locks.
<pre-commit T> (3PC) — Replicated 'intent to commit' record. Sent by coord after collecting READYs; persisted at $K + 1$ sites before final COMMIT. Enables non-blocking recovery.
3PC assumptions — No network partitions + at least 1 site up + at most $K$ failures. Strong → why 3PC is not used in practice.

Formulas

$\text{ACID} = \text{Atomicity} + \text{Consistency} + \text{Isolation} + \text{Durability}$
$\text{2PC point of no return: } ⟨ commit T ⟩ \text{ written to coordinator's stable log}$
$\text{2PC blocking: } (\forall \text{ active site } s:\ \text{log}(s) = \{⟨ ready T ⟩\}) \land \text{coordinator crashed} \Rightarrow \text{block}$
$\text{3PC recovery rule: } (\exists \text{ active site with } ⟨ pre-commit T ⟩) \Rightarrow \text{commit};\ \text{else abort}$

Derivations

Why 2PC blocks (no escape). Suppose all active participants have only $⟨ ready T ⟩$ and the coordinator crashed before they received a decision. From the participants' perspective, the coordinator might have: (a) crashed after writing $⟨ commit T ⟩$, in which case the outcome is commit and a participant that has since failed may already have committed; (b) crashed before writing any decision, in which case abort would be safe. Participants cannot distinguish (a) from (b) without contacting the coordinator. Safety requires they wait, that is, block.
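The indistinguishability of cases (a) and (b) can be made concrete with a toy model. In the sketch below, `crash_scenario` is a hypothetical helper, logs are plain lists of record kinds, and the "true outcome" rests on the assumption that a recovered coordinator commits only if $⟨ commit T ⟩$ is in its log.

```python
def crash_scenario(coord_logged_commit: bool):
    """Model a coordinator crash after all participants voted READY."""
    coord_log = ["prepare"] + (["commit"] if coord_logged_commit else [])
    participant_logs = {"S1": ["ready"], "S2": ["ready"]}
    # The coordinator is down, so survivors can only read each other's logs.
    survivor_view = {site: tuple(log) for site, log in participant_logs.items()}
    # Assumption: on recovery the coordinator commits iff <commit T> was logged.
    true_outcome = "commit" if "commit" in coord_log else "abort"
    return survivor_view, true_outcome


view_a, outcome_a = crash_scenario(True)    # case (a)
view_b, outcome_b = crash_scenario(False)   # case (b)
print(view_a == view_b, outcome_a, outcome_b)  # True commit abort
```

The survivors see the same thing in both runs, yet the correct outcomes differ, so any unilateral decision would be wrong in one of them.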

Why 3PC avoids blocking. Phase 2 (PRE-COMMIT) replicates the 'intent to commit' at $K + 1$ sites. Surviving sites can check among themselves: if anyone has $⟨ pre-commit T ⟩$, the decision was 'commit' (the coordinator had received all READYs and intended to commit). New coordinator commits on behalf of crashed one. If no survivor has pre-commit, the record never reached $K + 1$ sites (at most $K$ failed), so no COMMIT can have been sent → safe to abort.

Examples

2PC happy path ($n$ participants). Coord writes $⟨ prepare ⟩$, sends PREPARE. Each participant writes $⟨ ready ⟩$, replies READY. Coord receives all READYs, writes $⟨ commit ⟩$ (point of no return). Sends COMMIT. Each participant commits locally + sends ACK.
2PC blocking scenario. Coord sends PREPARE, gets all READYs back. Coord crashes BEFORE writing $⟨ commit ⟩$. Participants have only $⟨ ready ⟩$. Termination protocol: each asks others. All have $⟨ ready ⟩$ only, so they can't decide. Block waiting for coord recovery.
3PC same scenario. Coord sent PRE-COMMIT to all and got $K$ ACKs; then crashed. Survivors check: at least one has $⟨ pre-commit ⟩$ → new coord broadcasts COMMIT. No blocking.
Recovery after participant crash. Participant restarts, reads log. Sees $⟨ ready T ⟩$ only → in-doubt → asks coord. Coord says 'committed' → redo. Locks held by $T$ are reacquired during recovery from $⟨ ready T, L ⟩$ record.

Diagrams

2PC timeline: parallel lines for Coordinator and Participants; arrows for PREPARE, READY/NO, COMMIT/ABORT; shaded 'blocking region' if coord crashes mid-protocol.
3PC timeline: same as 2PC plus PRE-COMMIT phase between; show that decision intent is replicated.
Log-record state machine: <prepare> → <ready> | <no> → <commit> | <abort>. Annotate which records require forced stable-storage write.
Recovery decision flowchart: examine log → <commit> redo / <abort> undo / <ready> only ask coord / nothing abort.
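The log-record state machine in the third diagram can be checked mechanically. The sketch below reads the diagram's chain as the allowed order of records for a single transaction; the table and function names are hypothetical, and the rule that `<no>` can only be followed by `<abort>` is an inference from the voting rule rather than something the diagram states. The forced set follows the notes: `<prepare>`, `<ready>`, `<commit>` and `<abort>`.

```python
ALLOWED_NEXT = {
    None: {"prepare"},
    "prepare": {"ready", "no"},
    "ready": {"commit", "abort"},
    "no": {"abort"},          # a NO vote can never lead to commit
    "commit": set(),
    "abort": set(),
}

# Records the notes say must be forced to stable storage.
FORCED = {"prepare", "ready", "commit", "abort"}


def is_valid_sequence(records: list) -> bool:
    """Check that records appear in an order the state machine allows."""
    prev = None
    for rec in records:
        if rec not in ALLOWED_NEXT[prev]:
            return False
        prev = rec
    return True


print(is_valid_sequence(["prepare", "ready", "commit"]))  # True
print(is_valid_sequence(["prepare", "no", "commit"]))     # False
```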

Edge cases

Network partition in 2PC — cut-off sites block; coord may proceed with its half. No INCORRECT outcome, but some sites stuck.
Multiple coordinator failures — 2PC participants that all hold only <ready> cannot decide on their own, however many coordinators fail; 3PC participants can decide as long as at most $K$ sites fail.
Coordinator equals participant — if the coord crashes and is also a participant, its log is essential for recovery. Replicate the log if possible.
Long-running in-doubt — participants hold locks ⇒ blocks new transactions on those objects. Hot data becomes unavailable.
3PC with partition — violates assumption; can give incorrect outcome (sites on both sides commit/abort differently).

Common mistakes

Saying '2PC handles network partitions'. No — 2PC stays safe under a partition (no incorrect outcome) but cut-off sites make no progress: participants that voted READY may block until the partition heals.
Saying '3PC is widely used'. No — 3PC's no-partition assumption is unrealistic; production systems use 2PC + Raft/Paxos for the coordinator.
**Confusing $⟨ ready T ⟩$ with $⟨ commit T ⟩$.** Ready = 'I voted YES' (held by participant). Commit = 'final decision' (held by coordinator first, then propagated).
Forgetting forced stable-storage writes. Both $⟨ ready T ⟩$ and $⟨ commit T ⟩$ must be FORCED to stable before sending the corresponding msg, else recovery loses information.
Saying '3PC is 2PC with a third voting round'. No — 3PC's phases are PREPARE + PRE-COMMIT + COMMIT/ABORT; only the first collects votes.
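For the forced-write mistake above, the ordering "force the record, then send the message" can be sketched with standard file calls. This is a minimal illustration, assuming a plain append-only text file as the log; `force_log` and `vote_ready` are hypothetical helper names, and `send` is any callable supplied by the caller. `os.fsync` asks the operating system to push the file's data to the storage device; how durable that is in practice depends on the platform and hardware.

```python
import os


def force_log(path: str, record: str) -> None:
    """Append a record and force it to disk before returning."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(record + "\n")
        f.flush()              # push Python's buffer to the OS
        os.fsync(f.fileno())   # ask the OS to push it to the device


def vote_ready(log_path: str, t: str, send) -> None:
    force_log(log_path, f"<ready {t}>")  # durable first
    send(f"READY {t}")                   # only then reply to the coordinator
```

If the order were reversed and the site crashed between the two steps, the coordinator could count a READY vote that the participant no longer remembers after recovery.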

Shortcuts

ACID: Atomicity / Consistency / Isolation / Durability.
2PC = PREPARE + DECIDE.
2PC blocks when: all <ready> AND coord crashed.
Log on recovery: <commit> redo / <abort> undo / <ready> only ask / nothing abort.
3PC = PREPARE + PRE-COMMIT (K acks) + COMMIT.
3PC drawbacks: assumes no partition; not used in practice.

Proofs / Algorithms

2PC's blocking is unavoidable in a 2-phase design. The decision is generated at one site (the coordinator) and propagated. A participant in <ready> knows only 'I voted yes; coord hasn't told me the outcome.' If the coord crashed, the survivors lack the information needed to decide safely: an execution in which the coord logged <commit T> before crashing and one in which it logged nothing leave identical logs at every survivor, so any rule that lets survivors decide gives the wrong answer in one of the two. A protocol with only these two message phases and READY/COMMIT/ABORT log records therefore cannot avoid blocking.

3PC tolerates ≤ K failures non-blockingly. Before any COMMIT is sent, the pre-commit decision is at K+1 sites; at most K can fail, so at least one survivor has the pre-commit record. Survivors elect a new coord; new coord polls others; if any has pre-commit ⇒ all eventually commit. If no one has pre-commit ⇒ the record never reached K+1 sites, so no COMMIT was ever sent ⇒ safe to abort.
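The counting step in this argument (K+1 holders, at most K failures, so one holder survives) can be checked by brute force for small systems. The function below is a hypothetical helper written for this note; it assumes the holders are simply the first `holders_count` of `n` numbered sites and tries every failure set of size at most `k`.

```python
from itertools import combinations


def witness_survives(n: int, k: int, holders_count: int) -> bool:
    """True if every failure set of size <= k leaves a pre-commit holder up."""
    holders = set(range(holders_count))
    for size in range(k + 1):
        for failed in combinations(range(n), size):
            if not holders - set(failed):
                return False  # all holders failed: the decision is lost
    return True


print(witness_survives(5, 2, 3))  # True: K + 1 holders always leave one alive
print(witness_survives(5, 2, 2))  # False: only K holders can all fail together
```

The second call shows why the coordinator must wait for K acknowledgements rather than K-1 before moving to the final phase.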
