ACID + 2PC + 3PC + Blocking & In-Doubt States
The Atomicity Problem Across Sites
A transaction transfers $100 from account A (at site 1) to account B (at site 2). Two writes; two sites. ACID demands atomicity — either both writes happen, or neither.
If you naively just send the two writes, one might succeed and the other fail. Now the system is inconsistent — $100 vanished or duplicated. Unacceptable.
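To make the failure concrete, here is a minimal sketch of the naive approach. The `Site` class, its `fail_writes` flag, and the starting balances are all made up for illustration; nothing here models a real database.

```python
class Site:
    """A hypothetical single-site store holding account balances."""

    def __init__(self, balances, fail_writes=False):
        self.balances = dict(balances)
        self.fail_writes = fail_writes  # simulate an unreachable site

    def apply(self, account, delta):
        if self.fail_writes:
            raise ConnectionError("site unreachable")
        self.balances[account] += delta


def naive_transfer(src_site, src, dst_site, dst, amount):
    """Send both writes independently, with no agreement step."""
    src_site.apply(src, -amount)  # debit at site 1
    dst_site.apply(dst, +amount)  # credit at site 2 (may fail)


site1 = Site({"A": 500})
site2 = Site({"B": 200}, fail_writes=True)

try:
    naive_transfer(site1, "A", site2, "B", 100)
except ConnectionError:
    pass  # the debit has already been applied; nothing rolls it back

total = site1.balances["A"] + site2.balances["B"]
print(total)  # started at 700; the debit landed and the credit did not
```

The two sites started with $700 between them and end with $600: the debit was applied, the credit was not, and nothing coordinates the two.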
A commit protocol solves this. Both sites must agree (vote) to commit before either actually commits. The coordinator orchestrates the vote and broadcasts the decision.
The textbook protocol is Two-Phase Commit (2PC). It works under fail-stop failures. Its weakness — blocking — is exactly why 3PC exists.
ACID Across The Sites
The acronym every database exam wants:
- Atomicity — all-or-nothing.
- Consistency — preserves DB invariants.
- Isolation — concurrent transactions don't see each other's partial work.
- Durability — committed effects survive failures.
For distributed transactions, atomicity is the hard one — it's what 2PC/3PC exist to enforce.
2PC — The Two Phases
Assumption: fail-stop. Failed sites stop sending; never send incorrect messages; may recover later. Each site has a stable log.
Phase 1 (Prepare / Voting):
- Coordinator writes <prepare T> to its log and forces it to stable storage.
- Coordinator sends PREPARE T to every participant.
- Each participant decides whether it can commit. If yes → write <ready T> to the log (forced to stable storage), then send READY T. If no → write <no T>, then send ABORT T.
Phase 2 (Decide):
- If the coordinator received READY T from all participants → write <commit T> to the stable log (the POINT OF NO RETURN), then send COMMIT T to all.
- Otherwise → write <abort T>, then send ABORT T to all.
Each participant writes the decision locally + acts. Done.
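The two phases can be sketched as a single-process simulation. This is a toy under stated assumptions: the `Participant` and `Coordinator` classes, the `can_commit` flag, and the in-memory lists standing in for stable logs are all illustrative, and there are no real messages, timeouts, or crashes.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    name: str
    can_commit: bool = True
    log: list = field(default_factory=list)  # stands in for the stable log

    def on_prepare(self, txn):
        # Phase 1: log the vote before replying.
        if self.can_commit:
            self.log.append(("ready", txn))
            return "READY"
        self.log.append(("no", txn))
        return "ABORT"

    def on_decision(self, txn, decision):
        # Phase 2: record the coordinator's decision, then act on it.
        self.log.append((decision, txn))


@dataclass
class Coordinator:
    participants: list
    log: list = field(default_factory=list)

    def run(self, txn):
        self.log.append(("prepare", txn))
        votes = [p.on_prepare(txn) for p in self.participants]
        decision = "commit" if all(v == "READY" for v in votes) else "abort"
        self.log.append((decision, txn))  # for commit, the point of no return
        for p in self.participants:
            p.on_decision(txn, decision)
        return decision


ok = Coordinator([Participant("site1"), Participant("site2")]).run("T1")
bad = Coordinator(
    [Participant("site1"), Participant("site2", can_commit=False)]
).run("T2")
print(ok, bad)
```

A single no vote is enough to abort: `ok` is a commit because both sites voted READY, while `bad` is an abort because site2 refused.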
Recovery — Reading The Log
A participant crashes mid-protocol, recovers, and examines its log:
- <commit T> → redo(T).
- <abort T> → undo(T).
- <ready T> ONLY (no decision) → consult the coordinator. This is the in-doubt state.
- Nothing → never voted READY, so the coordinator cannot have committed; undo(T).
In-doubt is the dangerous state: the participant must HOLD ALL THE LOCKS T acquired until it learns the decision. Until then, no other transaction can touch those objects.
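The recovery rules above fit in one small function. This is a sketch, assuming the log is a list of `(record, txn)` tuples with lowercase record names; that representation and the returned action strings are illustrative choices, not a real log format.

```python
def recovery_action(log, txn):
    """Return what a recovering participant should do for txn.

    `log` is a list of (record, txn) tuples, oldest first.
    """
    records = {rec for rec, t in log if t == txn}
    if "commit" in records:
        return "redo"
    if "abort" in records:
        return "undo"
    if "ready" in records:
        # In doubt: voted READY but never saw the decision.
        # The locks held by txn must stay held until it is known.
        return "ask coordinator"
    # No ready record: the coordinator cannot have committed.
    return "undo"


print(recovery_action([("ready", "T1"), ("commit", "T1")], "T1"))
print(recovery_action([("ready", "T1")], "T1"))
print(recovery_action([], "T1"))
```

The order of the checks matters: a decision record always wins over a bare ready record.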
The Blocking Problem
The exam wants you to know exactly when 2PC blocks:
**All surviving participants have <ready T> but none has the decision, AND the coordinator has crashed.**
Why is this fatal? From a survivor's perspective, the coordinator might have:
- Crashed after writing <commit T>: some unknown participants may have already committed → must commit.
- Crashed before writing <commit T>: no one can have committed → safe to abort.
Survivors cannot distinguish these without the coordinator. So they MUST wait. Holding all the locks.
This is the blocking problem — the single biggest weakness of 2PC.
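A sketch of what survivors can work out by pooling their logs. The function name and the state encoding are assumptions for illustration: each survivor reports its latest record for T, or `None` if it has no record at all.

```python
def survivors_outcome(survivor_states, coordinator_up):
    """Decide what the surviving participants can do for transaction T.

    survivor_states: each survivor's latest record for T, one of
    "commit", "abort", "no", "ready", or None (no record at all).
    """
    if coordinator_up:
        return "ask coordinator"
    if "commit" in survivor_states:
        return "commit"  # the decision was commit; everyone follows
    if "abort" in survivor_states or "no" in survivor_states:
        return "abort"   # the decision was, or had to be, abort
    if None in survivor_states:
        # Someone never voted READY, so the coordinator cannot
        # have collected READY from all: commit was impossible.
        return "abort"
    return "blocked"     # everyone is ready, nobody knows the decision


print(survivors_outcome(["ready", "ready"], coordinator_up=False))
print(survivors_outcome(["ready", "commit"], coordinator_up=False))
```

Only the all-ready case with a dead coordinator comes out as `"blocked"`; every other combination lets the survivors finish on their own.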
3PC — Breaking The Block
Three-Phase Commit (Skeen 1981) adds an intermediate PRE-COMMIT phase. The idea: replicate the "intent to commit" at multiple sites so survivors can recover the decision among themselves.
Phase 1 (PREPARE) — same as 2PC.
Phase 2 (PRE-COMMIT): the coordinator decides commit/abort from the votes. If commit, it sends PRE-COMMIT to all participants and waits for at least K acks before proceeding. The intent to commit is now replicated at K sites.
Phase 3 (COMMIT/ABORT): the coordinator sends the final COMMIT (or ABORT); participants execute.
If the coordinator crashes after Phase 2:
- Survivors check among themselves. If anyone has <pre-commit T> → elect a new coordinator → broadcast COMMIT. Non-blocking.
- If no one has <pre-commit T> → the coordinator cannot have collected its K acks, so no site has committed → safe to ABORT.
**3PC is non-blocking as long as at most K sites fail.**
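The survivors' termination rule can be sketched as follows. The dict-of-states encoding and the lowest-id election rule are assumptions for illustration; the notes only say that a new coordinator is elected, not how. The rule is only safe under 3PC's own assumptions (bounded site failures, no partitions).

```python
def terminate_3pc(survivors):
    """Finish transaction T after the coordinator has crashed.

    survivors: dict mapping site id -> latest record for T, one of
    "ready", "pre-commit", "commit", "abort", or None.
    Returns (new_coordinator_id, decision).
    """
    new_coord = min(survivors)  # hypothetical election rule: lowest id wins
    states = set(survivors.values())
    if "commit" in states or "pre-commit" in states:
        decision = "commit"  # the intent to commit survived the crash
    else:
        decision = "abort"   # no survivor saw pre-commit
    return new_coord, decision


print(terminate_3pc({2: "ready", 3: "pre-commit"}))
print(terminate_3pc({2: "ready", 3: "ready"}))
```

Unlike the 2PC case, the all-ready outcome here is an abort rather than a block, because commit is only reachable through pre-commit.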
Why 3PC Isn't Used In Practice
Two reasons:
(i) Extra round trip. More messages, more forced log writes, more latency.
(ii) Assumes no network partitions. That's unrealistic. With a partition, sites on each side can independently decide — possibly conflictingly. The non-blocking guarantee evaporates.
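A small sketch of why a partition breaks 3PC. Assume (hypothetically) that the coordinator crashed after PRE-COMMIT reached only some participants, and that the network then split them into two groups that each believe the other group has failed. Each side applies the termination rule on its own.

```python
def side_decision(states):
    """3PC termination rule applied by one side of a partition."""
    if "pre-commit" in states or "commit" in states:
        return "commit"
    return "abort"


# Latest record for T at each site, grouped by partition side.
side_a = ["pre-commit", "ready"]  # one site here received PRE-COMMIT
side_b = ["ready", "ready"]       # nobody here received it

decisions = {side_decision(side_a), side_decision(side_b)}
print(decisions)  # both outcomes appear: atomicity is violated
```

Side A commits and side B aborts, which is exactly the conflict the no-partition assumption rules out.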
Production systems typically take a different route: keep 2PC, but replicate the coordinator with a consensus protocol (Paxos or Raft). The coordinator becomes a replicated state machine, a coordinator crash is handled by leader election, and the protocol underneath is still 2PC, but without the single point of failure (SPOF).
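The core idea, that the decision must outlive any single coordinator replica, can be sketched with a toy majority-quorum log. This is NOT Paxos or Raft: it leaves out terms/ballots, leader election, and handling of competing writers, and the `ReplicatedDecisionLog` class is purely hypothetical. It only shows why any majority read overlaps any majority write.

```python
class ReplicatedDecisionLog:
    """Toy quorum-replicated record of 2PC decisions (not Paxos/Raft)."""

    def __init__(self, n_replicas):
        self.replicas = [dict() for _ in range(n_replicas)]
        self.majority = n_replicas // 2 + 1

    def record(self, txn, decision, reachable):
        """Store the decision; it counts only if a majority is reachable."""
        reachable = set(reachable)
        if len(reachable) < self.majority:
            return False
        for i in reachable:
            self.replicas[i][txn] = decision
        return True

    def recover(self, txn, reachable):
        """A new leader reads a majority, which overlaps any write quorum."""
        reachable = set(reachable)
        if len(reachable) < self.majority:
            raise RuntimeError("no quorum")
        seen = {self.replicas[i].get(txn) for i in reachable} - {None}
        return seen.pop() if seen else None


log = ReplicatedDecisionLog(3)
stored = log.record("T1", "commit", reachable=[0, 1])
# Replica 0 (the old leader) crashes; a new leader reads replicas 1 and 2.
recovered = log.recover("T1", reachable=[1, 2])
print(stored, recovered)
```

Participants that are in doubt can then ask whichever replica is the current leader, instead of waiting for one specific machine to come back.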
What You Walk In Carrying
ACID across multiple sites. 2PC's two phases + log records + recovery rules. The blocking scenario in one line: all <ready> + coord crashed. Recovery decision table (<commit> redo, <abort> undo, <ready> ask, nothing abort). Network partition handling. 2PC disadvantages (blocking, overhead, SPOF). 3PC three phases + how PRE-COMMIT breaks blocking + the strong assumptions (no partitions, at most K site failures, K being the number of PRE-COMMIT acks the coordinator waits for). Why 3PC isn't production-default.