Saral Shiksha Yojna

Distributed Systems

CS3.401
Prof. Kishore Kothapalli | Monsoon 2025-26 | 4 credits

ACID + 2PC + 3PC + Blocking & In-Doubt States

Unit 8 — Distributed Transactions, 2PC & 3PC

The Atomicity Problem Across Sites

A transaction transfers $100 from account A (at site 1) to account B (at site 2). Two writes; two sites. ACID demands atomicity — either both writes happen, or neither.

If you naively send the two writes independently, one might succeed while the other fails. Now the system is inconsistent: $100 has vanished or been duplicated. Unacceptable.
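A minimal Python sketch of that failure, assuming two in-memory dicts stand in for the two sites and a hypothetical `SiteDown` exception models a crashed site (none of these names come from a real system):

```python
class SiteDown(Exception):
    """Hypothetical: raised when a write is sent to a crashed site."""

def write(site, account, delta, up=True):
    # Apply one write at one site; fails if that site is down.
    if not up:
        raise SiteDown(account)
    site[account] += delta

def naive_transfer(site1, site2, amount, site2_up=True):
    # No commit protocol: the two writes are sent independently.
    write(site1, "A", -amount)                    # debit at site 1 succeeds
    try:
        write(site2, "B", +amount, up=site2_up)   # credit at site 2 may fail
    except SiteDown:
        pass                                      # nothing undoes the debit at site 1

site1, site2 = {"A": 500}, {"B": 200}             # total money: 700
naive_transfer(site1, site2, 100, site2_up=False)
print(site1["A"] + site2["B"])                    # 600: $100 has vanished
```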

A commit protocol solves this. Both sites must agree (vote) to commit before either actually commits. The coordinator orchestrates the vote and broadcasts the decision.

The textbook protocol is Two-Phase Commit (2PC). It works under fail-stop failures. Its weakness — blocking — is exactly why 3PC exists.

ACID Across The Sites

The acronym every database exam wants:

  • Atomicity — all-or-nothing.
  • Consistency — preserves DB invariants.
  • Isolation — concurrent transactions don't see each other's partial work.
  • Durability — committed effects survive failures.

For distributed transactions, atomicity is the hard one — it's what 2PC/3PC exist to enforce.

2PC — The Two Phases

Assumption: fail-stop. Failed sites stop sending; never send incorrect messages; may recover later. Each site has a stable log.

Phase 1 (Prepare / Voting):

  • Coordinator writes <prepare T> to its log and forces it to stable storage.
  • Sends PREPARE T to every participant.
  • Each participant decides whether it can commit T. If yes → force <ready T> to its stable log, then send READY T. If no → write <no T>, then send ABORT T.
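The participant side of Phase 1 can be sketched as below. This is an illustration under stated assumptions, not a real API: the stable log is a plain Python list, appending to it stands in for a forced write, and the helper name `on_prepare` and the record names `ready`/`no` are hypothetical. The one rule it encodes is write-ahead: the vote is logged before it is sent.

```python
def on_prepare(log, txn, can_commit):
    """Hypothetical participant handler for PREPARE T.

    `log` models the participant's stable log; an append models a forced write.
    """
    if can_commit:
        log.append(("ready", txn))   # force <ready T> BEFORE voting yes
        return "READY"
    log.append(("no", txn))          # record the no vote, then vote abort
    return "ABORT"

p_log = []
vote = on_prepare(p_log, "T1", can_commit=True)
print(vote, p_log)  # READY [('ready', 'T1')]
```

Once `READY` has been sent, the participant has given up the right to abort T on its own; that is what makes the later in-doubt state possible.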

Phase 2 (Decide):

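  • If the coordinator received READY T from every participant → force <commit T> to its stable log (this is the POINT OF NO RETURN), then send COMMIT T to all.
  • Otherwise (any ABORT vote, or a participant that never answers) → write <abort T> and send ABORT T to all.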

Each participant logs the decision (<commit T> or <abort T>), then commits or aborts T locally. Done.
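Both phases from the coordinator's side fit in a short single-process sketch. Assumptions: each participant is modeled as a callable that answers PREPARE with `"READY"` or `"ABORT"`, a crashed participant raises `TimeoutError`, and the coordinator log is a list; `two_phase_commit` and `crashed` are hypothetical names, and real message passing is left out.

```python
def two_phase_commit(txn, participants, coord_log):
    """Coordinator side of 2PC (simplified sketch, no real networking)."""
    coord_log.append(("prepare", txn))       # forced before any PREPARE is sent
    # Phase 1: collect one vote per participant.
    votes = {}
    for site, prepare in participants.items():
        try:
            votes[site] = prepare(txn)
        except TimeoutError:
            votes[site] = "ABORT"            # no answer counts as a no vote
    # Phase 2: decide. Logging the decision is the point of no return.
    decision = "COMMIT" if all(v == "READY" for v in votes.values()) else "ABORT"
    coord_log.append((decision.lower(), txn))
    return decision                          # caller then sends it to every participant

def crashed(txn):
    raise TimeoutError(f"no vote for {txn}")

log = []
print(two_phase_commit("T1", {"s1": lambda t: "READY", "s2": lambda t: "READY"}, log))  # COMMIT
print(two_phase_commit("T2", {"s1": lambda t: "READY", "s2": crashed}, log))            # ABORT
```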

Recovery — Reading The Log

A participant crashes mid-protocol, recovers, and examines its log:

  • <commit T> → redo(T).
  • <abort T> → undo(T).
  • <ready T> only (no decision record) → consult the coordinator. This is the in-doubt state.
  • No control record for T → the participant never sent READY, so the coordinator cannot have committed; undo(T).
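The four recovery cases above amount to a small decision function over the participant's own log. A sketch, assuming the log is a list of `(record, txn)` tuples with hypothetical record names:

```python
def recover(log, txn):
    """Decide what a restarted participant does with T, from its own log only."""
    records = {kind for kind, t in log if t == txn}
    if "commit" in records:
        return "redo"
    if "abort" in records:
        return "undo"
    if "ready" in records:
        return "in-doubt"        # ask the coordinator; keep T's locks meanwhile
    return "undo"                # no <ready T>: nothing logged, or only a "no" vote

print(recover([("ready", "T1"), ("commit", "T1")], "T1"))  # redo
print(recover([("ready", "T1")], "T1"))                    # in-doubt
print(recover([], "T1"))                                   # undo
```

The order of the checks matters: a decision record overrides `<ready T>`, because a site that voted READY and later learned the outcome is no longer in doubt.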

In-doubt is the dangerous state: the participant must HOLD ALL THE LOCKS T acquired until it learns the decision. Until then, no other transaction can touch those objects.
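A toy illustration of that cost, assuming a hypothetical exclusive-lock table that maps each object to its owning transaction (real lock managers also have shared modes, wait queues, and so on):

```python
class LockTable:
    """Hypothetical exclusive-lock table: object -> owning transaction."""

    def __init__(self):
        self.owner = {}

    def acquire(self, obj, txn):
        # Grant only if the object is free or already held by the same txn.
        if self.owner.get(obj, txn) != txn:
            return False
        self.owner[obj] = txn
        return True

    def release_all(self, txn):
        # Allowed only once T's outcome (commit or abort) is known.
        self.owner = {o: t for o, t in self.owner.items() if t != txn}

locks = LockTable()
locks.acquire("B", "T1")          # T1 wrote B, voted READY, then lost the coordinator
print(locks.acquire("B", "T2"))   # False: T2 waits while T1 is in doubt
locks.release_all("T1")           # decision finally learned
print(locks.acquire("B", "T2"))   # True
```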

The Blocking Problem

The exam wants you to know exactly when 2PC blocks:

**All surviving participants have <ready T> but none has <commit T> or <abort T>, AND the coordinator has crashed.**

Why is this fatal? From a survivor's perspective, the coordinator might have:

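  • Crashed after writing <commit T>: the coordinator (and possibly a participant that is now unreachable) may already have committed → survivors must commit.
  • Crashed before writing <commit T>: no commit decision exists → safe to abort.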

Survivors cannot distinguish these without the coordinator. So they MUST wait. Holding all the locks.

This is the blocking problem — the single biggest weakness of 2PC.
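Survivors can still settle every other case by comparing their logs; only the all-<ready T> case is stuck. A sketch of that check, assuming each surviving site's log is a list of `(record, txn)` tuples (hypothetical representation and function name):

```python
def terminate_2pc(survivor_logs, txn):
    """What surviving participants can decide without the coordinator."""
    kinds = [{k for k, t in log if t == txn} for log in survivor_logs]
    if any("commit" in k for k in kinds):
        return "COMMIT"      # some survivor already learned commit
    if any("abort" in k for k in kinds):
        return "ABORT"       # some survivor already learned abort
    if any("ready" not in k for k in kinds):
        return "ABORT"       # a survivor never voted READY, so commit is impossible
    return "BLOCK"           # all <ready T>, no decision, coordinator down

ready = [("ready", "T1")]
print(terminate_2pc([ready, ready + [("commit", "T1")]], "T1"))  # COMMIT
print(terminate_2pc([ready, []], "T1"))                          # ABORT
print(terminate_2pc([ready, ready], "T1"))                       # BLOCK
```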

3PC — Breaking The Block

Three-Phase Commit (Skeen 1981) adds an intermediate PRE-COMMIT phase. The idea: replicate the "intent to commit" at multiple sites so survivors can recover the decision among themselves.

Phase 1 (PREPARE) — same as 2PC.

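Phase 2 (PRE-COMMIT): the coordinator decides commit/abort from the votes. If commit, it sends PRE-COMMIT T to all participants and waits for acknowledgments from at least K of them before proceeding, where K is the assumed maximum number of site failures. The intent to commit is now replicated at K or more sites.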

Phase 3 (COMMIT/ABORT) — coord sends final COMMIT (or ABORT); participants execute.
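The coordinator's path through the three phases can be sketched as the sequence of messages it sends. Assumptions: votes are already collected, `acked` is the set of sites that acknowledged PRE-COMMIT, `k` is the failure bound K, and the function name is hypothetical; timeouts and the abort-on-timeout path are left out.

```python
def three_phase_commit(votes, acked, k):
    """Messages the coordinator sends, in order (simplified sketch)."""
    sent = ["PREPARE"]                        # phase 1, same as 2PC
    if any(v != "READY" for v in votes.values()):
        return sent + ["ABORT"]               # phase 2 decides abort directly
    sent.append("PRE-COMMIT")                 # phase 2: replicate the intent
    if len(acked) < k:
        return sent                           # keep waiting; COMMIT not sent yet
    return sent + ["COMMIT"]                  # phase 3: final decision

votes = {"s1": "READY", "s2": "READY", "s3": "READY"}
print(three_phase_commit(votes, acked={"s1"}, k=2))        # ['PREPARE', 'PRE-COMMIT']
print(three_phase_commit(votes, acked={"s1", "s2"}, k=2))  # ['PREPARE', 'PRE-COMMIT', 'COMMIT']
```

The gap between PRE-COMMIT and COMMIT is the whole point: COMMIT only goes out after the intent sits on at least K sites.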

If coordinator crashes after Phase 2:

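  • Survivors check among themselves. If any has <pre-commit T> → elect a new coordinator → broadcast COMMIT. Nobody waits for the old coordinator: non-blocking.
  • If no survivor has <pre-commit T> → the coordinator cannot have sent COMMIT (with at most K failures, some survivor would have seen PRE-COMMIT) → safe ABORT.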
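The new coordinator's rule can be sketched over the survivors' states, assuming each state is one of the hypothetical strings `"ready"`, `"pre-commit"`, `"commit"`, `"abort"`. This is simplified: full 3PC termination protocols also move sites through PRE-COMMIT again before the final COMMIT.

```python
def terminate_3pc(survivor_states):
    """New coordinator's decision after the old one fails (no partition assumed)."""
    if "commit" in survivor_states:
        return "COMMIT"
    if "abort" in survivor_states:
        return "ABORT"
    if "pre-commit" in survivor_states:
        return "COMMIT"     # the intent to commit was replicated; finish it
    return "ABORT"          # nobody saw PRE-COMMIT, so nobody can have committed

print(terminate_3pc(["ready", "pre-commit", "ready"]))  # COMMIT
print(terminate_3pc(["ready", "ready"]))                # ABORT
```

Unlike the 2PC rule, there is no `BLOCK` outcome: every combination of survivor states maps to a decision.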

**3PC is non-blocking under site failures, provided there are no network partitions and at most K sites fail.**

Why 3PC Isn't Used In Practice

Two reasons:

(i) Extra round trip. More messages, more forced log writes, more latency.

(ii) Assumes no network partitions. That's unrealistic. With a partition, sites on each side can independently decide — possibly conflictingly. The non-blocking guarantee evaporates.
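A small scenario shows the conflict. Assumptions: four hypothetical sites, the coordinator crashed after PRE-COMMIT reached only `s1` and `s2`, then the network split `{s1, s2} | {s3, s4}`, and each side applies the simplified 3PC termination rule on its own:

```python
def decide(side_states):
    # Simplified 3PC termination rule, applied independently by each side.
    return "COMMIT" if "pre-commit" in side_states else "ABORT"

states = {"s1": "pre-commit", "s2": "pre-commit", "s3": "ready", "s4": "ready"}
left = decide([states["s1"], states["s2"]])    # side that saw PRE-COMMIT
right = decide([states["s3"], states["s4"]])   # side that did not
print(left, right)  # COMMIT ABORT: atomicity is violated
```

Each side behaves correctly by its own rule; the bug is the assumption that "no survivor saw PRE-COMMIT" means "nobody did".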

Production systems get the best of both worlds: 2PC with the coordinator replicated by a consensus protocol (Paxos or Raft). The coordinator becomes a replicated state machine, so a coordinator crash is handled by leader election and the commit/abort decision survives in the replicated log. The protocol is still 2PC underneath, but the single point of failure (SPOF) is gone.
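The property that makes a replicated coordinator work is quorum intersection. The sketch below is only that property, not Paxos or Raft (no ballots, terms, or leader election); the five-replica setup and the helper names are hypothetical. The decision counts as recorded once a majority holds it, and any majority a new leader can reach overlaps that majority.

```python
from itertools import combinations

def is_durable(acked, n):
    # The decision counts as recorded once a majority of the n replicas hold it.
    return len(acked) > n // 2

def recovered_decision(replica_logs, reachable):
    # A new leader reads the decision from whichever replicas it can reach.
    for r in reachable:
        if replica_logs[r] is not None:
            return replica_logs[r]
    return None

n = 5
replica_logs = {0: "COMMIT", 1: "COMMIT", 2: "COMMIT", 3: None, 4: None}
print(is_durable({0, 1, 2}, n))  # True
# Every 3-replica majority overlaps {0, 1, 2}, so COMMIT is always recovered.
print(all(recovered_decision(replica_logs, q) == "COMMIT"
          for q in combinations(range(n), 3)))  # True
```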

What You Walk In Carrying

  • ACID across multiple sites.
  • 2PC's two phases, its log records, and its recovery rules.
  • The blocking scenario in one line: all <ready T> + coordinator crashed.
  • The recovery decision table: <commit> → redo, <abort> → undo, <ready> → ask, nothing → abort.
  • Network partition handling.
  • 2PC disadvantages: blocking, overhead, single point of failure.
  • 3PC's three phases, how PRE-COMMIT breaks blocking, and its strong assumptions (no partitions, at most K site failures).
  • Why 3PC isn't the production default.