Saral Shiksha Yojna

Distributed Systems

CS3.401
Prof. Kishore Kothapalli · Monsoon 2025-26 · 4 credits

Distributed Systems (Monsoon 2025-26) covers the foundations of building correct, fault-tolerant, scalable systems out of many independent computers. The course spans CAP and impossibility results; logical and physical clocks; global snapshots; causal-order message delivery; distributed mutual exclusion; deadlock detection; consensus and Byzantine agreement; transaction commit protocols (2PC, 3PC); Raft for replicated state machines; the GHS distributed minimum spanning tree algorithm; and the Google File System as a flagship distributed storage case study. This revision hub distils the entire syllabus into chapter-wise notes, cheatsheets, high-yield topics and practice questions — designed to revise the whole course in an evening, not a week.

Syllabus

Unit 1 — Introduction, Challenges & CAP Theorem

1 chapter

What is a distributed system, why is it useful, and what makes it hard? Tanenbaum's definition; required features (concurrency, independent failures, no global clock, no shared memory); the unique challenges (unreliable communication, lack of global knowledge, no synchronisation); CAP theorem (Brewer) and the C-vs-A trade-off under unavoidable partitions; the Two-Generals problem.

Unit 2 — Models, Events & Logical Time

1 chapter

The model of a distributed program (processors, channels, events). Lamport's happened-before relation. Logical clocks: scalar (Lamport), vector, and matrix — their properties, comparison, and use-cases. The strong-consistency trap (scalar fails the converse). Physical-time synchronisation: Cristian's, Berkeley, Decentralised averaging, NTP.
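The vector-clock rules above (increment on a local event, component-wise max plus increment on receipt) can be sketched in a few lines. This is a minimal illustration, assuming N processes with ids 0..N-1; the helper names `tick`, `merge`, and `happened_before` are mine, not from the course notes.

```python
def tick(clock, pid):
    """Local event or send: increment own component."""
    c = clock.copy()
    c[pid] += 1
    return c

def merge(local, received, pid):
    """On message receipt: component-wise max, then tick own component."""
    c = [max(a, b) for a, b in zip(local, received)]
    c[pid] += 1
    return c

def happened_before(a, b):
    """a -> b iff a <= b component-wise and a != b.
    Vector clocks are strongly consistent: this is an iff, unlike scalar clocks."""
    return all(x <= y for x, y in zip(a, b)) and a != b

# Example: P0 ticks and sends; P1 merges on receipt.
p0 = tick([0, 0], 0)          # [1, 0]
p1 = merge([0, 0], p0, 1)     # [1, 1]
assert happened_before(p0, p1)
assert not happened_before([2, 0], [0, 2])   # concurrent events
```

Note how `happened_before([2, 0], [0, 2])` is false in both directions: that is exactly the concurrency test scalar clocks cannot provide (the strong-consistency trap).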

Unit 3 — Global Snapshots

1 chapter

Recording a consistent snapshot of a distributed system without stopping it. The C1/C2 cut conditions. Chandy-Lamport (FIFO channels), Lai-Yang (non-FIFO, white/red colouring), Acharya-Badrinath (causal channels, 2N messages). Banking example showing inconsistent vs consistent cuts.
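The consistency requirement behind the C1/C2 conditions reduces to: no message may be recorded as received while its send lies outside the cut (an "orphan" message, effect without cause). A small sketch, with a representation of my own choosing: `cut[p]` is the number of events recorded at process p, and each message is a tuple `(sender, send_index, receiver, recv_index)`.

```python
def consistent(cut, messages):
    """A cut is consistent iff every message received within the cut
    was also sent within the cut (no effect without its cause)."""
    for sp, si, rp, ri in messages:
        received_in_cut = ri <= cut[rp]
        sent_in_cut = si <= cut[sp]
        if received_in_cut and not sent_in_cut:
            return False   # orphan message: inconsistent cut
    return True

msgs = [(0, 2, 1, 3)]   # P0's 2nd event sends; P1's 3rd event receives
assert consistent({0: 2, 1: 3}, msgs)       # send and receive both inside
assert not consistent({0: 1, 1: 3}, msgs)   # receive inside, send outside
```

A message sent inside the cut but received outside it is merely in-transit, not a violation; that is what the recorded channel states in Chandy-Lamport capture.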

Unit 4 — Causal Order Message Delivery

1 chapter

When messages must be delivered in causal order (not just FIFO). Birman-Schiper-Stephenson (BSS) algorithm: two delivery conditions on vector clocks. Causal order vs FIFO vs total order. Use cases: replicated databases, group communication, distributed debugging.
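The two BSS delivery conditions can be stated as a single predicate on vector clocks: a message from sender j is deliverable at a receiver only when it is the next message expected from j, and the receiver has already seen every message the sender had seen. A sketch under those assumptions (the function name is illustrative):

```python
def bss_deliverable(v_recv, v_msg, sender):
    """BSS delivery test: v_recv is the receiver's vector clock,
    v_msg the vector timestamp carried by the message."""
    if v_msg[sender] != v_recv[sender] + 1:
        return False   # not the next message from this sender: buffer it
    # receiver must have seen everything the message causally depends on
    return all(v_msg[k] <= v_recv[k]
               for k in range(len(v_msg)) if k != sender)

# Receiver has seen 1 message from P0 and none from P2.
assert bss_deliverable([1, 0, 0], [2, 0, 0], sender=0)       # next from P0
assert not bss_deliverable([1, 0, 0], [2, 0, 1], sender=0)   # depends on an unseen P2 message
```

Messages failing the test are buffered and re-checked after each delivery, which is what turns FIFO channels into causal delivery.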

Unit 5 — Distributed Mutual Exclusion

1 chapter

Five canonical distributed mutex algorithms: Lamport (3(N-1) msgs, FIFO), Ricart-Agrawala (2(N-1) msgs, no FIFO), Maekawa (3√N → 5√N quorum-based with V2 deadlock fix), Suzuki-Kasami (token broadcast, 0 or N msgs), Raymond (token tree, O(log N) msgs). Safety / liveness / fairness; message complexity, synchronisation delay, throughput.
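The heart of Ricart-Agrawala is a single decision: on receiving a REQUEST while you yourself want the critical section, reply immediately only if the other request has priority under the `(timestamp, pid)` total order; otherwise defer the reply until you exit. A sketch of that rule (state names and function name are illustrative):

```python
def should_reply_now(my_state, my_req, their_req):
    """Ricart-Agrawala reply rule. Requests are (timestamp, pid) pairs,
    compared lexicographically; lower wins."""
    if my_state == "HELD":
        return False                 # defer until we release the CS
    if my_state == "WANTED":
        return their_req < my_req    # reply only if their request has priority
    return True                      # RELEASED: always reply immediately

assert should_reply_now("RELEASED", None, (5, 1))
assert should_reply_now("WANTED", (7, 2), (5, 1))       # older timestamp wins
assert not should_reply_now("WANTED", (5, 1), (7, 2))   # we have priority: defer
assert not should_reply_now("HELD", (3, 0), (1, 1))
```

A process enters the CS once it has collected replies from all N-1 others, giving the 2(N-1) message count per CS entry.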

Unit 6 — Distributed Deadlock Detection

1 chapter

Resource request models (Single, AND, OR, AND-OR, P-out-of-Q). Wait-for graphs and the cycle-vs-knot distinction. Three handling strategies (prevention, avoidance, detection) and why detection dominates in DS. Detection algorithms: Ho-Ramamoorthy (centralised, 1-phase / 2-phase), Chandy-Misra-Haas probe (AND), Mitchell-Merritt label-passing (single-resource), Chandy diffusion (OR).
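The Chandy-Misra-Haas probe idea (AND model) is easy to simulate centrally: probes `(initiator, sender, receiver)` flow along wait-for edges, and the initiator declares deadlock when a probe `(i, j, i)` returns to it. A sketch, assuming the wait-for graph is given as a dict from each process to the set of processes it waits for:

```python
def detects_deadlock(wfg, initiator):
    """Simulate CMH probe propagation from `initiator` over the WFG.
    Deadlock iff some probe (i, j, i) comes back to the initiator."""
    seen = set()
    frontier = [(initiator, initiator, k) for k in wfg.get(initiator, ())]
    while frontier:
        init, sender, receiver = frontier.pop()
        if receiver == init:
            return True          # probe returned to initiator: cycle
        if receiver in seen:
            continue             # each blocked process forwards a probe once
        seen.add(receiver)
        frontier += [(init, receiver, k) for k in wfg.get(receiver, ())]
    return False

wfg = {1: {2}, 2: {3}, 3: {1}}   # cycle 1 -> 2 -> 3 -> 1
assert detects_deadlock(wfg, 1)
assert not detects_deadlock({1: {2}, 2: {3}, 3: set()}, 1)
```

In the real distributed algorithm each process does this locally, forwarding a given initiator's probe at most once; the cycle check is exactly the `receiver == init` test.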

Unit 7 — Consensus & Byzantine Agreement

1 chapter

Reaching agreement among nodes that may fail. Crash-failure consensus: $f+1$ rounds, $(f+1) \cdot n^2$ messages. Byzantine bounds: $n \ge 3f+1$, $\ge f+1$ rounds. FLP impossibility in asynchronous systems. Lamport-Shostak-Pease oral-messages algorithm OM(m). Phase King — polynomial alternative with $n \ge 4f+1$. Three variants: Byzantine Agreement, Consensus, Interactive Consistency.
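The core Byzantine bound is worth internalising as a formula: with n nodes, oral messages tolerate at most f traitors where n >= 3f + 1, i.e. f_max = floor((n - 1) / 3). A one-line sanity check:

```python
def max_byzantine_faults(n):
    """Largest f with n >= 3f + 1 (oral-messages bound)."""
    return (n - 1) // 3

assert max_byzantine_faults(3) == 0    # 3 generals cannot tolerate 1 traitor
assert max_byzantine_faults(4) == 1
assert max_byzantine_faults(10) == 3
```

The companion bounds to quote with it: at least f + 1 rounds are needed, and (by FLP) no deterministic solution exists in a fully asynchronous system with even one crash failure.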

Unit 8 — Distributed Transactions, 2PC & 3PC

1 chapter

ACID across multiple sites. Two-Phase Commit (2PC): Prepare + Decide phases; log records; recovery rules; the BLOCKING problem when coordinator crashes after all <ready>. Three-Phase Commit (3PC): adds a PRE-COMMIT phase to break blocking; requires no-partition assumption (rarely used in practice).
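The coordinator's decision rule in 2PC is a strict conjunction: COMMIT only if every participant voted YES in the prepare phase; any NO, or a timeout, forces ABORT. A minimal sketch (timeouts modelled here as `None`, a choice of mine):

```python
def two_pc_decide(votes):
    """2PC decision at the coordinator after the prepare phase.
    votes: one entry per participant; None models a timed-out reply."""
    return "COMMIT" if all(v == "YES" for v in votes) else "ABORT"

assert two_pc_decide(["YES", "YES", "YES"]) == "COMMIT"
assert two_pc_decide(["YES", "NO", "YES"]) == "ABORT"
assert two_pc_decide(["YES", None]) == "ABORT"   # timeout treated as NO
```

The blocking problem lives on the other side of this rule: a participant that has voted YES (logged <ready>) can no longer abort unilaterally, so if the coordinator crashes before broadcasting the decision, that participant must hold its locks and wait.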

Unit 9 — Raft Consensus

1 chapter

Raft solves the replicated state machine problem under crash failures (not Byzantine). Decomposed into three sub-problems: Leader Election (randomised timeouts, terms), Log Replication (AppendEntries + majority commit), and Safety (election restriction — only most-up-to-date logs can be elected). Production-grade — backs etcd, Consul, CockroachDB.
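Raft's election restriction is a crisp two-step comparison worth knowing verbatim: a voter grants its vote only if the candidate's log is at least as up-to-date as its own, where a higher last-entry term wins outright and equal terms are broken by log length. A sketch of that check:

```python
def candidate_up_to_date(cand_last_term, cand_last_idx,
                         my_last_term, my_last_idx):
    """Raft election restriction: vote only for candidates whose log
    is at least as up-to-date as the voter's own."""
    if cand_last_term != my_last_term:
        return cand_last_term > my_last_term   # higher term wins
    return cand_last_idx >= my_last_idx        # equal terms: longer log wins

assert candidate_up_to_date(3, 5, 2, 9)        # higher term beats a longer log
assert candidate_up_to_date(3, 5, 3, 5)        # identical logs: grant vote
assert not candidate_up_to_date(2, 9, 3, 1)    # stale term loses
```

Because a candidate needs a majority, and every committed entry is on a majority of logs, this test alone guarantees elected leaders hold all committed entries.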

Unit 10 — Distributed Minimum Spanning Tree (GHS Algorithm)

1 chapter

Gallager-Humblet-Spira (1983): the canonical distributed MST algorithm. Cycle and cut properties. Fragments (subtrees of MST) with (name, level). Combining rules A (absorption), B (merger), C (wait). Test/Accept/Reject mechanism on basic edges. Max level ≤ log N. Complexity O(N log N + E) messages — optimal among comparison-based.
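Everything in GHS revolves around each fragment finding its minimum-weight outgoing edge (MWOE): by the cut property, that edge is safe to add to the MST. A centralised sketch of the MWOE computation (the distributed algorithm finds the same edge via Test/Accept/Reject on basic edges; edge representation here is mine):

```python
def mwoe(fragment_nodes, edges):
    """Minimum-weight edge with exactly one endpoint in the fragment.
    edges: list of (u, v, weight). Returns (weight, u, v) or None."""
    outgoing = [(w, u, v) for u, v, w in edges
                if (u in fragment_nodes) != (v in fragment_nodes)]
    return min(outgoing, default=None)

edges = [(1, 2, 4), (2, 3, 1), (1, 3, 7)]
assert mwoe({1, 2}, edges) == (1, 2, 3)   # edge (2,3), weight 1, is the MWOE
assert mwoe({1, 2, 3}, edges) is None     # whole graph: no outgoing edge
```

When two fragments of equal level pick the same MWOE they merge (rule B, level + 1); a lower-level fragment pointing its MWOE at a higher-level one is absorbed (rule A); otherwise it waits (rule C), which is what bounds the level by log N.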

Unit 11 — Google File System (GFS)

1 chapter

GFS architecture: single master + chunkservers + clients. 64 MB chunks replicated 3× across racks. Leases serialise mutations at a primary replica. Atomic record append. Four consistency states (defined / consistent / undefined / inconsistent). Pipelined data flow vs star control flow. Garbage collection via heartbeats + 3-day hidden retention. Copy-on-write snapshots.

Weightage

Unit 1 — Introduction, Challenges & CAP: 6%
Unit 2 — Models, Events & Logical Time: 12%
Unit 3 — Global Snapshots: 8%
Unit 4 — Causal Order Delivery: 4%
Unit 5 — Distributed Mutual Exclusion: 14%
Unit 6 — Distributed Deadlock Detection: 10%
Unit 7 — Consensus & Byzantine Agreement: 12%
Unit 8 — Distributed Transactions, 2PC & 3PC: 8%
Unit 9 — Raft Consensus: 8%
Unit 10 — Distributed MST (GHS): 8%
Unit 11 — Google File System (GFS): 10%

Exam pattern

Mid-Sem (~30%) + End-Sem (~40%) + Assignments / Quizzes (~30%). Confirm exact split with the instructor.

Important dates

  • Quiz 1 (tentative): 2025-09-15
  • Mid-Sem Exam: 2025-10-10
  • Quiz 2 (tentative): 2025-11-05
  • End-Sem Exam: 2025-12-01

Professor notes

  • Algorithm questions reward stating ASSUMPTIONS explicitly (FIFO? synchronous? failure model? N? f?) — usually a 1-mark sub-part. Always lead with assumptions.
  • Complexity bounds (message count, round count, sync delay) are high-yield 1–2 markers. Memorise the DME comparison table cold.
  • For Byzantine questions, the triple ($n \ge 3f+1$, $\ge f+1$ rounds, async impossible) is the standard 'state the bounds' sub-part.
  • Diagrams earn marks. A small WFG / message-flow / 2PC timeline takes 1 minute and is often worth 2 marks.
  • If you blank on an algorithm: write the problem statement + assumptions + message types + complexity. Each is partial credit.