
Distributed Systems

CS3.401
Prof. Kishore Kothapalli · Monsoon 2025-26 · 4 credits

Memory Triggers

Tiny cues. They reconstruct big topics.

CAP three letters
Consistency, Availability, Partition tolerance. Pick 2. Partitions unavoidable → trade C vs A in practice.
Lamport's happened-before
(1) same-process order, (2) send → receive, (3) transitive. e || e' iff neither precedes.
Scalar clock rules R1/R2
R1: ci += d before any event. R2 (recv): ci = max(ci, tmsg), then R1, then deliver.
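
A minimal sketch of R1/R2 in Python (class and method names are illustrative, not from the course):

```python
class LamportClock:
    """Scalar (Lamport) clock for one process; increment d is usually 1."""
    def __init__(self, d=1):
        self.c = 0
        self.d = d

    def tick(self):
        # R1: advance the clock before executing any event (internal/send).
        self.c += self.d
        return self.c

    def on_receive(self, t_msg):
        # R2: take the max with the message timestamp, then apply R1,
        # then deliver the message.
        self.c = max(self.c, t_msg)
        return self.tick()
```

A send is just `tick()` plus attaching the returned value as the message timestamp.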
Vector clock rules
R1: V[i] += d on local event. R2 (recv): ∀k V[k] = max(V[k], Vm[k]), then R1.
Strong consistency property
e → e' ⇔ C(e) < C(e'). Scalar: ⇒ only. Vector: ⇔. Matrix: ⇔ + 2nd-order knowledge.
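
The vector rules and the ⇔ test translate directly; a sketch assuming process ids 0..n−1:

```python
class VectorClock:
    def __init__(self, n, i, d=1):
        self.v = [0] * n   # v[k] = latest known clock value of process k
        self.i = i         # this process's own index
        self.d = d

    def tick(self):
        # R1: bump own component before any event.
        self.v[self.i] += self.d
        return list(self.v)

    def on_receive(self, v_msg):
        # R2: componentwise max with the message's vector, then R1.
        self.v = [max(a, b) for a, b in zip(self.v, v_msg)]
        return self.tick()

def happened_before(u, v):
    # e -> e' iff V(e) < V(e'): every component <=, at least one strict.
    return all(a <= b for a, b in zip(u, v)) and u != v

def concurrent(u, v):
    return not happened_before(u, v) and not happened_before(v, u)
```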
Matrix clock element meaning
mt[j,k] = i's knowledge of j's knowledge of k's clock. Killer use: discard obsolete messages when min_k mt[k,i] ≥ t.
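
The discard test is a one-liner over column i of the matrix (sketch; `mt` as a list of rows):

```python
def can_discard(mt, i, t):
    # Every process k is known to have learned i's clock past t, so
    # information about i's events up to time t is obsolete everywhere.
    return min(row[i] for row in mt) >= t
```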
Chandy-Lamport marker rules
Initiator: record own state, send marker on all out-channels. On marker from channel C: if own state not yet recorded → record it, set state(C) = ∅, send markers on all out-channels; else → state(C) = msgs received on C between own recording and this marker.
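
A sketch of the marker-receive rule; the two helpers are hypothetical stand-ins for the real state capture and transport:

```python
recorded = False
chan_state = {}   # channel -> recorded in-flight messages
buffered = {}     # channel -> msgs received since local recording began

def record_local_state():
    pass  # hypothetical: snapshot this process's own state

def send_marker_on_all_out_channels():
    pass  # hypothetical: transport-specific

def on_marker(channel):
    global recorded
    if not recorded:                       # first marker seen
        recorded = True
        record_local_state()
        chan_state[channel] = []           # this channel's state = empty
        send_marker_on_all_out_channels()
    else:
        # state(C) = msgs that arrived between my recording and this marker
        chan_state[channel] = buffered.get(channel, [])
```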
C1 / C2 cut conditions
C1: send recorded ⇒ msg in channel XOR received (no ghost receives). C2: send NOT recorded ⇒ msg NOT in channel AND NOT received.
Acharya-Badrinath channel state
Channel i→j contents = msgs numbered {RECDⱼ[i]+1, ..., SENTᵢ[j]}. Causal order guarantees they will arrive in that range.
BSS delivery conditions
Deliver m from Pj iff (a) Vm[j] = Vi[j]+1 AND (b) ∀k ≠ j Vm[k] ≤ Vi[k].
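
The two conditions check next-from-sender and causal-dependencies-satisfied; a direct sketch:

```python
def can_deliver(v_msg, v_local, j):
    """BSS: deliver m from Pj iff it is the next message from Pj (a) and
    everything m causally depends on has already been delivered (b)."""
    return (v_msg[j] == v_local[j] + 1 and
            all(v_msg[k] <= v_local[k]
                for k in range(len(v_local)) if k != j))
```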
Lamport DME entry conditions
(L1) received a msg with ts > (tsi, i) from every other site, AND (L2) own request at top of request_queue.
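
A sketch of the L1/L2 test, assuming requests are (timestamp, site_id) pairs so tuple comparison breaks ties by id:

```python
def can_enter_cs(my_req, request_queue, latest_from):
    """latest_from maps every other site to the highest (ts, id) seen
    from it; request_queue is kept sorted by (ts, id)."""
    l1 = all(req > my_req for req in latest_from.values())
    l2 = request_queue[0] == my_req
    return l1 and l2
```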
Ricart-Agrawala defer rule
Defer REPLY if in CS, OR requesting with my ts < your ts. Else reply immediately.
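
The defer rule as a one-liner, with the same (timestamp, id) tie-break:

```python
def defer_reply(in_cs, requesting, my_req, their_req):
    # my_req/their_req are (ts, id) pairs; lower tuple = higher priority.
    return in_cs or (requesting and my_req < their_req)
```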
Maekawa quorum properties
i ∈ Ri; Ri ∩ Rj ≠ ∅; |Ri|=K; each node in D quorums. Optimum K=D=√N.
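
A simple √N×√N grid construction (row ∪ column) that exhibits the intersection property; it gives |Ri| = 2·side − 1, a constant factor above Maekawa's exact K = √N optimum:

```python
def grid_quorum(i, side):
    """Quorum of node i on a side x side grid: its whole row + column.
    Any two quorums intersect: one's row crosses the other's column."""
    r, c = divmod(i, side)
    row = {r * side + k for k in range(side)}
    col = {k * side + c for k in range(side)}
    return row | col

# N = 16: grid_quorum(5, 4) & grid_quorum(10, 4) == {6, 9}  (non-empty)
```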
Maekawa V2 three extra messages
FAILED (already replied higher), INQUIRE (higher arrived, ask 'in CS?'), YIELD (got FAILED, relinquish).
Suzuki-Kasami token contents
Q (FIFO queue of pending requestors) + LN[1..n] (seq num of last CS exec per process).
Suzuki-Kasami token send rule
Send token to Pi iff RN[i] = LN[i] + 1 (fresh, not stale duplicate).
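
The staleness test the token holder applies, as a sketch:

```python
def fresh_requests(RN, LN):
    # Pj has an unserved request iff RN[j] == LN[j] + 1; anything
    # <= LN[j] is a duplicate of an already-executed request.
    return [j for j in range(len(RN)) if RN[j] == LN[j] + 1]
```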
Raymond Holder pointer
Logical k-ary tree; Holder points toward current token-holder root. Token migrates along Holder chain.
Deadlock model → detection criterion
Single/AND: cycle in WFG. OR: KNOT (not just cycle). AND-OR: repeated OR tests.
Chandy-Misra-Haas probe format
probe(i, j, k) = (initiator, sender, receiver). Pk forwards if blocked. Deadlock when initiator i = receiver k on a probe.
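
A sketch of the probe rule in the AND model, omitting the dependent-set bookkeeping the full algorithm uses to suppress duplicate probes:

```python
def handle_probe(probe, me, blocked, waiting_on):
    """probe = (initiator, sender, receiver); returns probes to send next."""
    initiator, sender, receiver = probe
    if receiver != me or not blocked:
        return []                 # only a blocked receiver propagates
    if initiator == me:
        raise RuntimeError("deadlock: probe returned to its initiator")
    return [(initiator, me, k) for k in waiting_on]
```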
Mitchell-Merritt direction
Probes travel OPPOSITE to WFG edges. Detect when public label returns to self.
Byzantine bounds triple
n ≥ 3f+1 processes; ≥ f+1 rounds; impossible in async even with 1 crash (FLP).
Crash consensus complexity
f+1 rounds; (f+1)·n² messages; min over all received + own.
Phase King rules
f+1 phases × 2 rounds. Phase k: round 1 broadcast + count multiplicity; round 2 king broadcasts → adopt king UNLESS multiplicity > n/2 + f.
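
The round-2 adopt-or-keep rule as a sketch:

```python
def phase_king_round2(my_value, my_count, king_value, n, f):
    # Keep own value only with overwhelming support (> n/2 + f);
    # otherwise adopt the phase king's broadcast value.
    return my_value if my_count > n / 2 + f else king_value
```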
OM(m) recursion termination
OM(0): use the commander's value directly. OM(m): each lieutenant relays its received value by running OM(m-1) as commander; decide majority over all values at the end. Needs n ≥ 3m+1.
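
A toy simulation of the recursion; the traitor model (flip the bit for odd-numbered destinations) is an assumption purely for demonstration, the recursive structure is the point:

```python
from collections import Counter

def send(sender, dest, v, traitors):
    # Assumed toy traitor behaviour: flip the bit for odd destinations.
    return 1 - v if sender in traitors and dest % 2 == 1 else v

def majority(vals):
    return Counter(vals).most_common(1)[0][0]

def om(m, commander, lieutenants, value, traitors):
    """Returns {lieutenant: decided value} for OM(m)."""
    received = {l: send(commander, l, value, traitors) for l in lieutenants}
    if m == 0:
        return received                      # OM(0): use value directly
    # Each lieutenant relays what it received via OM(m-1) as commander.
    sub = {j: om(m - 1, j, [l for l in lieutenants if l != j],
                 received[j], traitors)
           for j in lieutenants}
    return {i: majority([received[i]] +
                        [sub[j][i] for j in lieutenants if j != i])
            for i in lieutenants}

# n = 4, m = 1, traitorous commander: loyal lieutenants still agree.
print(om(1, 0, [1, 2, 3], 1, traitors={0}))   # {1: 0, 2: 0, 3: 0}
```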
ACID acronym
Atomicity (all-or-nothing), Consistency (DB invariants), Isolation (no partial views), Durability (committed survives failure).
2PC log records
<prepare T>, <ready T>, <no T>, <commit T>, <abort T>. <commit T> in stable log = point of no return.
2PC recovery decision
On crash recovery: <commit T> → redo. <abort T> → undo. <ready T> only → ask coordinator (in-doubt). Nothing → must abort (never sent ready).
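
The recovery decision as a lookup over the stable log (sketch; `log` as a collection of record strings):

```python
def recovery_action(log, T):
    if f"<commit {T}>" in log:
        return "redo"             # decision reached: replay writes
    if f"<abort {T}>" in log:
        return "undo"             # decision reached: roll back
    if f"<ready {T}>" in log:
        return "ask coordinator"  # in-doubt: voted yes, cannot decide alone
    return "abort"                # never voted ready: safe to abort locally
```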
2PC blocking case
All participants have <ready T> but no decision AND coordinator crashed → block.
3PC phases (in order)
PREPARE (same as 2PC) → PRE-COMMIT (collect K acks) → COMMIT/ABORT. Pre-commit replicates decision intent at K+1 sites.
Raft three sub-problems
Leader Election + Log Replication + Safety. Each decomposed independently.
Raft safety rule
Election restriction: a candidate wins only if its log is at least as up-to-date as each voter's (hence a majority's). New leader never directly commits prior-term entries; they commit only alongside a current-term entry.
GHS combining rules
A: L<L' → absorb into bigger. B: L=L' AND eF mutual → merge to L+1, name w(eF). C: else → wait.
GHS max level
log N. Level k fragment has ≥ 2^k nodes. N nodes ⇒ max log₂ N levels.
GFS architecture three components
Single master (metadata + log) + chunkservers (64 MB chunks, replicated 3× across racks) + clients (library). Data never through master.
GFS write flow (six steps)
Client asks master which replica holds the lease (primary) + replica locations → client pushes data pipelined to the closest replica first → client→primary 'do write' → primary assigns serial order (and the offset, for appends) and applies → forwards that order to secondaries, which apply and ack → primary replies success.
GFS consistency states
Defined (consistent + reflects one mutation entirely) ⊃ Consistent (same bytes at every client). Concurrent successes → consistent but undefined (mingled fragments); failure → inconsistent (clients may see different bytes).
GFS lease purpose
Master grants to one replica = primary. Primary serialises mutations during lease; secondaries follow primary's order. Decouples master from per-write activity.