Saral Shiksha Yojna
Distributed Systems

CS3.401
Prof. Kishore Kothapalli · Monsoon 2025-26 · 4 credits
Revision Notes / Unit 3: Global Snapshots

Chandy-Lamport, Lai-Yang, Acharya-Badrinath + Consistent Cuts

Intuition

We can't freeze time across a distributed system to take a snapshot. Instead, each process records its own local state independently, and the algorithm records the in-transit messages on each channel such that the combined picture is one the system *could have been in* in an equivalent execution — even if it wasn't actually in that exact state at any single instant. The channel-type assumption (FIFO / non-FIFO / causal) determines which algorithm applies.

Explanation

Why is recording a snapshot hard? No global clock to coordinate; can't stop the system; messages are in transit on channels. Naive 'everyone record now' fails because clocks differ.

Cut. A *cut* is a set of local states, one per process. A cut is consistent iff, for every message whose receive is recorded in the cut, the corresponding send is also recorded in the cut. (No 'ghost' receives: no effect without its cause.) Equivalently: no message arrow goes future → past.
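The definition above can be checked mechanically. The following is a minimal sketch, assuming each process's events are numbered 0, 1, 2, ... and a cut is given as the number of events it includes per process; the names `is_consistent`, `cut` and `messages` are illustrative, not from the course material.

```python
from typing import Dict, List, Tuple

# A message is (sender, send_index, receiver, recv_index): the positions of
# its send and receive events in the two processes' local event sequences.
Message = Tuple[str, int, str, int]

def is_consistent(cut: Dict[str, int], messages: List[Message]) -> bool:
    """cut[p] = number of events of process p that lie inside the cut."""
    for sender, send_idx, receiver, recv_idx in messages:
        received_in_cut = recv_idx < cut[receiver]
        sent_in_cut = send_idx < cut[sender]
        if received_in_cut and not sent_in_cut:
            return False  # ghost receive: effect recorded without its cause
    return True

# P1's second event sends a message; P2's first event receives it.
msgs = [("P1", 1, "P2", 0)]
print(is_consistent({"P1": 2, "P2": 1}, msgs))  # True: send and receive both inside
print(is_consistent({"P1": 2, "P2": 0}, msgs))  # True: message is in transit
print(is_consistent({"P1": 1, "P2": 1}, msgs))  # False: receive without send
```

A send that is inside the cut while its receive is outside is allowed; that message simply belongs to the channel state.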

C1 and C2 conditions (formal definition of consistent global state). C1: send recorded ⇒ msg in channel state XOR msg recorded as received (conservation — no message lost). C2: send NOT recorded ⇒ msg NOT in channel state AND NOT recorded as received (cause-before-effect — no premature receives).
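Written symbolically, the two conditions read as below. The notation is an assumption, since these notes do not fix symbols: $LS_i$ is the recorded local state of $P_i$, $SC_{ij}$ the recorded state of channel $C_{ij}$, $m_{ij}$ a message from $P_i$ to $P_j$, and $\oplus$ is exclusive or.

```latex
\begin{align*}
\textbf{C1:}\quad & \mathrm{send}(m_{ij}) \in LS_i \;\Rightarrow\;
    \bigl(m_{ij} \in SC_{ij}\bigr) \,\oplus\, \bigl(\mathrm{rec}(m_{ij}) \in LS_j\bigr) \\
\textbf{C2:}\quad & \mathrm{send}(m_{ij}) \notin LS_i \;\Rightarrow\;
    \bigl(m_{ij} \notin SC_{ij}\bigr) \,\wedge\, \bigl(\mathrm{rec}(m_{ij}) \notin LS_j\bigr)
\end{align*}
```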

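Banking trap example. Suppose three banks $P_1$, $P_2$, $P_3$ that record their balances at different times. $P_1$ records its state at $t_0$, before it sends a transfer of 400; $P_2$ records its state at $t_2$, after receiving the 400 from $P_1$. The 400 is then counted in both recorded balances, so the recorded total exceeds the true total by 400. The cut is inconsistent: it contains the 400 transfer's receive without its send.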
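The conservation check behind the banking trap can be sketched in a few lines. All balances and the transfer amount here are hypothetical values chosen for illustration, not the figures of the example above, and `recorded_total` is an illustrative helper.

```python
def recorded_total(balances, in_transit):
    """Recorded local balances plus amounts recorded on channels."""
    return sum(balances.values()) + sum(in_transit.values())

# Hypothetical starting point: P1=500, P2=300, P3=200, then P1 sends 100 to P2.
TRUE_TOTAL = 1000

# Consistent cut: P1 recorded after sending; the 100 is recorded on channel C12.
consistent = recorded_total({"P1": 400, "P2": 300, "P3": 200}, {"C12": 100})

# Inconsistent cut: P1 recorded before sending, P2 recorded after receiving.
ghost = recorded_total({"P1": 500, "P2": 400, "P3": 200}, {})

print(consistent == TRUE_TOTAL)  # True
print(ghost - TRUE_TOTAL)        # 100: the transfer is counted twice
```

A matching total does not by itself prove consistency, but a mismatch is a quick signal that the cut is broken.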

Chandy-Lamport algorithm (FIFO channels). Assumes FIFO. Uses a special marker message that separates pre- from post-snapshot messages. Initiator: record own state; send MARKER on every outgoing channel before any other message. On receiving MARKER on channel $C_{ji}$ from $P_j$: (a) First marker received by this process → record own state; record the state of $C_{ji}$ as empty ($\emptyset$); send MARKER on all outgoing channels; start recording on all *other* incoming channels. (b) State already recorded → stop recording $C_{ji}$; channel state = messages received on $C_{ji}$ between when the state was recorded and this marker arrived.

Why FIFO? Messages sent BEFORE the marker on a channel must arrive before the marker (and be counted either as received or as channel state); messages sent AFTER the marker must NOT be counted. FIFO guarantees that anything arriving after the marker on a channel is logically post-snapshot. Without FIFO, a post-snapshot message could overtake the marker and be wrongly included.

Chandy-Lamport termination. Algorithm terminates when each process has received a MARKER on *every* incoming channel. At that point every process has a recorded local snapshot and channel states for all its inbound channels.
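The marker rules and the termination condition can be exercised in a small single-threaded simulation. This is a sketch under stated assumptions, not a reference implementation: channels are modelled as `deque`s (which gives FIFO order), the `Process` class, the balances and the 30-unit transfer are all hypothetical, and deliveries are driven by hand rather than by a real network.

```python
from collections import deque

MARKER = "MARKER"

class Process:
    def __init__(self, name, balance, peers):
        self.name, self.balance, self.peers = name, balance, peers
        self.recorded = None      # recorded local state (None = not yet recorded)
        self.chan_state = {}      # sender name -> messages recorded on that channel
        self.recording = set()    # incoming channels still being recorded

    def record(self, net, skip=None):
        """Record own state, then send MARKER on every outgoing channel."""
        self.recorded = self.balance
        for p in self.peers:
            self.chan_state[p] = []
            if p != skip:                      # the marker's own channel stays empty
                self.recording.add(p)
            net[(self.name, p)].append(MARKER)

    def send(self, net, to, amount):
        self.balance -= amount
        net[(self.name, to)].append(amount)

    def deliver(self, net, frm):
        msg = net[(frm, self.name)].popleft()  # FIFO: oldest message first
        if msg == MARKER:
            if self.recorded is None:
                self.record(net, skip=frm)     # rule (a): first marker
            else:
                self.recording.discard(frm)    # rule (b): stop recording channel
        else:
            self.balance += msg
            if frm in self.recording:
                self.chan_state[frm].append(msg)  # was in transit at snapshot

names = ["P1", "P2", "P3"]
net = {(a, b): deque() for a in names for b in names if a != b}
procs = {n: Process(n, 100, [m for m in names if m != n]) for n in names}

procs["P2"].send(net, "P3", 30)   # transfer in flight before the snapshot
procs["P1"].record(net)           # P1 initiates
procs["P2"].deliver(net, "P1")    # first marker at P2
procs["P3"].deliver(net, "P1")    # first marker at P3
procs["P3"].deliver(net, "P2")    # the 30 arrives ahead of P2's marker
while any(net.values()):          # drain the remaining markers
    for (a, b), q in net.items():
        if q:
            procs[b].deliver(net, a)

total = sum(p.recorded for p in procs.values()) + sum(
    sum(msgs) for p in procs.values() for msgs in p.chan_state.values())
print(total)                            # 300
print(procs["P3"].chan_state["P2"])     # [30]
```

Once every `recording` set is empty, each process has seen a marker on every incoming channel, which is the termination condition stated above.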

Chandy-Lamport complexity. Messages: $O(e)$, where $e$ = number of channels (one marker per channel direction). Time: $O(d)$, where $d$ = diameter of the network.

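Lai-Yang algorithm (non-FIFO channels). Uses message colouring instead of markers. Every process is initially white and turns red when it records its state; every message carries the colour of its sender at send time. Rule: every white process records its snapshot at its convenience but no later than receiving the first red message. Every white process records the history of all white messages sent or received on each channel. Channel state of $C_{ij}$ = (white messages sent by $P_i$ on $C_{ij}$) − (white messages in $P_j$'s receive history for $C_{ij}$), computed from the histories.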
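The channel-state rule is a multiset difference over the two histories. The sketch below assumes the histories have already been collected for one channel; the message labels and the helper name `channel_state` are hypothetical.

```python
from collections import Counter

def channel_state(white_sent, white_received):
    """White messages sent on C_ij minus white messages received on C_ij.

    Counter subtraction is a multiset difference, so repeated payloads and
    out-of-order (non-FIFO) arrival are both handled.
    """
    return sorted((Counter(white_sent) - Counter(white_received)).elements())

# Hypothetical histories for one channel C_ij.
sent_by_pi = ["m1", "m2", "m3", "m2"]   # white messages P_i sent on C_ij
received_by_pj = ["m3", "m2"]           # white messages in P_j's receive history

print(channel_state(sent_by_pi, received_by_pj))  # ['m1', 'm2']
```

Note that `m3` overtook `m1` here, which a FIFO channel would not allow; the difference is still well defined because it ignores arrival order.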

Lai-Yang trade-off. Works on non-FIFO channels — useful when underlying network is unordered. Cost: heavy storage — each process keeps a complete history of white message traffic until snapshot completes.

Acharya-Badrinath algorithm (causal channels). Assumes causal delivery (stronger than FIFO). Each $P_i$ maintains $SENT_i[j]$ = count of msgs sent to each $P_j$, and $RECD_i[j]$ = count of msgs received from each $P_j$. Protocol: initiator broadcasts a token (including to self). Each $P_i$, on receiving the token, records its local snapshot and sends $SENT_i$ and $RECD_i$ to the initiator. Initiator computes the channel state from $P_j$ to $P_i$ as the messages indexed $RECD_i[j]+1, \dots, SENT_j[i]$. Complexity: $2N$ messages for $N$ processes (token + reply per process).
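The initiator's computation is plain index arithmetic on the collected counters. A minimal sketch, assuming messages on each channel are numbered from 1; the counter values and the helper name `in_transit` are hypothetical.

```python
def in_transit(sent_by_sender: int, recd_by_receiver: int) -> list:
    """Sequence numbers of messages still on the channel sender -> receiver."""
    return list(range(recd_by_receiver + 1, sent_by_sender + 1))

# Hypothetical replies collected by the initiator.
SENT = {"P1": {"P2": 5}, "P2": {"P1": 3}}   # SENT[i][j]: msgs P_i sent to P_j
RECD = {"P1": {"P2": 3}, "P2": {"P1": 4}}   # RECD[i][j]: msgs P_i got from P_j

# Channel P1 -> P2: pair the sender's SENT with the receiver's RECD.
print(in_transit(SENT["P1"]["P2"], RECD["P2"]["P1"]))  # [5]
# Channel P2 -> P1: counts match, so nothing is in transit.
print(in_transit(SENT["P2"]["P1"], RECD["P1"]["P2"]))  # []
```

Only two small integer arrays per process travel to the initiator, which is why the storage cost is light compared with full message histories.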

Algorithm comparison. Chandy-Lamport: FIFO required, $O(e)$ msgs, no extra storage. Lai-Yang: non-FIFO OK, colour carried on messages instead of markers, heavy history storage. Acharya-Badrinath: causal required, $2N$ msgs (light), small SENT/RECD counters.

Correctness intuition. The recorded snapshot is a *consistent global state* — one the system *could* have been in if some events were reordered (the events themselves are unchanged; only their position relative to the snapshot moves). The snapshot may not be a state the system was actually in at any wall-clock instant, but it is indistinguishable from one.

Definitions

  • Global snapshot: A recorded state of all processes plus the messages in flight on all channels, captured without stopping the system.
  • Consistent cut: A cut where every recorded receive has its corresponding send also recorded. Equivalently, no message arrow goes future → past.
  • C1 condition: Send recorded ⇒ msg is either in channel state OR recorded as received (not both, not neither). Conservation of messages.
  • C2 condition: Send NOT recorded ⇒ msg NOT in channel state AND NOT recorded as received. Cause-before-effect.
  • Marker (Chandy-Lamport): Special control message that separates pre-snapshot from post-snapshot messages on a FIFO channel. The first marker a process receives triggers its state recording; a marker arriving later on another incoming channel stops recording on that channel.
  • White/Red colouring (Lai-Yang): Process colour state: white = pre-snapshot, red = post-snapshot. A red message forces the receiver to finalise its own snapshot. Used when channels are non-FIFO.
  • SENT / RECD arrays (Acharya-Badrinath): Per-process counters of messages sent to / received from each other process. Channel state is derived as the messages numbered from the receiver's RECD count + 1 up to the sender's SENT count.

Formulas

Derivations

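Why Chandy-Lamport produces a consistent cut. Consider any message $m$ from $P_i$ to $P_j$ that appears in the recorded state. If $m$ is recorded in the channel state, then by the marker rules $m$ arrived after $P_j$'s state record but before $P_i$'s marker on that channel; by FIFO, $m$ was sent before that marker, so $send(m)$ was before $P_i$'s state record (in $P_i$'s snapshot). If $m$ is recorded as 'received' in $P_j$'s state, then $m$ arrived no later than $P_j$'s state record, hence before $P_i$'s marker on that channel; by FIFO, $send(m)$ was again before $P_i$'s state record. In both cases the send is in the snapshot, so no recorded message lacks its send, which is C2 in contrapositive form. C1 follows from a similar case analysis on when $m$ is delivered.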

Banking inconsistent cut. records at : . at : (already received the from ) . at : . Suppose at , had sent a to in flight, recorded in . And has in flight. Total = (true total). The transfer's send is BEFORE 's snapshot but its receive is AFTER 's snapshot — sent in past, received in future — inconsistent cut (a 'ghost receive' but inverse: an 'invisible send').

Examples

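  • Chandy-Lamport 3-process banking trace. $P_1, P_2, P_3$ with FIFO channels. $P_1$ records first, sends markers on $C_{12}, C_{13}$. $P_2$ receives marker on $C_{12}$ first: records its state, $C_{12} = \emptyset$, sends markers on $C_{21}, C_{23}$. Suppose $P_2$ had already sent a transfer on $C_{23}$ before this. $P_3$ receives marker on $C_{13}$ first: records, $C_{13} = \emptyset$. Then $P_3$ receives the transfer on $C_{23}$ (still before $P_2$'s marker on $C_{23}$, by FIFO) and records it as channel state. When the marker on $C_{23}$ arrives, $P_3$ stops recording. Result includes the in-flight transfer → total preserved.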
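  • Acharya-Badrinath channel-state computation. The initiator collects $SENT_i$ and $RECD_i$ from every $P_i$. Channel state $P_i \to P_j$ = msgs indexed $RECD_j[i]+1, \dots, SENT_i[j]$: when $SENT_i[j]$ exceeds $RECD_j[i]$ by one, exactly one message is in transit; when the two are equal, the range is empty. Reading order matters: pair the sender's SENT entry with the receiver's RECD entry for the same channel.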
  • Lai-Yang colouring. $P_i$ goes red and sends a red message to $P_j$. $P_j$, seeing a red message arrive, must finalize its own snapshot before processing it: its own state must already be recorded (or is recorded now). Histories of white messages give the channel states.

Diagrams

  • Chandy-Lamport marker flow on 3 processes: one process initiates and sends markers on its outgoing channels; each receiver applies first-vs-later marker rules; channels record in-flight messages.
  • Consistent vs inconsistent cut: timelines for two processes with arrows for messages; a consistent cut has every crossing arrow going past → future; an inconsistent cut has at least one going future → past.
  • Lai-Yang colour transitions: events as white/red dots; red messages crossing the cut force receiver's snapshot.
  • Algorithm comparison table: rows = Chandy-Lamport / Lai-Yang / Acharya-Badrinath; cols = channel assumption, messages, extra storage, complexity.

Edge cases

  • Chandy-Lamport without FIFO breaks: a post-snapshot message can overtake the marker and be wrongly counted.
  • Multiple concurrent initiators in Chandy-Lamport: each initiator runs its own snapshot; markers can collide; multiple snapshots are recorded; pick one or merge.
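  • Acharya-Badrinath with non-causal channels breaks: without causal delivery, a message sent after the sender records can be received before the receiver records, so the SENT/RECD counts no longer delimit the in-transit messages.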
  • Lai-Yang heavy storage — for long-running snapshots, history can grow unboundedly. Snapshot must complete promptly.
  • Initiator failure during snapshot leaves the system without an aggregator. Recovery: re-run; some algorithms tolerate this with re-snapshot.
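The first edge case in the list above (Chandy-Lamport without FIFO) can be reproduced with two processes. This is a toy sketch: the balances, the 40-unit transfer and the function `snapshot_total` are hypothetical, and reordering is modelled by reversing the channel contents.

```python
from collections import deque

def snapshot_total(fifo: bool) -> int:
    p1, p2 = 100, 100
    p1_recorded = p1              # P1 records, then sends the marker...
    channel = deque(["MARKER"])
    p1 -= 40                      # ...then makes a post-snapshot transfer
    channel.append(40)
    if not fifo:
        channel.reverse()         # the transfer overtakes the marker
    p2_recorded = None
    while channel:
        msg = channel.popleft()
        if msg == "MARKER":
            if p2_recorded is None:
                p2_recorded = p2  # first marker: record state, channel = empty
        else:
            p2 += msg
    return p1_recorded + p2_recorded

print(snapshot_total(fifo=True))   # 200: matches the true total
print(snapshot_total(fifo=False))  # 240: the 40 is received but its send is not recorded
```

In the non-FIFO run the recorded state contains a receive whose send lies after the sender's snapshot, which is exactly a C2 violation.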

Common mistakes

  • Saying 'a snapshot captures the exact state at one instant'. No — it captures a *consistent* state the system COULD have been in, not necessarily the actual instantaneous state.
  • Chandy-Lamport without stating FIFO. Always state the channel assumption — it's typically a 1-mark sub-part.
  • Forgetting that the initiator records its own state first in Chandy-Lamport (before sending markers).
  • Confusing C1 and C2. C1: send recorded ⇒ msg present (in channel OR received). C2: send NOT recorded ⇒ msg absent.
  • Lai-Yang on FIFO channels. Inefficient — Chandy-Lamport is lighter. Lai-Yang's value is non-FIFO support.

Shortcuts

  • Chandy-Lamport: FIFO, markers, $O(e)$ msgs, $O(d)$ time.
  • Lai-Yang: non-FIFO, white/red colouring, heavy history.
  • Acharya-Badrinath: causal, $2N$ msgs, SENT/RECD counters.
  • Consistent cut: no message arrow goes future → past.
  • C1 (conservation), C2 (cause-effect).
  • Banking trap: if total ≠ initial total, cut is inconsistent.

Proofs / Algorithms

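C1 from Chandy-Lamport. Suppose $send(m)$ on channel $C_{ij}$ is recorded in $P_i$'s snapshot. Then $m$ was sent before $P_i$'s marker on $C_{ij}$, so by FIFO it reaches $P_j$ before that marker. Two subcases. (a) $m$ is delivered to $P_j$ before $P_j$ records its state: then $m$ is in $P_j$'s state ('received'). (b) $m$ is delivered after $P_j$'s state record but before the marker on the channel: then by the algorithm $m$ is recorded as channel state. Exactly one subcase applies, so the message is accounted for exactly once ⇒ C1 holds.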

C2 from Chandy-Lamport (FIFO-dependent). Suppose $send(m)$ on $C_{ij}$ is NOT recorded, meaning $send(m)$ happened after $P_i$'s state record. By the algorithm, $P_i$'s marker was sent on that channel right after recording, before any other message. By FIFO, $m$ arrives at $P_j$ AFTER the marker. By then $P_j$ has already recorded its state (at the latest on receiving this marker), so the receive of $m$ is not in $P_j$'s state; and $P_j$ has either recorded this channel as empty or already stopped recording it, so $m$ is not in the channel state. In neither place is $m$ in the snapshot ⇒ C2 holds.