Saral Shiksha Yojna

Computer Vision

CSE471
Prof. Makarand Tapaswi + Prof. Charu Sharma · Spring 2025-26 · 4 credits

Last-Week Revision Pack

The final condensed layer. Open this on exam morning.

Cheatsheet: Re-skim the cheatsheet pages in this order: detection metrics → segmentation losses → attention/ViT → SSL (DINO especially) → 3DGS parameters.
High Yield: Object detection (R-CNN family + YOLO + NMS + mAP) is the heaviest single topic; plan at least 25% of revision time here.
High Yield: ViT pipeline + parameter count. Almost guaranteed to appear; easy marks if memorised.
High Yield: DINO centering + sharpening + multi-crop. The one SSL question most students get wrong.
High Yield: 3DGS: 59 params per Gaussian (3+7+1+48), Σ = R·S·Sᵀ·Rᵀ, alpha compositing equation.
High Yield: Modern Transformer upgrades (PreNorm, RMSNorm, LayerScale, QK-Norm, Registers, RoPE, GQA, FlashAttention, KV-cache).
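For the 3DGS item above, a minimal sketch of the parameter budget and of Σ = R·S·Sᵀ·Rᵀ built from a quaternion and a scale vector. The split of the 59 parameters (position, scale + quaternion, opacity, degree-3 spherical-harmonic colour) and the function names `quat_to_rotmat` and `covariance` are assumptions made for illustration, not taken from the course material.

```python
import math

# Assumed breakdown of the 59 parameters: 3 + (3 + 4) + 1 + 48.
PARAMS = {
    "position": 3,
    "scale": 3,
    "rotation_quaternion": 4,
    "opacity": 1,
    "sh_colour": 16 * 3,  # 16 SH coefficients per RGB channel (degree 3)
}
TOTAL_PARAMS = sum(PARAMS.values())


def quat_to_rotmat(w, x, y, z):
    """Convert a quaternion to a 3x3 rotation matrix (normalising it first)."""
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(quat, scale):
    """Sigma = R S S^T R^T with S = diag(scale)."""
    R = quat_to_rotmat(*quat)
    # M = R S: column j of R is multiplied by scale[j].
    M = [[R[i][j] * scale[j] for j in range(3)] for i in range(3)]
    # Sigma = M M^T, symmetric and positive semi-definite by construction.
    return [[sum(M[i][k] * M[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]
```

Writing Σ as (R·S)(R·S)ᵀ is the point worth remembering: optimising a free 3×3 matrix could leave the set of valid covariances, whereas this factorisation stays symmetric and positive semi-definite for any scale and quaternion.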
Memory Trigger: If the question says 'imbalanced classes' → focal loss / Dice / weighted CE.
Memory Trigger: If the question says 'softmax saturation' → √dₖ scaling argument.
Memory Trigger: If the question says 'permutation invariant' → shared MLP + symmetric pool (PointNet pattern).
Memory Trigger: If the question says 'model collapse in SSL' → DINO centering + sharpening + EMA teacher + large output dim.
Memory Trigger: If the question says 'sub-pixel alignment' → bilinear interpolation, RoI Align beats RoI Pool.
Memory Trigger: If the question says 'edge artifacts' → DCT (not DFT): the DCT implies an even-symmetric (mirrored) extension of the signal rather than the DFT's periodic wrap-around, so it adds no artificial jump at the boundaries.
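For the 'model collapse in SSL' trigger, a rough sketch of how centering and sharpening act on the teacher output in a DINO-style loss. The temperatures (0.04 teacher, 0.1 student), the center momentum 0.9, and all function names are illustrative assumptions; the EMA teacher update and multi-crop are left out.

```python
import math


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(v - m) for v in scaled))
    return [v - log_z for v in scaled]


def softmax(logits, temperature):
    return [math.exp(v) for v in log_softmax(logits, temperature)]


def teacher_probs(teacher_logits, center, tau_t=0.04):
    """Centering (subtract the running center) then sharpening (low temperature)."""
    centered = [t - c for t, c in zip(teacher_logits, center)]
    return softmax(centered, tau_t)


def dino_loss(student_logits, teacher_logits, center, tau_s=0.1, tau_t=0.04):
    """Cross-entropy H(P_teacher, P_student); the teacher side gets no gradient."""
    p_t = teacher_probs(teacher_logits, center, tau_t)
    log_p_s = log_softmax(student_logits, tau_s)
    return -sum(pt * lps for pt, lps in zip(p_t, log_p_s))


def update_center(center, teacher_batch, momentum=0.9):
    """EMA of the batch-mean teacher logits."""
    k = len(center)
    mean = [sum(row[j] for row in teacher_batch) / len(teacher_batch)
            for j in range(k)]
    return [momentum * c + (1 - momentum) * b for c, b in zip(center, mean)]
```

The two operations pull in opposite directions: centering stops one dimension from dominating but pushes toward a uniform output, while sharpening does the reverse, so each guards against the collapse mode the other encourages.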
Weak Area: Walk through the PAF score formula (line integral) once more; students typically can't write it cold.
Weak Area: Pre-derive the 86 M parameter count for ViT-B/16 on paper to gain speed in the exam.
Weak Area: MAE 75% vs BERT 15% masking-ratio rationale: images are spatially redundant, so with a low ratio a masked patch can be filled in from its neighbours and the task becomes trivial.
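The ViT-B/16 count above can be checked term by term with a short script. This is a sketch under assumed settings that the notes do not state: 224×224 RGB input, 1000-class head, biases on every linear layer, a learnable position embedding that includes the CLS token, and a final LayerNorm. The function name `vit_param_count` is made up for illustration.

```python
def vit_param_count(image=224, patch=16, channels=3, dim=768,
                    depth=12, mlp_dim=3072, classes=1000):
    """Count learnable parameters of a plain ViT encoder plus linear head."""
    n_patches = (image // patch) ** 2                  # 14 * 14 = 196

    patch_embed = patch * patch * channels * dim + dim  # linear projection of flattened patches
    cls_token = dim
    pos_embed = (n_patches + 1) * dim                   # +1 for the CLS token

    # One encoder block. The number of heads does not change the count.
    attn = 3 * (dim * dim + dim) + (dim * dim + dim)    # Q, K, V + output projection
    mlp = (dim * mlp_dim + mlp_dim) + (mlp_dim * dim + dim)
    norms = 2 * (2 * dim)                               # two LayerNorms, scale + shift
    block = attn + mlp + norms

    final_norm = 2 * dim
    head = dim * classes + classes

    return (patch_embed + cls_token + pos_embed
            + depth * block + final_norm + head)


total = vit_param_count()
```

Under these assumptions `total` is 86,567,656, of which the twelve blocks contribute 85,054,464 (about 7.09 M each), so for a paper derivation the shortcut "12 × (4D² + 8D²) = 12 × 12D²" with D = 768 already gives roughly 85 M.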
Formula: Conv output: ⌊(W − F + 2P)/S⌋ + 1. Same padding for odd F at stride 1: P = (F − 1)/2.
Formula: Attention: softmax(QKᵀ/√dₖ) V. The reasoning behind the √dₖ normalisation is examinable.
Formula: Dice = 2·IoU/(1 + IoU). Equivalently Dice = 2|A ∩ B|/(|A| + |B|): the denominator is a SUM, not a union.
Formula: PSNR = 10 log₁₀(R²/MSE); R = 255 for 8-bit.
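A few of the formulas above as small helper functions, handy for checking hand calculations. The function names are hypothetical, and the PSNR helper assumes MSE > 0 (identical images would give infinite PSNR).

```python
import math


def conv_output_size(w, f, p=0, s=1):
    """Spatial output size of a convolution: floor((W - F + 2P) / S) + 1."""
    return (w - f + 2 * p) // s + 1


def same_padding(f):
    """Padding that preserves size at stride 1; only defined for odd F."""
    if f % 2 == 0:
        raise ValueError("same padding with P = (F - 1) / 2 needs an odd kernel size")
    return (f - 1) // 2


def dice_from_iou(iou):
    """Dice = 2 * IoU / (1 + IoU)."""
    return 2 * iou / (1 + iou)


def psnr(mse, peak=255.0):
    """PSNR in dB; peak is R, the maximum pixel value (255 for 8-bit)."""
    return 10 * math.log10(peak ** 2 / mse)


# Worked example: 224x224 input, 7x7 kernel, padding 3, stride 2.
stem_out = conv_output_size(224, 7, p=3, s=2)
```

Here `stem_out` comes to 112: (224 − 7 + 6)/2 = 111.5, floored to 111, plus 1.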
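Derivation: Be ready to derive RoPE's relative-position property: rotate q by m·θ and k by n·θ, and the dot product becomes a function of (m − n) only.
Derivation: InfoNCE ↔ cross-entropy: with exactly one positive among the candidates in the denominator, NT-Xent is exactly softmax cross-entropy over the temperature-scaled similarities, with the positive as the target class.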
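The RoPE property above can be sanity-checked numerically before deriving it on paper. The sketch below uses a single 2-D pair with one frequency θ (an assumed value of 0.1); full RoPE applies the same rotation to every pair of dimensions with its own frequency. The names `rotate` and `rope_score` are illustrative.

```python
import math


def rotate(vec, angle):
    """Rotate a 2-D vector counter-clockwise by `angle` radians."""
    x, y = vec
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def rope_score(q, k, m, n, theta=0.1):
    """Dot product of q rotated by m*theta and k rotated by n*theta."""
    qm = rotate(q, m * theta)
    kn = rotate(k, n * theta)
    return qm[0] * kn[0] + qm[1] * kn[1]


q, k = (1.0, 2.0), (3.0, -1.0)
same_offset_a = rope_score(q, k, m=5, n=2)    # m - n = 3
same_offset_b = rope_score(q, k, m=13, n=10)  # m - n = 3, shifted positions
other_offset = rope_score(q, k, m=5, n=4)     # m - n = 1
```

The first two scores agree up to floating-point error while the third differs, which is the algebraic fact R(mθ)ᵀ R(nθ) = R((n − m)θ) in numeric form.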
Common Mistake: Don't forget that NMS is applied per class, not globally across all classes.
Common Mistake: Don't confuse the Dice denominator with a union: Dice has |A| + |B| in the denominator, IoU has |A ∪ B|.
Common Mistake: Don't write 'YOLO predicts B class probabilities': in the original YOLO, class probs are SHARED across the B boxes per cell.
Common Mistake: BatchNorm at inference: use running averages, NOT batch statistics.
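To make the per-class NMS point concrete, here is a minimal greedy sketch. Boxes are assumed to be (x1, y1, x2, y2) tuples and detections (box, score, class_id) triples; the 0.5 IoU threshold and the function names are illustrative choices.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_per_class(detections, iou_threshold=0.5):
    """Greedy NMS run independently inside each class."""
    kept = []
    for cls in sorted({d[2] for d in detections}):
        # Only boxes of the same class compete with each other.
        candidates = sorted((d for d in detections if d[2] == cls),
                            key=lambda d: d[1], reverse=True)
        while candidates:
            best = candidates.pop(0)
            kept.append(best)
            candidates = [d for d in candidates
                          if iou(best[0], d[0]) <= iou_threshold]
    return kept


dets = [
    ((0, 0, 10, 10), 0.9, "dog"),
    ((1, 1, 11, 11), 0.8, "dog"),  # overlaps the first dog box, IoU about 0.68
    ((0, 0, 10, 10), 0.7, "cat"),  # same location, different class
]
kept = nms_per_class(dets)
```

The lower-scoring dog box is suppressed, but the cat box at the same location survives. A global NMS would have discarded it, which is exactly the mistake the note warns about.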
Memory Trigger: Watch-clock-style multimodal PYQ (image + structured text → output). Frame as a Prefix-LM VLM: bidirectional attention on the prompt, causal on the answer.
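The Prefix-LM attention pattern can be written down as a boolean mask. In this sketch the prompt (image tokens plus structured text) is assumed to occupy the first `prefix_len` positions and the answer the rest; `prefix_lm_mask` is a hypothetical helper name.

```python
def prefix_lm_mask(prefix_len, total_len):
    """mask[i][j] is True when query position i may attend to key position j.

    Prefix positions are visible to everyone (bidirectional inside the prompt);
    answer positions are visible only to themselves and later answer tokens (causal).
    """
    return [[(j < prefix_len) or (j <= i) for j in range(total_len)]
            for i in range(total_len)]


# Two prompt tokens followed by two answer tokens.
mask = prefix_lm_mask(prefix_len=2, total_len=4)
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
```

This prints the rows 1100, 1100, 1110, 1111: the prompt block is fully connected, and the answer block is lower-triangular on top of full access to the prompt.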