Saral Shiksha Yojna by IIIT-H alumni

Computer Vision

CSE471

Prof. Makarand Tapaswi + Prof. Charu Sharma • Spring 2025-26 • 4 credits

Last-Week Revision Pack

Every item below is something you should be able to recall cold by exam morning. It is not a study list — it is a triage list for the final 5–7 days.

How to use this page (last 5–7 days)

Day −7 to −3: Read every item top-to-bottom. For any item you can't expand into a 2-minute explanation, open the linked unit/chapter and re-learn it. Don't move on until you can recall the item without looking.
Day −2 to −1: Re-read only, speaking each item aloud. If you stumble, mark it mentally and drill it twice more.
Exam morning: Skim once, fast. Don't deep-dive anything. The goal is retrieval priming, not learning.

What each tag means and what to do with it

Cheatsheet: A pointer to a fast-skim page in this course. Open it and re-read in the order suggested. 2–5 minutes per item.
High Yield: A topic almost certain to appear on the exam. Allocate revision time in proportion to its expected mark weight, not equal time per item. Drill until you can answer without notes.
Weak Area: A topic the cohort historically struggles with. Treat as high-priority and verify your understanding by explaining it aloud or writing a one-paragraph answer.
Formula: An equation you must reproduce verbatim. Write it out from memory once per day until exam day. If you can't derive it, also re-read the relevant chapter.
Memory Trigger: An "if you see X → reach for Y" cue for the exam room. Memorise the mapping; you'll only have seconds to recall it under pressure. Pair with the linked framework.
Derivation: A multi-step proof or derivation. Write it out on blank paper; reading it is not enough. Re-do until you can produce it in under 5 minutes.
Common Mistake: A specific error the cohort routinely makes. Memorise the correction and the right phrasing; this is the cheapest mark you can save.

Track your progress: mark the page "Finished" (top-right) once you can recall every item below without looking.

The pack (54 items)

1 Cheatsheet · 17 High Yield · 12 Memory Trigger · 8 Formula · 4 Derivation · 5 Weak Area · 7 Common Mistake

Cheatsheet: Re-skim the cheatsheet pages in this order: detection metrics → segmentation losses → attention/ViT → SSL (DINO especially) → 3DGS parameters.
High Yield: Three Rs (Malik): Reorganisation / Recognition / Reconstruction. Give one CV example for each; autonomous driving uses all three.
High Yield: Why CV is hard (seven reasons): pixels-as-numbers, intra-class variation, viewpoint, illumination, occlusion, scale, ambiguity.
High Yield: Smoothing kernels sum to 1; derivative kernels sum to 0. Edge ⊥ gradient. Gaussian is SEPARABLE ($O(2K)$ per pixel, not $O(K^{2})$).
High Yield: JPEG uses DCT, not DFT: real-valued, no boundary discontinuity, better energy compaction.
High Yield: Otsu MAXIMISES between-class variance (equivalently, minimises within-class variance). Closed-form $O(L)$ sweep over the $L$ grey levels.
High Yield: Logistic-regression SGD: $w \leftarrow w - \eta (\hat{y} - y) x$. Why not MSE? Non-convex + vanishing gradient with sigmoid.
High Yield: Bagging reduces variance (RF). Boosting reduces bias (AdaBoost / Viola-Jones face detection).
High Yield: Precision when FP is costly; Recall when FN is costly. F1 = 2PR/(P+R). PR > ROC for imbalanced data.
High Yield: CNN conv params $F (C_{in} K^{2} + 1)$, independent of $H, W$. Output size: $\lfloor (W + 2P - K) / S \rfloor + 1$.
High Yield: Two stacked 3×3 < one 5×5 (fewer params + extra ReLU). Same RF.
High Yield: ResNet residual gradient $\partial F / \partial x + I$ ⇒ no vanishing. AlexNet 60M, VGG-19 143.7M, ResNet-50 25.6M, EfficientNet-B0 5.3M.
High Yield: Depthwise-separable conv: cost ratio $1/C_{out} + 1/K^{2}$, i.e. $\sim 9 \times$ cheaper for $K = 3$. CNN is translation EQUIvariant, NOT rotation equivariant.
Memory Trigger: If the question mentions 'salt-and-pepper noise' → median filter (rank statistic; mean smears it).
Memory Trigger: If 'edge-preserving smoothing' → bilateral filter (spatial × range Gaussian).
Memory Trigger: If 'uneven illumination + thresholding' → adaptive / Sauvola / Niblack; global Otsu fails.
Memory Trigger: If 'why not zero-init NN weights' → symmetry (all neurons stay identical forever).
Memory Trigger: If 'deep CNN doesn't train' (>20 layers) → residual connections (ResNet).
Formula: Conv output: $\lfloor (W + 2P - K) / S \rfloor + 1$; params: $F (C_{in} K^{2} + 1)$. Same pad, odd $K$: $P = (K - 1)/2$.
Formula: Receptive field of $L$ stacked $3 \times 3$ stride-1 convs: $2L + 1$.
Formula: Histogram equalisation: $s = (L - 1)\,\mathrm{CDF}(r)$. Maps any histogram to ≈ uniform.
Formula: Precision / Recall / F1: $\frac{TP}{TP + FP}$, $\frac{TP}{TP + FN}$, harmonic mean $\frac{2PR}{P + R}$.
Derivation: Logistic gradient = $(\hat{y} - y) x$ via sigmoid×CE cancellation; derive it once cleanly.
Derivation: ResNet skip prevents vanishing: $\partial y / \partial x = \partial F / \partial x + I$ ⇒ the identity term passes the gradient through the skip with gain 1, whatever $\partial F / \partial x$ does.
Weak Area: PCA in 4 steps: centre → covariance → eigendecompose → keep top-k. SVD is the numerically stable route.
Weak Area: EM for GMM: practise the $\gamma_{ik}$ (responsibility) formula and the weighted-MLE M-step updates.
Common Mistake: Don't say 'erosion = MAX'. Erosion = MIN (shrink); Dilation = MAX (grow).
Common Mistake: Don't compute conv params depending on $H, W$; they depend on $F, C_{in}, K$ only.
Common Mistake: Don't use forward warping. Inverse warping (with interpolation) avoids holes.
High Yield: Object detection (R-CNN family + YOLO + NMS + mAP): heaviest single topic, plan ≥ 25% of revision time here.
High Yield: ViT pipeline + parameter count: almost guaranteed appearance, easy marks if memorised.
High Yield: DINO centering + sharpening + multi-crop: the one SSL question most students get wrong.
High Yield: 3DGS: 59 params per Gaussian (3+7+1+48), Σ = R·S·Sᵀ·Rᵀ, alpha compositing equation.
High Yield: Modern Transformer upgrades (PreNorm, RMSNorm, LayerScale, QK-Norm, Registers, RoPE, GQA, Flash, KV-cache).
Memory Trigger: If the question says 'imbalanced classes' → focal loss / Dice / weighted CE.
Memory Trigger: If the question says 'softmax saturation' → √dₖ scaling argument.
Memory Trigger: If the question says 'permutation invariant' → shared MLP + symmetric pool (PointNet pattern).
Memory Trigger: If the question says 'model collapse in SSL' → DINO centering + sharpening + EMA teacher + large output dim.
Memory Trigger: If the question says 'sub-pixel alignment' → bilinear interpolation, RoI Align beats RoI Pool.
Memory Trigger: If the question says 'edge artifacts' → DCT (not DFT) because no periodicity assumption.
Weak Area: Walk through the PAF score formula (line integral) once more; students typically can't write it cold.
Weak Area: Pre-derive the 86 M parameter count for ViT-B/16 on paper to gain speed in the exam.
Weak Area: MAE 75% vs BERT 15% masking-ratio rationale: image spatial redundancy argument.
Formula: Conv output: ⌊(W − K + 2P)/S⌋ + 1. Same-pad, odd K: P = (K − 1)/2.
Formula: Attention: softmax(QKᵀ/√dₖ) V. The √dₖ normalisation reasoning is examinable.
Formula: Dice = 2·IoU/(1 + IoU). Dice's denominator is the SUM |A| + |B|, not the union.
Formula: PSNR = 10 log₁₀(R²/MSE); R = 255 for 8-bit.
Derivation: Be ready to derive RoPE's relative-position property: rotate q by m·θ, k by n·θ → dot product is f(m − n).
Derivation: InfoNCE ↔ cross-entropy: with exactly one positive among the candidates in the denominator, NT-Xent is exactly softmax cross-entropy with that positive as the target class.
Common Mistake: Don't forget per-class NMS (not global).
Common Mistake: Don't confuse Dice union with sum: Dice has + in the denominator, IoU has ∪.
Common Mistake: Don't write 'YOLO predicts B class probabilities'; class probs are SHARED across the B boxes per cell.
Common Mistake: BatchNorm at inference: use running averages, NOT batch statistics.
Memory Trigger: Watch-clock-style multimodal PYQ: image + structured text → output. Frame as a Prefix-LM VLM: bidirectional attention on the prompt, causal on the answer.
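
The conv arithmetic items in the pack (output size, parameter count, receptive field, depthwise-separable ratio) can be checked numerically. The following is a minimal sketch, not course-provided code; the function names are made up for illustration, and square kernels with equal padding on both sides are assumed.

```python
def conv_output_size(w: int, k: int, p: int = 0, s: int = 1) -> int:
    """floor((W + 2P - K) / S) + 1 along one spatial dimension."""
    return (w + 2 * p - k) // s + 1


def conv_params(f: int, c_in: int, k: int) -> int:
    """F filters, each with C_in * K * K weights plus one bias. No H, W anywhere."""
    return f * (c_in * k * k + 1)


def receptive_field_3x3(num_layers: int) -> int:
    """Receptive field of L stacked 3x3, stride-1 convolutions: 2L + 1."""
    return 2 * num_layers + 1


def depthwise_separable_ratio(c_out: int, k: int) -> float:
    """Cost of a depthwise-separable conv relative to a standard conv."""
    return 1 / c_out + 1 / (k * k)


# Same padding for odd K: P = (K - 1) / 2 keeps the spatial size at stride 1.
print(conv_output_size(32, k=3, p=1, s=1))        # 32
print(conv_output_size(224, k=7, p=3, s=2))       # 112

# Two stacked 3x3 layers vs one 5x5 layer, 64 channels in and out.
print(2 * conv_params(64, 64, 3))                 # 73856
print(conv_params(64, 64, 5))                     # 102464
print(receptive_field_3x3(2))                     # 5, same as a single 5x5

# K = 3 with many output channels: roughly 1/9 of the standard cost.
print(round(depthwise_separable_ratio(256, 3), 3))  # 0.115
```

Note that the image size never enters `conv_params`, which is the point of the matching Common Mistake item.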
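
The metric formulas in the pack (Precision / Recall / F1, Dice vs IoU, PSNR) are easy to sanity-check on toy numbers. This sketch is illustrative only; the function names are assumptions, masks are represented as Python sets of pixel indices, and all denominators are assumed to be non-zero.

```python
import math


def precision_recall_f1(tp: int, fp: int, fn: int):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN), F1 = harmonic mean."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def iou_and_dice(pred: set, target: set):
    """IoU divides by the union; Dice divides by the SUM of the two sizes."""
    inter = len(pred & target)
    iou = inter / len(pred | target)
    dice = 2 * inter / (len(pred) + len(target))
    return iou, dice


def psnr(mse: float, peak: float = 255.0) -> float:
    """PSNR = 10 log10(R^2 / MSE), with R = 255 for 8-bit images."""
    return 10 * math.log10(peak ** 2 / mse)


p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(round(p, 3), round(r, 3), round(f1, 3))   # 0.8 0.667 0.727

iou, dice = iou_and_dice({1, 2, 3, 4}, {3, 4, 5, 6})
print(round(iou, 3), dice)                      # 0.333 0.5
print(abs(dice - 2 * iou / (1 + iou)) < 1e-12)  # True: Dice = 2 IoU / (1 + IoU)
```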
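
For the logistic-gradient Derivation item, one way to write the sigmoid×CE cancellation is sketched below. The notation ($z$ for the logit, $\mathcal{L}$ for the per-sample cross-entropy, bias folded into $w$) is an assumption and may differ from the course notes.

```latex
\begin{align*}
z &= w^\top x, \qquad \hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
\mathcal{L} = -\bigl[\, y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \,\bigr] \\
\frac{\partial \mathcal{L}}{\partial \hat{y}}
  &= -\frac{y}{\hat{y}} + \frac{1 - y}{1 - \hat{y}}
   = \frac{\hat{y} - y}{\hat{y}\,(1 - \hat{y})} \\
\frac{\partial \hat{y}}{\partial z} &= \hat{y}\,(1 - \hat{y}) \\
\nabla_w \mathcal{L}
  &= \frac{\partial \mathcal{L}}{\partial \hat{y}} \cdot
     \frac{\partial \hat{y}}{\partial z} \cdot
     \frac{\partial z}{\partial w}
   = (\hat{y} - y)\, x
\end{align*}
```

The factor $\hat{y}(1 - \hat{y})$ from the sigmoid cancels the denominator from the cross-entropy, which gives the SGD update $w \leftarrow w - \eta (\hat{y} - y) x$. With MSE that factor does not cancel, which is where the vanishing-gradient argument comes from.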
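
The 86 M figure for ViT-B/16 can be reproduced by counting layer by layer. The sketch below assumes the standard ViT-B/16 configuration (224×224 RGB input, width 768, 12 blocks, MLP width 3072), biases on every linear layer, a learnable position embedding, and a 1000-class head; none of these values are stated in the pack itself, so check them against the course notes.

```python
def vit_param_count(image_size=224, patch_size=16, channels=3,
                    dim=768, depth=12, mlp_dim=3072, num_classes=1000):
    """Count learnable parameters of a plain ViT encoder with a linear head."""
    num_patches = (image_size // patch_size) ** 2            # 14 * 14 = 196

    patch_embed = patch_size * patch_size * channels * dim + dim
    cls_token = dim
    pos_embed = (num_patches + 1) * dim                      # +1 for [CLS]

    layer_norm = 2 * dim                                     # scale + shift
    attention = (dim * 3 * dim + 3 * dim) + (dim * dim + dim)  # QKV + output proj
    mlp = (dim * mlp_dim + mlp_dim) + (mlp_dim * dim + dim)
    block = 2 * layer_norm + attention + mlp

    head = dim * num_classes + num_classes
    # The extra layer_norm term is the final norm after the last block.
    return patch_embed + cls_token + pos_embed + depth * block + layer_norm + head


print(f"{vit_param_count():,}")   # 86,567,656
```

Almost all of the total sits in the 12 encoder blocks (about 7.09 M each); the patch embedding, position embedding, and head together contribute roughly 1.5 M.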
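
The "per-class NMS (not global)" Common Mistake is easiest to see on a toy case where two overlapping boxes belong to different classes. This is a minimal greedy sketch with hypothetical function names and a plain `(box, score, class_id)` tuple layout; it is not the course's or any library's implementation.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def per_class_nms(detections, iou_threshold=0.5):
    """Greedy NMS run separately for each class id.

    detections: list of (box, score, class_id) tuples.
    """
    kept = []
    for cls in sorted({d[2] for d in detections}):
        # Only boxes of the same class compete with each other.
        candidates = sorted((d for d in detections if d[2] == cls),
                            key=lambda d: d[1], reverse=True)
        while candidates:
            best = candidates.pop(0)
            kept.append(best)
            candidates = [d for d in candidates
                          if box_iou(best[0], d[0]) <= iou_threshold]
    return kept


dets = [
    ((0, 0, 10, 10), 0.9, 0),   # class 0, highest score
    ((1, 1, 11, 11), 0.8, 0),   # class 0, IoU ~0.68 with the first: suppressed
    ((1, 1, 11, 11), 0.7, 1),   # class 1, same box: must survive
]
print(len(per_class_nms(dets)))  # 2
```

A global NMS over the same three detections would also drop the class-1 box, which is exactly the error the pack warns about.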
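
For the RoPE Derivation item, the relative-position property can be sketched for a single 2-D feature pair (one frequency $\theta$); the full embedding applies the same argument independently to each pair. The symbols $R_\alpha$, $\tilde{q}_m$ and $\tilde{k}_n$ are notation introduced here for illustration.

```latex
\begin{align*}
R_{\alpha} &= \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix},
\qquad R_{\alpha}^{\top} = R_{-\alpha},
\qquad R_{\alpha} R_{\beta} = R_{\alpha + \beta} \\
\tilde{q}_m &= R_{m\theta}\, q, \qquad \tilde{k}_n = R_{n\theta}\, k \\
\tilde{q}_m^{\top} \tilde{k}_n
  &= q^{\top} R_{m\theta}^{\top} R_{n\theta}\, k
   = q^{\top} R_{(n - m)\theta}\, k
\end{align*}
```

The absolute positions $m$ and $n$ enter only through their difference, so the attention logit is a function of $m - n$.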
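
The Otsu item ("closed-form $O(L)$ sweep") refers to a single pass over the histogram with running sums. The sketch below is one way to write that pass in plain Python; the function name and the convention that class 0 holds grey levels ≤ t are assumptions.

```python
def otsu_threshold(hist):
    """Return the threshold t that maximises between-class variance.

    hist[i] is the pixel count at grey level i; class 0 is levels <= t.
    One pass over the L grey levels using running sums.
    """
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    w0 = 0        # pixel count of class 0 so far
    sum0 = 0.0    # intensity sum of class 0 so far
    best_t, best_score = 0, -1.0

    for t, h in enumerate(hist):
        w0 += h
        sum0 += t * h
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # w0 * w1 * (mu0 - mu1)^2 is proportional to the between-class variance.
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


# Bimodal toy histogram over 8 grey levels: dark cluster at 0-1, bright at 6-7.
print(otsu_threshold([10, 10, 0, 0, 0, 0, 10, 10]))  # 1
```

Any threshold from 1 to 5 separates the two clusters equally well here; the strict `>` comparison keeps the first one.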