Saral Shiksha Yojna

Computer Vision

CSE471
Prof. Makarand Tapaswi + Prof. Charu Sharma · Spring 2025-26 · 4 credits

High-Yield Topics

If time is short, study these first.

R-CNN family evolution (R-CNN → Fast → Faster)
10-15 marks · medium
Appears almost every year; sets up RPN, RoI Pool/Align, anchor design.
YOLO v1 loss decomposition + grid output S×S×(B·5+C)
8-12 marks · medium
Single-shot pipeline is high-yield; the 5 loss terms and λ_coord/λ_noobj are quiz favorites.
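
To make the grid output size concrete, here is a minimal sketch that evaluates S×S×(B·5+C). The function name is hypothetical, and the defaults S=7, B=2, C=20 are assumed to be the usual PASCAL VOC configuration of YOLO v1.

```python
def yolo_v1_output_shape(S: int = 7, B: int = 2, C: int = 20) -> tuple:
    """Shape of the YOLO v1 prediction tensor.

    Each of the S*S grid cells predicts B boxes, each with 5 numbers
    (x, y, w, h, confidence), plus one shared vector of C class scores.
    """
    return (S, S, B * 5 + C)


print(yolo_v1_output_shape())  # (7, 7, 30)
```

Note that the C class scores are shared per cell, not per box, which is why C sits outside the B·5 product.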
NMS algorithm trace + soft-NMS variant
5-8 marks · easy
Often asked as an algorithm trace; write it out step by step.
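
A step-by-step trace is easier to write once the greedy loop is clear. The following is a minimal NumPy sketch of hard NMS, assuming boxes in (x1, y1, x2, y2) format and an IoU threshold of 0.5; the helper names and the example boxes are illustrative, not taken from the course material.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box and an (N, 4) array of boxes, all (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(boxes, scores, iou_thresh=0.5):
    """Greedy hard NMS: returns indices of kept boxes, highest score first."""
    order = np.argsort(-scores)          # step 1: sort by descending score
    keep = []
    while order.size > 0:
        best = order[0]                  # step 2: take the top-scoring box
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_thresh]  # step 3: drop heavy overlaps
    return keep


boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)
print(kept)  # [0, 2]: box 1 overlaps box 0 with IoU 81/119, about 0.68
```

Soft-NMS keeps the same loop but, instead of deleting overlapping boxes, decays their scores (for example by multiplying with exp(-IoU²/σ) in the Gaussian variant) and re-sorts.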
mAP construction (PR curve, AP per class, COCO vs VOC)
8-12 marks · medium
Computational question; show working.
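
For the "show working" part, it helps to have the AP recipe written down once. This is a sketch of all-point interpolated AP for a single class (the style used by later PASCAL VOC evaluations); the function name and the toy detections are assumptions for illustration. COCO-style mAP additionally averages over IoU thresholds from 0.50 to 0.95, which is not shown here.

```python
import numpy as np


def average_precision(scores, is_tp, num_gt):
    """All-point interpolated AP for one class.

    scores : detection confidences
    is_tp  : 1 if the detection matched an unclaimed ground-truth box, else 0
    num_gt : number of ground-truth boxes for this class
    """
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_tp, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / num_gt
    precision = cum_tp / (cum_tp + cum_fp)

    # Pad the curve, then make precision monotonically non-increasing
    # (the "envelope" step), scanning from right to left.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])

    # Sum rectangle areas wherever recall changes.
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))


# Four detections sorted by score: TP, FP, TP, TP with 3 ground-truth boxes.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1], num_gt=3)
print(round(ap, 4))  # 0.8333
```

In this toy case the envelope precision is 1.0 up to recall 1/3 and 0.75 from there to recall 1, so AP = 1/3 · 1 + 2/3 · 0.75 = 5/6. mAP is the mean of such per-class APs.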
RoI Pool vs RoI Align (quantization vs bilinear)
5-8 marks · easy
Direct exam favorite; one-line difference → full marks.
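
The one-line difference can be demonstrated on a single sampling point. The sketch below contrasts quantizing a fractional coordinate (RoI Pool style) with bilinear interpolation (RoI Align style); the 4×4 feature map and the sample location are made up for illustration, and full RoI Align would additionally average several such samples per output bin.

```python
import numpy as np


def bilinear_sample(feat, y, x):
    """Bilinearly interpolate a 2D feature map at a fractional (y, x)."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1 = min(y0 + 1, feat.shape[0] - 1)
    x1 = min(x0 + 1, feat.shape[1] - 1)
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * feat[y0, x0]
            + (1 - dy) * dx * feat[y0, x1]
            + dy * (1 - dx) * feat[y1, x0]
            + dy * dx * feat[y1, x1])


feat = np.arange(16, dtype=float).reshape(4, 4)  # feat[r, c] = 4*r + c
y, x = 1.5, 2.25                                 # fractional RoI sample point

quantized = feat[int(y), int(x)]     # RoI Pool: snap to cell (1, 2)
aligned = bilinear_sample(feat, y, x)  # RoI Align: interpolate in place

print(quantized, aligned)  # 6.0 8.25
```

The quantized read is off by the sub-cell offset, which is the misalignment that hurts pixel-accurate tasks such as mask prediction.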
U-Net concat skip vs ResNet add skip
5-8 marks · easy
Foundational for diffusion/dense prediction.
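
The two skip styles differ most visibly in what happens to the channel dimension. A minimal sketch with dummy (channels, height, width) arrays, whose sizes are arbitrary assumptions:

```python
import numpy as np

encoder_feat = np.ones((64, 32, 32))   # feature map carried by the skip connection
decoder_feat = np.ones((64, 32, 32))   # feature map on the main path

# U-Net style: concatenate along channels; the next conv sees both tensors separately.
unet_skip = np.concatenate([encoder_feat, decoder_feat], axis=0)

# ResNet style: element-wise add; shapes must match and channel count is unchanged.
resnet_skip = encoder_feat + decoder_feat

print(unet_skip.shape, resnet_skip.shape)  # (128, 32, 32) (64, 32, 32)
```

Concatenation doubles the input channels of the following layer but preserves the encoder features untouched, whereas addition is cheaper and gives the identity path that eases gradient flow.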
Heatmap regression vs coordinate regression for pose
8-10 marks · medium
Conceptual: why dense output beats vector regression.
Part Affinity Fields (OpenPose) — grouping via line integral + Hungarian
10-15 marks · hard
Heavy pose question; many students fumble the PAF score formula.
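
The PAF score is a line integral of the field along the candidate limb, projected onto the limb's unit direction. Here is a sketch that approximates it by sampling points uniformly between two candidate joints; the (H, W, 2) field layout, nearest-pixel lookup, and sample count are assumptions made for this example.

```python
import numpy as np


def paf_score(paf, p1, p2, num_samples=10):
    """Approximate E = integral over u in [0, 1] of PAF(p(u)) . unit(p2 - p1).

    paf    : (H, W, 2) array, [..., 0] is the x-component, [..., 1] the y-component
    p1, p2 : candidate joint locations as (x, y)
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    d = p2 - p1
    length = np.linalg.norm(d)
    if length == 0:
        return 0.0
    unit = d / length
    total = 0.0
    for u in np.linspace(0.0, 1.0, num_samples):
        x, y = p1 + u * d                     # p(u) = (1 - u) * p1 + u * p2
        vec = paf[int(round(y)), int(round(x))]
        total += float(vec @ unit)            # alignment of field with the limb
    return total / num_samples


# Toy field pointing in +x everywhere.
paf = np.zeros((10, 10, 2))
paf[..., 0] = 1.0

aligned = paf_score(paf, (1, 2), (8, 2))      # limb along +x
reversed_ = paf_score(paf, (8, 2), (1, 2))    # limb along -x
perpendicular = paf_score(paf, (3, 1), (3, 8))
print(aligned, reversed_, perpendicular)
```

These scores become the edge weights of a bipartite graph between the two joint types, and the Hungarian algorithm then picks the maximum-weight matching.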
PointNet permutation invariance via shared MLP + symmetric max-pool
10-12 marks · medium
Universal approximation argument is testable.
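
Permutation invariance can be checked numerically. The sketch below uses a tiny randomly initialised shared MLP (the layer sizes 3→16→32 are arbitrary assumptions, not PointNet's real widths) followed by a max over points, and confirms that shuffling the point order leaves the global feature unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
# Weights of a toy shared MLP; the same weights are applied to every point.
W1, b1 = rng.normal(size=(3, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 32)), np.zeros(32)


def pointnet_global_feature(points):
    """points: (N, 3). Returns a (32,) global feature."""
    h = np.maximum(points @ W1 + b1, 0.0)   # per-point layer 1 + ReLU
    h = np.maximum(h @ W2 + b2, 0.0)        # per-point layer 2 + ReLU
    return h.max(axis=0)                    # symmetric max-pool over points


points = rng.normal(size=(100, 3))
shuffled = points[rng.permutation(len(points))]

f_original = pointnet_global_feature(points)
f_shuffled = pointnet_global_feature(shuffled)
print(np.allclose(f_original, f_shuffled))  # True
```

The shared MLP makes each per-point feature independent of position in the list, and max is symmetric in its arguments, so the composition is order-independent.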
3DGS per-Gaussian parameter count (59) + Σ = R·S·Sᵀ·Rᵀ
8-12 marks · medium
Concrete numerical answer; PSD enforcement is the trap.
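
Both the count and the PSD argument can be verified in a few lines. This sketch assumes degree-3 spherical harmonics for colour, a quaternion for rotation, and an exponential activation on the stored scale; the function names are hypothetical.

```python
import numpy as np


def gaussian_param_count(sh_degree=3):
    """Parameters stored per Gaussian."""
    position, scale, rotation_quat, opacity = 3, 3, 4, 1
    sh_coeffs = 3 * (sh_degree + 1) ** 2   # (degree+1)^2 coefficients per RGB channel
    return position + scale + rotation_quat + opacity + sh_coeffs


def quat_to_rotmat(q):
    """Unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def covariance(q, log_scale):
    """Sigma = R S S^T R^T, written as M M^T with M = R S."""
    R = quat_to_rotmat(q)
    S = np.diag(np.exp(log_scale))         # exp keeps every scale positive
    M = R @ S
    return M @ M.T


n_params = gaussian_param_count()
sigma = covariance([0.9, 0.1, -0.3, 0.2], [-1.0, 0.0, 0.5])
print(n_params)                            # 59
print(np.linalg.eigvalsh(sigma).min() > 0)  # True
```

Because Σ is built as M·Mᵀ, it is positive semi-definite for any values of the quaternion and scale, so gradient descent can update those freely; optimising the entries of Σ directly would not give that guarantee.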
Scaled dot-product attention + √dₖ rationale
5-8 marks · easy
Variance argument is examinable as a short derivation.
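
The variance argument: if the entries of q and k are independent with zero mean and unit variance, then q·k is a sum of dₖ such products and has variance dₖ, so dividing by √dₖ brings it back to 1. The sketch below implements single-head attention and checks that claim empirically; dₖ = 64 and the sample count are arbitrary choices.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V):
    """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v). Returns (output, weights)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ V, weights


rng = np.random.default_rng(0)
d_k = 64
q = rng.normal(size=(10000, d_k))
k = rng.normal(size=(10000, d_k))

raw = (q * k).sum(axis=1)            # unscaled dot products
scaled = raw / np.sqrt(d_k)
print(raw.var(), scaled.var())       # roughly d_k and roughly 1

out, attn = scaled_dot_product_attention(q[:4], k[:6], rng.normal(size=(6, 8)))
```

Without the scaling, logits with standard deviation around √dₖ push softmax into its saturated regime, where gradients become very small.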
ViT pipeline end-to-end (patchify → CLS → PE → encoder → MLP)
10-15 marks · medium
Always asked. Combine with parameter count of ViT-B/16 (~86M).
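
The ~86M figure can be reproduced by summing the pieces of the pipeline. This is a sketch under assumed standard ViT-B/16 settings (224×224 RGB input, width 768, 12 blocks, MLP width 3072, biases everywhere, learned position embeddings, a 1000-class linear head); the function name is hypothetical.

```python
def vit_param_count(image=224, patch=16, channels=3, dim=768,
                    depth=12, mlp_dim=3072, num_classes=1000):
    n_patches = (image // patch) ** 2                 # 14 * 14 = 196
    patch_embed = patch * patch * channels * dim + dim  # linear projection of patches
    cls_token = dim
    pos_embed = (n_patches + 1) * dim                 # +1 for the CLS token

    attn = 4 * (dim * dim + dim)                      # Q, K, V and output projections
    mlp = (dim * mlp_dim + mlp_dim) + (mlp_dim * dim + dim)
    norms = 2 * (2 * dim)                             # two LayerNorms per block
    block = attn + mlp + norms

    final_norm = 2 * dim
    head = dim * num_classes + num_classes
    return patch_embed + cls_token + pos_embed + depth * block + final_norm + head


total = vit_param_count()
print(total)  # 86567656
```

Almost all of the parameters sit in the 12 encoder blocks (about 7.09M each); the number of attention heads does not change the count because the heads partition the same 768-wide projections.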
DINO anti-collapse: centering + sharpening
8-12 marks · medium
Multi-mark question — list both and explain the balance.
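
The balance between the two operations is easier to explain with the teacher's output written out. In this sketch the teacher distribution is softmax((logits − center) / τ), with the center tracked as an exponential moving average of batch means; the temperature 0.04, momentum 0.9, and the toy logits are assumptions chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def teacher_probs(teacher_logits, center, tau_t=0.04):
    """Centering (subtract center) followed by sharpening (low temperature)."""
    return softmax((teacher_logits - center) / tau_t, axis=-1)


def update_center(center, teacher_logits, momentum=0.9):
    """EMA of the batch-mean teacher logits."""
    return momentum * center + (1 - momentum) * teacher_logits.mean(axis=0)


def mean_entropy(p):
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())


rng = np.random.default_rng(0)
# Toy batch where output dimension 0 dominates for every sample.
logits = rng.normal(size=(8, 16)) + np.array([3.0] + [0.0] * 15)
no_center = np.zeros(16)

soft = teacher_probs(logits, no_center, tau_t=1.0)
sharp = teacher_probs(logits, no_center, tau_t=0.04)
centered = teacher_probs(logits, logits.mean(axis=0), tau_t=1.0)
new_center = update_center(no_center, logits)

print(mean_entropy(sharp) < mean_entropy(soft))       # sharpening lowers entropy
print(centered[:, 0].mean() < soft[:, 0].mean())      # centering removes the dominant dim
```

Centering alone pushes the output toward uniform, and sharpening alone lets one dimension dominate; applying both makes the two collapse modes counteract each other.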
MAE 75% masking + asymmetric encoder/decoder
6-10 marks · easy
Compare with BERT's 15% masking: the high spatial redundancy of images motivates the much more aggressive mask ratio.
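
The asymmetry follows directly from the masking step: the encoder only ever sees the kept patches. A minimal sketch of per-image random masking, assuming 196 patches (a 224×224 image with 16×16 patches); the function name is hypothetical.

```python
import numpy as np


def random_masking(num_patches, mask_ratio=0.75, rng=None):
    """Return (sorted indices of visible patches, sorted indices of masked patches)."""
    rng = np.random.default_rng() if rng is None else rng
    num_keep = int(num_patches * (1 - mask_ratio))
    perm = rng.permutation(num_patches)       # random shuffle of patch indices
    return np.sort(perm[:num_keep]), np.sort(perm[num_keep:])


visible, masked = random_masking(196, mask_ratio=0.75, rng=np.random.default_rng(0))
print(len(visible), len(masked))  # 49 147
```

The heavy encoder processes only the 49 visible tokens, while the lightweight decoder receives all 196 positions (visible tokens plus mask tokens) to reconstruct pixels.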
Modern Transformer upgrade list (PreNorm, RMSNorm, RoPE, GQA, Flash, Registers)
10-15 marks · medium
Enumeration question — list and explain in one line each.
PaliGemma 3-pillar VLM blueprint + Prefix-LM masking
10-12 marks · medium
Architecture diagram question. Bidirectional prefix vs causal suffix.
M-RoPE (T × H × W) for video — why 1D fails
6-10 marks · medium
Multimodal positional encoding; appears in modern VLM questions.
I3D inflation trick — 2D ImageNet weights → 3D kernels
5-8 marks · easy
Specific trick; easy to articulate.
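
The inflation itself is one line: repeat the 2D kernel along a new time axis and divide by the temporal extent. The sketch below assumes an (out, in, kh, kw) weight layout and checks the property that motivates the rescaling, namely that a video made of one repeated frame produces the same response as the 2D filter on that frame.

```python
import numpy as np


def inflate_2d_kernel(w2d, t):
    """(out, in, kh, kw) -> (out, in, t, kh, kw), rescaled by 1/t."""
    return np.repeat(w2d[:, :, None, :, :], t, axis=2) / t


rng = np.random.default_rng(0)
w2d = rng.normal(size=(8, 3, 3, 3))       # stand-in for pretrained 2D weights
w3d = inflate_2d_kernel(w2d, t=5)

frame_patch = rng.normal(size=(3, 3, 3))                       # (in, kh, kw)
static_video = np.repeat(frame_patch[:, None, :, :], 5, axis=1)  # (in, t, kh, kw)

resp_2d = (w2d * frame_patch).sum(axis=(1, 2, 3))
resp_3d = (w3d * static_video).sum(axis=(1, 2, 3, 4))
print(np.allclose(resp_2d, resp_3d))  # True
```

Matching the response on static input means the inflated network starts from the 2D model's behaviour, so ImageNet pretraining carries over as an initialisation.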
TimeSformer divided space-time attention factorization
8-12 marks · medium
Compare the 4 attention factorizations; divided wins.
GIoU vs IoU loss (gradient on non-overlapping boxes)
5-8 marks · easy
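Gradient argument: pure IoU loss is flat (zero gradient) whenever the boxes do not overlap, while GIoU's enclosing-box penalty still gives a signal.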
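
A quick numerical check makes the argument concrete. In this sketch GIoU = IoU − (|C| − |A ∪ B|) / |C|, where C is the smallest axis-aligned box enclosing both; the boxes are (x1, y1, x2, y2) and the example coordinates are made up.

```python
def iou_and_giou(a, b):
    """Return (IoU, GIoU) for two boxes in (x1, y1, x2, y2) format."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union

    # Smallest enclosing box C.
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    giou = iou - (hull - union) / hull
    return iou, giou


target = (0.0, 0.0, 1.0, 1.0)
near = (2.0, 0.0, 3.0, 1.0)   # disjoint, gap of 1
far = (5.0, 0.0, 6.0, 1.0)    # disjoint, gap of 4

iou_near, giou_near = iou_and_giou(target, near)
iou_far, giou_far = iou_and_giou(target, far)
print(iou_near, iou_far)      # 0.0 0.0
print(giou_near, giou_far)    # about -0.333 and -0.667
```

IoU is 0 for both disjoint predictions, so a loss of 1 − IoU cannot tell the optimiser which one is closer; GIoU improves as the prediction moves toward the target.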
Focal loss for class imbalance in single-stage detectors
5-8 marks · easy
(1-pₜ)^γ modulating factor; γ=2 typical.
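
To see what the modulating factor does, compare the focal loss with plain cross-entropy on an easy and a hard example. This is a sketch of the binary form FL = −(1 − pₜ)^γ · log(pₜ); the optional α class-balancing weight is deliberately left out, and the probabilities are illustrative.

```python
import numpy as np


def focal_loss(p, y, gamma=2.0):
    """Per-example binary focal loss.

    p : predicted probability of the positive class
    y : labels in {0, 1}
    """
    p = np.clip(np.asarray(p, dtype=float), 1e-7, 1 - 1e-7)
    y = np.asarray(y)
    p_t = np.where(y == 1, p, 1 - p)          # probability assigned to the true class
    return -((1 - p_t) ** gamma) * np.log(p_t)


p = np.array([0.9, 0.1])   # easy positive, hard positive
y = np.array([1, 1])

ce = focal_loss(p, y, gamma=0.0)   # gamma = 0 recovers cross-entropy
fl = focal_loss(p, y, gamma=2.0)
print(fl / ce)                     # down-weighting factors: about 0.01 and 0.81
```

With γ = 2 the well-classified example is down-weighted 100×, while the hard example keeps most of its loss, so the many easy background anchors no longer dominate training.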
SMPL: β (10 shape) + θ (72 pose) → 6890-vertex mesh
8-10 marks · medium
Parameter accounting is testable; HMR pipeline.
Adaptive Density Control in 3DGS (clone / split / prune)
8-10 marks · medium
Explain why a large view-space positional gradient signals that a region needs densification.
CLIP zero-shot via 'a photo of a {class}'
5-8 marks · easy
Prompt construction; cosine similarity argmax.
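
The whole zero-shot procedure is prompt building, normalisation, and an argmax. The sketch below uses hand-written placeholder vectors in place of real CLIP text and image embeddings (no encoder is called, and the class list is made up), so only the classification logic is shown.

```python
import numpy as np


def zero_shot_classify(image_emb, text_embs):
    """Return (index of best-matching prompt, cosine similarities)."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = text_embs @ image_emb              # cosine similarity per class prompt
    return int(np.argmax(sims)), sims


classes = ["cat", "dog", "car"]
prompts = [f"a photo of a {c}" for c in classes]

# Placeholder embeddings standing in for text-encoder and image-encoder outputs.
text_embs = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])
image_emb = np.array([0.2, 0.9, 0.1])

best, sims = zero_shot_classify(image_emb, text_embs)
print(prompts[best])  # a photo of a dog
```

No classifier weights are trained for the new classes; the text embeddings of the prompts act as the classifier.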