Computer Vision

CSE471

Prof. Makarand Tapaswi + Prof. Charu Sharma • Spring 2025-26 • 4 credits

Mock Paper 11 — Architectural Reasoning ('Why this design choice?')

Duration: 180 min • Max marks: 100

20 marks

1. Why does AlexNet use two parallel GPU paths in the original implementation? (2 m)
2. Why does VGG use stacked 3×3 convs instead of one 5×5 or 7×7? (2 m)
3. Why does ResNet use the identity shortcut rather than a learned transformation? (2 m)
4. Why does BatchNorm have learnable γ and β? Why re-scale and re-shift after normalising? (2 m)
5. Why does YOLO use a single network with grid output instead of two stages? (2 m)
6. Why does Faster R-CNN's RPN use anchors of multiple scales and ratios? (2 m)
7. Why does U-Net use CONCAT skips instead of ADD like ResNet? (2 m)
8. Why does MAE mask 75% of its input while BERT masks only 15%? (2 m)
9. Why does DINO use multi-crop while SimCLR uses only two views? (2 m)
10. Why is stochastic depth used in deep ViTs but not in shallow CNNs? (2 m)

40 marks

1. Why use depthwise separable convolutions (MobileNet) instead of regular conv? Show savings with C_in=128, C_out=256, kernel 3×3. (5 m)
2. Why did Transformers switch from PostNorm to PreNorm? (5 m)
3. Why does CLIP use contrastive learning rather than generative captioning? (5 m)
4. Why does Mask R-CNN predict per-class masks rather than a single class-agnostic mask? (5 m)
5. Why does DINO use output dimension 65,536? Why not 1000 (matching ImageNet classes)? (4 m)
6. Why does 3DGS use Spherical Harmonics for colour rather than just storing RGB triplets per Gaussian? (5 m)
7. Why does PointNet use MAX pooling rather than SUM or MEAN? (5 m)
8. Why does the original ViT underperform CNNs on small datasets despite more parameters? (4 m)
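
For the depthwise separable question in this section (C_in=128, C_out=256, kernel 3×3), the weight-count arithmetic can be checked with a short script. This is a minimal sketch, not a model answer: the function names are hypothetical, biases are ignored, and only weights are counted (for a fixed output resolution the multiply-add ratio is the same, since both costs scale by the number of output positions).

```python
# Weight counts for a standard conv vs. a depthwise separable conv.
# Function names are illustrative; biases are ignored (assumption).

def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution: one k*k*c_in filter per output channel."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


c_in, c_out, k = 128, 256, 3
standard = standard_conv_params(c_in, c_out, k)
separable = depthwise_separable_params(c_in, c_out, k)

print(f"standard conv:       {standard:,} weights")    # 294,912
print(f"depthwise separable: {separable:,} weights")   # 33,920
print(f"reduction factor:    {standard / separable:.2f}x")  # 8.69x
```

The ratio separable/standard simplifies to 1/C_out + 1/k², so with a 3×3 kernel the saving approaches 9× as C_out grows.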

40 marks

1. Compare the design philosophies of CLIP, DINO, and MAE: objective, inductive biases, why each works at scale, and best downstream tasks. (10 m)
2. Why is the U-Net architecture so universally applicable (medical seg, Stable Diffusion, pix2pix, optical flow, inpainting)? What's the unifying property? (10 m)
3. List 8 modern transformer improvements over the original architecture and what each solves. (10 m)
4. Why does 3D Gaussian Splatting train so much faster than NeRF (30 min vs 8-12 hr)? (10 m)