
Computer Vision

CSE471
Prof. Makarand Tapaswi + Prof. Charu Sharma • Spring 2025-26 • 4 credits

Mock Paper 16 — Full Exam Simulation (mixed format, dress rehearsal)

Duration: 180 min • Max marks: 100

Section A — Short Answer (30 marks; suggested 45 min)

  1. Sobel kernel [[-1,0,1],[-2,0,2],[-1,0,1]] — what is the response on a uniform region (all pixels 50)? (2 m)
  2. Input 224×224×3; 7×7 conv with 64 filters, stride 2, padding 3. (a) Output spatial size. (b) Number of parameters. (3 m)
  3. Otsu's optimisation criterion — what does it return? (3 m)
  4. Closing — erosion or dilation first? What does it eliminate? (2 m)
  5. Faster R-CNN: how many anchors per spatial position? What two parameters of each anchor are learned? (3 m)
  6. Write YOLO's box-size loss component (with √w, √h). (2 m)
  7. Compute the IoU of predicted box (10, 10, 60, 60) and GT box (30, 20, 80, 70). (3 m)
  8. RoI Pool vs RoI Align — key difference, in one sentence. (2 m)
  9. OpenPose output: K + 2L channels. For 17 keypoints and 19 limbs, how many channels? Why 2L for limbs? (3 m)
  10. Scaled dot-product attention — why divide by √d_k? (2 m)
  11. DINO's two anti-collapse mechanisms — name each and the failure mode it prevents. (3 m)
  12. 3DGS Σ = R·S·Sᵀ·Rᵀ — what does this guarantee? (2 m)
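For self-checking the numeric questions above, a minimal sketch is given below (not an official answer key). It assumes the standard conventions: correlation (not convolution) for the Sobel check, floor division in the conv output-size formula, and (x1, y1, x2, y2) boxes for IoU; the helper `iou` is a name introduced here for illustration.

```python
import numpy as np

# Q1: Sobel x-kernel on a uniform region (all pixels 50).
# The kernel coefficients sum to zero, so the response is 0.
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
uniform = np.full((3, 3), 50)
response = np.sum(sobel_x * uniform)  # correlation at the centre pixel

# Q2: 7x7 conv, 64 filters, stride 2, padding 3 on a 224x224x3 input.
out_size = (224 + 2 * 3 - 7) // 2 + 1   # output spatial size
params = 7 * 7 * 3 * 64 + 64            # weights + biases

# Q7: IoU of two axis-aligned boxes given as (x1, y1, x2, y2).
def iou(a, b):
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

print(response, out_size, params,
      round(iou((10, 10, 60, 60), (30, 20, 80, 70)), 4))
```

Running the sketch lets you compare your hand calculations against the same formulas evaluated numerically.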

Section B — Conceptual & Calculations (40 marks; suggested 75 min)

  1. A CNN backbone outputs P2 (stride 4), P3 (stride 8), P4 (stride 16), P5 (stride 32). Design an FPN over these levels. Why is this useful for detection? (5 m)
  2. Compare Dice, cross-entropy, and focal loss for medical segmentation where the foreground (tumour) is 2% of pixels. (6 m)
  3. Explain ViT's CLS token. Why is it needed? Why a single CLS token rather than averaging all patch tokens? (5 m)
  4. A 1080p (1920×1080) image is fed to a vanilla ViT with patch size 16. (a) Number of tokens including CLS. (b) Attention complexity per layer. (c) Is this practical? (5 m)
  5. Compare contrastive learning (CLIP), self-distillation (DINO), and masked reconstruction (MAE) on: (a) what each predicts, (b) supervision source, (c) best off-the-shelf use. (6 m)
  6. Back-propagation through max-pooling. Input [[1, 3], [5, 2]] with a 2×2 max pool (single output). Upstream gradient = 4. Compute the backward gradient. (6 m)
  7. A 3D Gaussian has position (1, 2, 3), scale (0.5, 1.0, 0.3), and opacity 0.7. Compute the float32 storage cost with SH degree 3, and explain the role of each parameter. (5 m)
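The max-pooling question above can be checked with a short sketch (again, a self-check aid rather than a model answer). It uses the standard rule that the max is the only input that influenced the output, so it alone receives the upstream gradient; all other positions get zero.

```python
import numpy as np

# Forward: y = max(x) over the 2x2 window (single output).
x = np.array([[1.0, 3.0], [5.0, 2.0]])
upstream = 4.0

# Backward: route the upstream gradient to the argmax position only.
mask = (x == x.max()).astype(float)  # 1 at the max entry, 0 elsewhere
grad_x = mask * upstream

print(grad_x)
```

The printed gradient has the upstream value 4 at the location of the 5 and zeros everywhere else; note that if the window contained ties, this mask-based rule would split the gradient across all tied positions.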

Section C — Long Form (30 marks; suggested 60 min)

  1. Design a CV-based autonomous shopping cart that follows a customer and identifies items placed in the cart. (a) Components. (b) Architecture + specific models. (c) Two failure modes + mitigations. (10 m)
  2. Derive the Smooth L1 loss (Fast/Faster R-CNN box regression). (a) State the three losses. (b) Derive their gradients. (c) When is Smooth L1 preferred? (d) Sketch the curves. (10 m)
  3. Defend a new SSL method to a skeptical vision community. (a) 'We have CLIP, DINO, MAE, JEPA — what's missing?' (b) 'What benchmarks should we use?' (c) 'How would you test whether your method works in practice?' (10 m)
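For the Smooth L1 derivation question, the sketch below evaluates the loss and its gradient under the standard definition with threshold 1 (i.e. the Huber loss with δ = 1, as used in Fast/Faster R-CNN); the function names are introduced here for illustration.

```python
import numpy as np

# Smooth L1: quadratic near zero, linear for |x| >= 1.
#   L(x) = 0.5 * x^2        if |x| < 1
#   L(x) = |x| - 0.5        otherwise
def smooth_l1(x):
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < 1, 0.5 * x**2, np.abs(x) - 0.5)

# Gradient: x near zero, sign(x) in the linear region (bounded at +/-1).
def smooth_l1_grad(x):
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < 1, x, np.sign(x))

print(float(smooth_l1(0.5)), float(smooth_l1(3.0)))
print(float(smooth_l1_grad(0.5)), float(smooth_l1_grad(3.0)))
```

The bounded gradient is the key property to highlight in part (c): unlike L2, a large regression error (an outlier box or a bad early prediction) contributes at most a unit-magnitude gradient, so it cannot dominate the update, while near zero the loss stays smooth unlike plain L1.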
