Saral Shiksha Yojna

Computer Vision

CSE471
Prof. Makarand Tapaswi + Prof. Charu Sharma • Spring 2025-26 • 4 credits

Mock Paper 2 — Comprehensive (3 hr / 100 marks)

Duration: 180 min • Max marks: 100

Section A — Short Answer / MCQ (1-2 marks each, 20 marks)

  1. Apply the 3×3 Sobel-X kernel to the 3×3 patch [50 100 150; 50 100 150; 50 100 150]. What is the response at the centre? (1 m)
  2. What is the DC component of a Fourier-transformed image, and how is it computed from the spatial domain? (1 m)
  3. Define translation equivariance. Why are CNNs equivariant while vanilla MLPs are not? (2 m)
  4. What does the 'objectness' score of Faster R-CNN's RPN predict? (1 m)
  5. A 5×5 kernel is applied to a 7×7 image with stride 2 and padding 0. Compute the output dimensions and the receptive field of one output pixel. (2 m)
  6. In CLIP with a batch size of 256, how many positive and how many negative (image, text) pairs are there? (1 m)
  7. Why do VGG and ResNet prefer 3×3 kernels over 5×5 or 7×7? (2 m)
  8. Which single architectural innovation enabled ResNet to train networks up to 152 layers deep? (1 m)
  9. Write the IoU and Dice formulas. State their relationship and where their numerical values diverge most. (2 m)
  10. What is the time complexity of standard self-attention for sequence length N and feature dimension D? (1 m)
  11. What problem does BatchNorm solve, and how does its behaviour differ between training mode and inference mode? (2 m)
  12. In Gaussian Splatting, what does Σ control for each Gaussian, and how is it parameterised to remain valid? (1 m)
  13. What does it mean for DGCNN's kNN graph to be 'dynamic', and why does this beat PointNet? (2 m)
  14. What is the asymmetric teacher/student input setup in DINO multi-crop? (1 m)
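
The two purely numerical items in this section (Questions 1 and 5) can be self-checked after an attempt. The sketch below is plain Python; the helper names `correlate_at_centre` and `conv_output_size` are illustrative and not part of the paper. It assumes the Sobel-X kernel [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]] applied as cross-correlation (no kernel flip), as deep-learning frameworks do; under true convolution the sign of the response flips.

```python
# Self-check sketch for Section A, Questions 1 and 5 (helper names are illustrative).

# Assumed sign convention: positive response when intensity increases left to right.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]


def correlate_at_centre(patch, kernel):
    """Cross-correlate a k x k patch with a k x k kernel (no flip); returns one value."""
    return sum(p * k
               for patch_row, kernel_row in zip(patch, kernel)
               for p, k in zip(patch_row, kernel_row))


def conv_output_size(n, kernel, stride=1, padding=0):
    """Output side length: floor((n + 2*padding - kernel) / stride) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


patch = [[50, 100, 150]] * 3          # the patch from Question 1
sobel_response = correlate_at_centre(patch, SOBEL_X)
out_side = conv_output_size(7, kernel=5, stride=2, padding=0)  # Question 5

print("Q1 Sobel-X response at centre:", sobel_response)
print("Q5 output size:", out_side, "x", out_side)
```

For a single convolution layer, the receptive field of one output pixel is simply the kernel footprint, so only the output size needs computing.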

Section B — Conceptual / Explanation (4-6 marks each, 40 marks)

  1. Explain the Hit-or-Miss transform. Provide a worked example using two 3×3 structuring elements to detect isolated foreground points. (5 m)
  2. Describe the multi-stage architecture of Convolutional Pose Machines (CPM) for pose estimation. Why is the loss applied at every stage instead of only at the final stage? (4 m)
  3. You have a ResNet-50 pretrained on ImageNet. (A) Classify 5,000 medical X-rays into 3 classes. (B) Classify 200 photos as dog vs cat. For each case, would you use feature extraction or fine-tuning, which layers would you train, and why? (6 m)
  4. Apply Otsu's thresholding conceptually to a bimodal 8-bin histogram (low peak 0-63 ≈ 1800 pixels, valley 96-159 ≈ 80 pixels, high peak 192-255 ≈ 1900 pixels). What threshold range will Otsu pick, and why? (5 m)
  5. Explain Mask R-CNN's three output heads per RoI and its loss function. What is the role of RoI Align? (5 m)
  6. Why does CLIP achieve strong zero-shot classification, and what is the procedure for classifying an unseen image into a custom set of classes? (5 m)
  7. Compute the number of parameters in a Transformer block with D = 512, FFN expansion 4×, and 8 heads. (5 m)
  8. What are Part Affinity Fields (PAFs) in OpenPose? How is the PAF score for a candidate pair computed, and how is it used to assemble person skeletons? (5 m)
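
The parameter count in Question 7 depends on conventions the question does not fix, so a small counter makes the assumptions explicit. The sketch below assumes a standard block: four D×D attention projections (Q, K, V, output), a two-layer FFN, and two LayerNorms with scale and shift. Whether biases and LayerNorm parameters are counted is a choice exposed as arguments. The function name `transformer_block_params` is hypothetical.

```python
# Parameter-count sketch for one Transformer block (assumed conventions, see arguments).

def transformer_block_params(d_model, ffn_mult=4, num_heads=8, bias=True, layer_norms=2):
    """Count learnable parameters in one standard Transformer encoder block."""
    assert d_model % num_heads == 0, "d_model must split evenly across heads"
    b = 1 if bias else 0

    # Q, K, V and output projections: four d_model x d_model matrices (+ optional biases).
    # The head count only partitions these matrices; it does not change the total.
    attention = 4 * (d_model * d_model + b * d_model)

    # FFN: d_model -> ffn_mult * d_model -> d_model.
    d_ff = ffn_mult * d_model
    ffn = (d_model * d_ff + b * d_ff) + (d_ff * d_model + b * d_model)

    # Each LayerNorm has a scale and a shift vector of size d_model.
    norms = layer_norms * 2 * d_model

    return {"attention": attention, "ffn": ffn, "layer_norm": norms,
            "total": attention + ffn + norms}


counts = transformer_block_params(512, ffn_mult=4, num_heads=8)
weights_only = transformer_block_params(512, bias=False, layer_norms=0)

for name, value in counts.items():
    print(f"{name:>10}: {value:,}")
print(f"weights only (no biases, no LayerNorm): {weights_only['total']:,}")
```

Stating which convention was used (with or without biases and LayerNorm) matters more in an answer than the exact final figure.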

Section C — Long-Form / Calculation (10 marks each, 40 marks)

  1. NMS trace. Six boxes with scores A(0.95), B(0.88), C(0.85), D(0.72), E(0.65), F(0.55). Pairwise IoUs: A-B 0.7, A-C 0.2, A-D 0.1, A-E 0.4, A-F 0.1, B-C 0.6, B-D 0.2, B-E 0.3, B-F 0.1, C-D 0.6, C-E 0.1, C-F 0.2, D-E 0.3, D-F 0.1, E-F 0.7. Threshold 0.5. (a) Give a step-by-step trace. (b) List the final kept and suppressed boxes. (c) Why is NMS applied per class? (10 m)
  2. Backpropagation through a 3×3 conv with a single input and output channel. Input x is 4×4, kernel w is 3×3, stride 1, no padding. (a) Give the formulas for dL/dw and dL/dx. (b) Compute the forward output y for x = [1..16] in row-major order and w = [1 0 -1; 1 0 -1; 1 0 -1]. (c) Compute dL/dw given that dL/dy is an all-ones 2×2 matrix. (10 m)
  3. Architecture design for a 24/7 CCTV system: 1080p at 30 FPS on an RTX 3090, with crowds of up to 50 people. (a) YOLO + simple segmentation head vs Mask R-CNN + FPN. (b) Top-down vs bottom-up pose estimation. (c) Pretraining and augmentation strategy. (10 m)
  4. Self-supervised learning (SSL) deep comparison. Engineer A says DINOv2 is best; Engineer B says MAE is best; Engineer C says CLIP is best. (a) Give the strongest argument for each. (b) Give the most important weakness of each. (c) Recommend one method per task: (i) open-vocabulary classification of unseen plant species; (ii) fine-grained urban scene segmentation; (iii) a generic backbone for ~100-label downstream classification. (10 m)
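
A hand trace for Question 1 can be checked against a short greedy NMS run over the IoU table given in the question. This is a minimal sketch, assuming the common convention that a box is suppressed when its IoU with the currently selected box is strictly greater than the threshold (no IoU in the question equals 0.5, so the strict/non-strict choice does not affect this instance). The names `iou` and `greedy_nms` are illustrative.

```python
# Greedy NMS sketch over a precomputed pairwise-IoU table (Section C, Question 1).

SCORES = {"A": 0.95, "B": 0.88, "C": 0.85, "D": 0.72, "E": 0.65, "F": 0.55}

IOU = {
    ("A", "B"): 0.7, ("A", "C"): 0.2, ("A", "D"): 0.1, ("A", "E"): 0.4, ("A", "F"): 0.1,
    ("B", "C"): 0.6, ("B", "D"): 0.2, ("B", "E"): 0.3, ("B", "F"): 0.1,
    ("C", "D"): 0.6, ("C", "E"): 0.1, ("C", "F"): 0.2,
    ("D", "E"): 0.3, ("D", "F"): 0.1,
    ("E", "F"): 0.7,
}


def iou(a, b):
    """Look up the symmetric IoU of two boxes from the table."""
    return IOU[(a, b)] if (a, b) in IOU else IOU[(b, a)]


def greedy_nms(scores, threshold=0.5):
    """Return (kept, suppressed) box names, processing boxes in descending score order."""
    remaining = sorted(scores, key=scores.get, reverse=True)
    kept, suppressed = [], []
    while remaining:
        best = remaining.pop(0)          # highest-scoring box still in play
        kept.append(best)
        survivors = []
        for box in remaining:
            if iou(best, box) > threshold:
                suppressed.append(box)   # overlaps the kept box too much
                print(f"{best} suppresses {box} (IoU {iou(best, box)})")
            else:
                survivors.append(box)
        remaining = survivors
    return kept, suppressed


kept, suppressed = greedy_nms(SCORES, threshold=0.5)
print("kept:", kept)
print("suppressed:", suppressed)
```

Note that a box that has already been suppressed never suppresses others, which is why the order of processing matters in the trace.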
