Saral Shiksha Yojna

Computer Vision

CSE471
Prof. Makarand Tapaswi + Prof. Charu Sharma · Spring 2025-26 · 4 credits

True / False (with reasoning)

This format exposes shallow understanding, so always justify the answer: explain why a claim holds, or what exactly is wrong with it.

YOLO predicts a separate class distribution for each of the B bounding boxes within a grid cell.
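Counting output channels is a quick way to check this claim. In the original YOLO (v1), each of the S × S cells predicts B boxes of (x, y, w, h, confidence) but only one set of C class probabilities, shared by those boxes. Anchor-based successors from YOLOv2 onward attach class scores to every box instead. The sketch below assumes the YOLOv1 PASCAL VOC setting S = 7, B = 2, C = 20; the function names are illustrative.

```python
def yolov1_output_shape(S: int, B: int, C: int) -> tuple:
    """YOLOv1 layout: each cell holds B boxes x (x, y, w, h, confidence)
    plus ONE class distribution of size C shared by all B boxes."""
    return (S, S, B * 5 + C)


def per_box_class_shape(S: int, B: int, C: int) -> tuple:
    """Hypothetical alternative for comparison: a separate class
    distribution for every box (the per-anchor layout of later versions)."""
    return (S, S, B * (5 + C))


shape_v1 = yolov1_output_shape(S=7, B=2, C=20)
shape_alt = per_box_class_shape(S=7, B=2, C=20)
print(shape_v1)   # (7, 7, 30)
print(shape_alt)  # (7, 7, 50)
```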

NMS is applied globally across all classes simultaneously.
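NMS is usually run per class (class-aware): a high-scoring box only suppresses lower-scoring boxes that carry the same label. Some libraries also offer a class-agnostic mode. The pure-Python sketch below contrasts the two on a person and a bicycle whose boxes overlap heavily. Boxes are (x1, y1, x2, y2), and all helper names are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def greedy_nms(indices, boxes, scores, thr):
    """Greedy NMS restricted to `indices`; returns the kept indices."""
    order = sorted(indices, key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thr]
    return keep


def class_aware_nms(boxes, scores, labels, thr=0.5):
    """Run NMS separately within each class, so boxes of different
    classes never suppress each other."""
    keep = []
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        keep.extend(greedy_nms(idx, boxes, scores, thr))
    return sorted(keep)


def class_agnostic_nms(boxes, scores, thr=0.5):
    """All boxes compete regardless of their label."""
    return sorted(greedy_nms(range(len(boxes)), boxes, scores, thr))


# A person and a bicycle whose boxes overlap with IoU = 0.81.
boxes = [(0, 0, 10, 10), (1, 1, 10, 10)]
scores = [0.9, 0.8]
labels = ["person", "bicycle"]
print(class_aware_nms(boxes, scores, labels))  # [0, 1]
print(class_agnostic_nms(boxes, scores))       # [0]
```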

RoI Align replaces RoI Pool's quantization with bilinear interpolation, yielding sub-pixel alignment.
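RoI Pool rounds the RoI boundaries and bin edges to integer feature-map cells before pooling. RoI Align keeps the fractional coordinates and reads features at sampling points (for example, four regularly spaced points per bin in Mask R-CNN) by bilinearly interpolating the four surrounding cells. The minimal sketch below compares the two lookups at a single point, assuming a stride-8 feature map. The helper names are hypothetical.

```python
import numpy as np


def bilinear_sample(feat: np.ndarray, y: float, x: float) -> float:
    """Bilinearly interpolate a 2D feature map at a fractional (y, x).
    Assumes 0 <= y <= H-1 and 0 <= x <= W-1."""
    H, W = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feat[y0, x0] + dx * feat[y0, x1]
    bottom = (1 - dx) * feat[y1, x0] + dx * feat[y1, x1]
    return float((1 - dy) * top + dy * bottom)


def quantized_sample(feat: np.ndarray, y: float, x: float) -> float:
    """RoI-Pool-style lookup: snap the coordinate to an integer cell first."""
    return float(feat[int(round(y)), int(round(x))])


# Feature map whose value equals the x coordinate, so the ideal value at x is x.
feat = np.tile(np.arange(8, dtype=np.float64), (8, 1))
y, x = 3.0, 2.3  # e.g. an RoI corner at 18.4 px on a stride-8 map -> 2.3 cells
print(bilinear_sample(feat, y, x))   # ~2.3 (sub-pixel accurate)
print(quantized_sample(feat, y, x))  # 2.0 (0.3-cell misalignment)
```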

PAFs encode the LENGTH of each limb at every pixel.
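In OpenPose, the ground-truth Part Affinity Field for a limb stores a unit vector at every pixel on that limb, pointing from one joint to the other, and zero at every other pixel. The sketch below builds such a field for a short and a long horizontal limb and checks the vector magnitudes. Joints are given as (x, y). The `width` threshold and the function name are assumptions.

```python
import numpy as np


def limb_paf(j1, j2, H, W, width=1.0):
    """Ground-truth PAF for one limb (OpenPose-style sketch).

    Pixels within `width` of the segment j1 -> j2 store the unit vector
    v = (j2 - j1) / |j2 - j1|; all other pixels store zero.
    Returns an array of shape (H, W, 2) holding (vx, vy)."""
    j1, j2 = np.asarray(j1, float), np.asarray(j2, float)
    length = np.linalg.norm(j2 - j1)
    v = (j2 - j1) / length                 # unit direction along the limb
    v_perp = np.array([-v[1], v[0]])       # perpendicular direction
    ys, xs = np.mgrid[0:H, 0:W]
    d = np.stack([xs - j1[0], ys - j1[1]], axis=-1)  # p - j1, as (x, y)
    along = d @ v
    across = np.abs(d @ v_perp)
    on_limb = (along >= 0) & (along <= length) & (across <= width)
    paf = np.zeros((H, W, 2))
    paf[on_limb] = v
    return paf


short = limb_paf((2, 5), (6, 5), H=10, W=20)   # limb of length 4
long_ = limb_paf((2, 5), (18, 5), H=10, W=20)  # limb of length 16
for paf in (short, long_):
    norms = np.linalg.norm(paf, axis=-1)
    print(np.unique(np.round(norms, 6)))  # [0. 1.] for both limbs
```

Limb length shows up only indirectly, in how many pixels are non-zero. The value stored at any single pixel is a direction.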

PointNet++ uses kNN in feature space to define local neighbourhoods.
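Each PointNet++ set-abstraction level picks centroids with farthest point sampling. It then groups neighbours with a ball query (a fixed radius, capped at K points) in 3D coordinate space. The paper discusses kNN in coordinate space as an alternative, whereas kNN in a learned feature space is the dynamic-graph idea used by DGCNN. Below is a NumPy sketch of the sampling and grouping steps. The radius, K, and function names are illustrative.

```python
import numpy as np


def farthest_point_sampling(xyz: np.ndarray, n_centroids: int) -> np.ndarray:
    """Iteratively pick the point farthest from all already-chosen centroids."""
    N = xyz.shape[0]
    chosen = [0]                          # start from an arbitrary point
    dist = np.full(N, np.inf)
    for _ in range(n_centroids - 1):
        dist = np.minimum(dist, np.linalg.norm(xyz - xyz[chosen[-1]], axis=1))
        chosen.append(int(np.argmax(dist)))
    return np.array(chosen)


def ball_query(xyz: np.ndarray, centroid_idx: np.ndarray, radius: float, k: int):
    """For each centroid, return up to k point indices within `radius`,
    measured in 3D coordinate space (not in a learned feature space).
    Real implementations pad each group to exactly k; omitted here."""
    groups = []
    for c in centroid_idx:
        d = np.linalg.norm(xyz - xyz[c], axis=1)
        groups.append(np.flatnonzero(d <= radius)[:k])
    return groups


rng = np.random.default_rng(0)
xyz = rng.uniform(0, 1, size=(256, 3))
centroids = farthest_point_sampling(xyz, n_centroids=16)
groups = ball_query(xyz, centroids, radius=0.2, k=32)
print(len(groups), max(len(g) for g in groups))  # 16, then the largest group size (<= 32)
```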

3DGS has learnable weights that generalise across scenes.
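3D Gaussian Splatting fits an explicit set of Gaussians to one scene by gradient descent on that scene's images. The learnable quantities are per-Gaussian attributes, not shared network weights. The sketch below tallies the attributes of the original formulation, which uses degree-3 spherical harmonics for view-dependent colour, and scales them by a hypothetical Gaussian count. The real count is set per scene by densification and pruning.

```python
# Per-Gaussian attributes optimised in the original 3DGS formulation.
SH_DEGREE = 3
ATTRIBUTES = {
    "position": 3,                          # mean (x, y, z)
    "scale": 3,                             # per-axis scale of the covariance
    "rotation": 4,                          # quaternion for the covariance
    "opacity": 1,
    "sh_coeffs": 3 * (SH_DEGREE + 1) ** 2,  # 16 coefficients per RGB channel
}


def floats_per_gaussian() -> int:
    return sum(ATTRIBUTES.values())


def scene_parameter_count(num_gaussians: int) -> int:
    """Every parameter belongs to one Gaussian of ONE scene; none of them
    is a shared network weight that could be reused on another scene."""
    return num_gaussians * floats_per_gaussian()


print(floats_per_gaussian())             # 59
print(scene_parameter_count(1_000_000))  # 59000000 (hypothetical Gaussian count)
```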

Without positional encoding, a Transformer treats input tokens as a set rather than a sequence.
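Without positional information, self-attention is permutation-equivariant: shuffling the input tokens shuffles the outputs in exactly the same way, so no output depends on token order. The single-head NumPy sketch below checks this numerically with random weights. All names are illustrative.

```python
import numpy as np


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention, no positional encoding."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)            # row-wise softmax
    return A @ V


rng = np.random.default_rng(0)
n, d = 5, 8
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

perm = rng.permutation(n)
out = self_attention(X, Wq, Wk, Wv)
out_perm = self_attention(X[perm], Wq, Wk, Wv)

# Shuffling the input tokens just shuffles the outputs the same way.
print(np.allclose(out[perm], out_perm))  # True
```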

Doubling the number of attention heads (at fixed d_model) roughly doubles the parameter count.
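With the usual split d_k = d_v = d_model / h, the h per-head projections for Q, K and V together form d_model × d_model matrices. The output projection W_O is also d_model × d_model. The weight count is therefore 4·d_model² (plus 4·d_model biases) for any h. The hypothetical helper below spells out that arithmetic.

```python
def mha_param_count(d_model: int, num_heads: int, bias: bool = True) -> int:
    """Parameters of one multi-head attention layer, assuming the standard
    split d_k = d_v = d_model / num_heads."""
    assert d_model % num_heads == 0
    d_k = d_model // num_heads
    per_head_qkv = 3 * d_model * d_k         # W_Q^i, W_K^i, W_V^i for one head
    qkv = num_heads * per_head_qkv           # = 3 * d_model**2 for any h
    out_proj = (num_heads * d_k) * d_model   # W_O = d_model**2
    biases = 4 * d_model if bias else 0
    return qkv + out_proj + biases


for h in (4, 8, 16):
    print(h, mha_param_count(d_model=512, num_heads=h))  # 1050624 for every h
```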

SimCLR's projection head is kept for downstream tasks.
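SimCLR has three parts: an encoder f produces the representation h, a small MLP head g maps it to z = g(h), and the NT-Xent contrastive loss is computed on z. After pre-training, g is discarded, and h is used for downstream tasks and linear evaluation. The sketch below wires this up with random stand-in weights and tiny illustrative dimensions; a real f would be a ResNet, for example one giving a 2048-d h for ResNet-50. Every name in it is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in weights with tiny illustrative dimensions.
D_IN, D_H, D_Z = 32, 16, 8
W_enc = rng.normal(size=(D_IN, D_H))
W1, W2 = rng.normal(size=(D_H, D_H)), rng.normal(size=(D_H, D_Z))


def encoder(x):            # f(.): produces the representation h
    return np.maximum(x @ W_enc, 0)


def projection_head(h):    # g(.): MLP producing z, used ONLY by the loss
    return np.maximum(h @ W1, 0) @ W2


def nt_xent(z, tau=0.5):
    """NT-Xent over 2N embeddings where rows i and i+N are two views of one image."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)         # exclude self-similarity
    n = z.shape[0] // 2
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim[np.arange(2 * n), pos] - np.log(np.exp(sim).sum(axis=1))
    return float(-log_prob.mean())


x = rng.normal(size=(4, D_IN))
views = np.concatenate([x + 0.1 * rng.normal(size=x.shape),
                        x + 0.1 * rng.normal(size=x.shape)])
loss = nt_xent(projection_head(encoder(views)))  # pre-training loss uses z = g(h)

features_for_downstream = encoder(x)             # downstream uses h; g is dropped
print(loss, features_for_downstream.shape)       # loss value, then (4, 16)
```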

DINO requires negative samples to avoid collapse.
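DINO is a student-teacher self-distillation method that uses no negative pairs. The teacher is an exponential moving average (EMA) of the student. Collapse is avoided by centering the teacher outputs (subtracting a running mean) and sharpening them with a low temperature. The NumPy sketch below shows the loss and the two EMA updates. The temperatures and momenta are illustrative values in the range used by DINO, and the function names are hypothetical.

```python
import numpy as np


def log_softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))


def dino_loss(student_out, teacher_out, center, tau_s=0.1, tau_t=0.04):
    """Cross-entropy between the centred, sharpened teacher distribution
    (treated as a fixed target) and the student distribution.
    No negative pairs appear anywhere."""
    p_t = np.exp(log_softmax((teacher_out - center) / tau_t))  # centring + sharpening
    log_p_s = log_softmax(student_out / tau_s)
    return float(-(p_t * log_p_s).sum(axis=-1).mean())


def update_center(center, teacher_out, m=0.9):
    """EMA of the batch mean of teacher outputs (the centering term)."""
    return m * center + (1 - m) * teacher_out.mean(axis=0)


def ema_teacher(teacher_w, student_w, lam=0.996):
    """Teacher weights are a momentum (EMA) copy of the student weights."""
    return lam * teacher_w + (1 - lam) * student_w


rng = np.random.default_rng(0)
K = 16                                  # number of output dimensions (illustrative)
student_out = rng.normal(size=(8, K))
teacher_out = rng.normal(size=(8, K))
center = np.zeros(K)
loss = dino_loss(student_out, teacher_out, center)
center = update_center(center, teacher_out)
print(loss, center.shape)
```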

PreNorm and PostNorm are mathematically equivalent.
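PostNorm (the original Transformer) computes LN(x + F(x)), normalising after the residual addition. PreNorm computes x + F(LN(x)), normalising only the sublayer input and leaving the identity path untouched. Because LayerNorm is nonlinear, the two give different outputs for the same weights. The sketch below illustrates this with a tanh stand-in for the sublayer and a LayerNorm without learnable gain and bias. The names are hypothetical.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def sublayer(x, W):
    """Stand-in for attention/MLP: any non-trivial function works."""
    return np.tanh(x @ W)


def post_norm_block(x, W):
    return layer_norm(x + sublayer(x, W))    # normalise AFTER the residual add


def pre_norm_block(x, W):
    return x + sublayer(layer_norm(x), W)    # residual path stays un-normalised


rng = np.random.default_rng(0)
x = 3.0 * rng.normal(size=(4, 8)) + 1.0
W = rng.normal(size=(8, 8))
post, pre = post_norm_block(x, W), pre_norm_block(x, W)
print(np.allclose(post, pre))  # False
```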

SigLIP's loss is computed independently per pair and does not require batch-wide synchronisation.
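SigLIP replaces the softmax-normalised contrastive loss with an independent binary (sigmoid) term for every image-text pair. Matched pairs get label +1 and all other pairs get -1, with a learnable temperature t and bias b. Each term needs only its own pair, so no batch-wide softmax normaliser is computed. However, negatives still come from the other examples in the batch, so all B × B pairs must still be scored, and in distributed training embeddings must still be exchanged between devices. The sketch below checks that the batch loss is a sum of per-pair terms. The names are hypothetical, and t and b are illustrative values.

```python
import numpy as np


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)


def siglip_loss(img, txt, t, b):
    """Pairwise sigmoid loss over all B x B image-text pairs.
    img, txt: L2-normalised embeddings of shape (B, d)."""
    logits = t * img @ txt.T + b
    labels = 2 * np.eye(len(img)) - 1        # +1 on the diagonal, -1 elsewhere
    return float(-log_sigmoid(labels * logits).sum() / len(img))


def pair_term(x, y, z, t, b):
    """Loss for ONE image-text pair; z = +1 if matched, -1 otherwise."""
    return float(-log_sigmoid(z * (t * x @ y + b)))


rng = np.random.default_rng(0)
B, d = 4, 8
img = rng.normal(size=(B, d))
img /= np.linalg.norm(img, axis=1, keepdims=True)
txt = rng.normal(size=(B, d))
txt /= np.linalg.norm(txt, axis=1, keepdims=True)
t, b = 10.0, -10.0                           # illustrative temperature and bias

loss = siglip_loss(img, txt, t, b)
loss_by_pairs = sum(pair_term(img[i], txt[j], 1 if i == j else -1, t, b)
                    for i in range(B) for j in range(B)) / B
print(np.isclose(loss, loss_by_pairs))  # True
```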

Two-Stream networks' temporal stream takes optical flow as input.
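In the Two-Stream ConvNet of Simonyan and Zisserman, the spatial stream sees a single RGB frame. The temporal stream sees a stack of dense optical-flow fields computed over L consecutive frame pairs, with horizontal and vertical displacements as separate channels (2L channels; the paper uses L = 10). The sketch below builds that input from random stand-in flow fields. The helper name and the interleaved channel order are assumptions.

```python
import numpy as np


def stack_flow(flows):
    """Stack L dense flow fields, each (H, W, 2) = (dx, dy), into an
    (H, W, 2L) tensor by interleaving horizontal and vertical components."""
    channels = []
    for f in flows:
        channels.append(f[..., 0])   # horizontal displacement
        channels.append(f[..., 1])   # vertical displacement
    return np.stack(channels, axis=-1)


L, H, W = 10, 224, 224
rng = np.random.default_rng(0)
# Random stand-ins for the flow between frames t+k and t+k+1, k = 0..L-1.
flows = [rng.normal(size=(H, W, 2)).astype(np.float32) for _ in range(L)]

temporal_input = stack_flow(flows)                # fed to the temporal-stream ConvNet
spatial_input = np.zeros((H, W, 3), np.float32)   # single RGB frame for the spatial stream
print(temporal_input.shape, spatial_input.shape)  # (224, 224, 20) (224, 224, 3)
```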