Saral Shiksha Yojna

Computer Vision

CSE471
Prof. Makarand Tapaswi + Prof. Charu Sharma, Spring 2025-26, 4 credits
Unit 7 — Vision Transformers (ViT)

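Image-as-tokens: patchify the image, linearly project each flattened patch, prepend a [CLS] token, add positional embeddings, run a Transformer encoder, and classify from the [CLS] output. ViT scales well with model and dataset size, but with weaker built-in inductive biases than a CNN it needs large-scale pretraining data to be competitive; Swin restricts attention to local (shifted) windows for efficiency.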
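The pipeline in the summary can be sketched roughly as follows, in pure Python so the shapes stay visible. This is a toy illustration under stated assumptions, not a reference implementation: the helper names (`patchify`, `linear`, `embed_tokens`, `self_attention`), the sizes (8x8 RGB image, patch size 4, width 16, 3 classes) and the random weights are all made up for the example, and the "encoder" is reduced to a single attention head with no residuals, LayerNorm, or MLP.

```python
import math
import random


def patchify(image, patch):
    """Split an H x W x C image (nested lists) into flattened patches, row-major."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            flat = [value
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)
                    for value in image[r][c]]
            patches.append(flat)
    return patches  # N patches, each of length patch * patch * C


def rand_matrix(rows, cols, rng, scale=0.02):
    """Small random weights standing in for learned parameters (assumption)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def linear(x, weight):
    """Bias-free linear map: x (length d_in) times weight (d_in x d_out)."""
    return [sum(x[i] * weight[i][j] for i in range(len(x)))
            for j in range(len(weight[0]))]


def embed_tokens(image, patch, w_proj, cls_token, pos_embed):
    """Project patches, prepend [CLS], then add positional embeddings."""
    tokens = [cls_token] + [linear(p, w_proj) for p in patchify(image, patch)]
    assert len(tokens) == len(pos_embed)
    return [[t + p for t, p in zip(tok, pos)] for tok, pos in zip(tokens, pos_embed)]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over all tokens."""
    q = [linear(t, w_q) for t in tokens]
    k = [linear(t, w_k) for t in tokens]
    v = [linear(t, w_v) for t in tokens]
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        out.append([sum((e / total) * vj[c] for e, vj in zip(exps, v))
                    for c in range(len(v[0]))])
    return out


rng = random.Random(0)
H = W = 8
C, P, D, NUM_CLASSES = 3, 4, 16, 3  # hypothetical toy sizes

image = [[[rng.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]
num_patches = (H // P) * (W // P)

tokens = embed_tokens(
    image, P,
    w_proj=rand_matrix(P * P * C, D, rng),
    cls_token=[0.0] * D,
    pos_embed=rand_matrix(num_patches + 1, D, rng),
)
encoded = self_attention(tokens, rand_matrix(D, D, rng),
                         rand_matrix(D, D, rng), rand_matrix(D, D, rng))
logits = linear(encoded[0], rand_matrix(D, NUM_CLASSES, rng))  # classify from [CLS]

print(len(tokens), len(tokens[0]), len(logits))  # 5 16 3
```

With these sizes the image yields (8/4) x (8/4) = 4 patches of 4 x 4 x 3 = 48 values each, so the sequence has 5 tokens once [CLS] is prepended. Every token attends to every other token here, so the cost grows quadratically with the number of patches; that is the cost Swin-style windowed attention is meant to reduce.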