Computer Vision
CSE471 • Prof. Makarand Tapaswi + Prof. Charu Sharma • Spring 2025-26 • 4 credits
Unit 6 — Attention & Transformers
Why attention beats RNN bottlenecks, scaled dot-product attention with the √dₖ rationale, multi-head attention, encoder/decoder masking, positional encodings, and the Show-Attend-and-Tell precursor.
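As a concrete reference for the unit's core formula, Attention(Q, K, V) = softmax(QKᵀ/√dₖ)V, here is a minimal NumPy sketch of scaled dot-product attention with an optional decoder-style causal mask. The function names, shapes, and toy data are illustrative assumptions, not course-provided code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v).
    mask: optional (n_q, n_k) boolean array; True = attention blocked.
    """
    d_k = Q.shape[-1]
    # The sqrt(d_k) rationale: if entries of Q and K are roughly unit-variance,
    # each dot product has variance d_k, so dividing by sqrt(d_k) keeps the
    # logits at unit variance and stops softmax from saturating for large d_k.
    scores = Q @ K.T / np.sqrt(d_k)
    if mask is not None:
        scores = np.where(mask, -1e9, scores)  # masked logits get ~0 weight
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

# Toy usage (hypothetical sizes): causal mask over a length-4 sequence,
# as a decoder would use to hide future positions.
rng = np.random.default_rng(0)
n, d_k, d_v = 4, 8, 8
Q = rng.standard_normal((n, d_k))
K = rng.standard_normal((n, d_k))
V = rng.standard_normal((n, d_v))
causal_mask = np.triu(np.ones((n, n), dtype=bool), k=1)
out, w = scaled_dot_product_attention(Q, K, V, mask=causal_mask)
print(out.shape, w.shape)  # (4, 8) (4, 4)
```

Multi-head attention repeats this computation h times on separately projected Q, K, V and concatenates the per-head outputs; the sketch above covers only a single head.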