- Limitations of RNNs and motivation for Transformers
- Self-attention mechanism
i) Query, Key, Value concepts
ii) Scaled dot-product attention
iii) Multi-head attention
- Transformer architecture
i) Encoder-decoder structure
ii) Positional encoding
iii) Feed-forward networks
iv) Layer normalization and residual connections
- Pre-trained Transformer models
i) BERT (Bidirectional Encoder Representations from Transformers)
ii) GPT (Generative Pre-trained Transformer)
iii) T5 (Text-to-Text Transfer Transformer)
- Transfer learning and fine-tuning
- Hugging Face Transformers library
- Applications: Text classification, NER, question answering, summarization
- Vision Transformers (ViT): brief introduction