Learning Systems II
Sebastian Gottwald, Ulm University
Instance-based Learning (slides, pdf, code)
- Motivation
- Reproducing Kernel Hilbert Spaces
- Kernel machines
- Linear Support Vector Machine: Definition (slide 26)
- Constrained to unconstrained optimization (slide 27)
- Duality in constrained optimization (slide 28)
- KKT conditions (slide 29)
- Linear SVM: Dual Problem (slide 30)
- Linear SVM: Solution (slide 31)
- Nonlinear SVM (slide 32)
- Extensions (slide 33)
- Kernel trick (slide 34; sketch below)
- Other methods (slide 35)
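The point of the kernel trick listed above is that a kernel machine only ever needs inner products k(x, x') in feature space, never the feature map itself. The following is a minimal NumPy sketch (not the course code) using kernel ridge regression with an RBF kernel as an example kernel machine; the toy data, the kernel bandwidth, and the regularization strength are made-up illustrative choices.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

# Toy 1-d regression data: y = sin(x) + noise (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)

# Kernel ridge regression: alpha = (K + lam * I)^{-1} y,
# prediction f(x*) = sum_i alpha_i k(x*, x_i) -- only inner products
# in feature space are needed, never the feature map itself.
lam = 1e-2
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

X_test = np.linspace(-3, 3, 5).reshape(-1, 1)
y_pred = rbf_kernel(X_test, X) @ alpha
print(y_pred)
```

Swapping `rbf_kernel` for any other positive-definite kernel changes the hypothesis space without touching the rest of the code, which is what makes the trick attractive.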
Learning in Graphical Models (slides, pdf, code)
- Recap: Probability calculus
- Probability distributions (slide 4)
- Probability measure (slide 6)
- Random experiments (slide 7)
- Events (slide 8)
- Probability space (slide 9)
- Random Variables (slide 10)
- Expectation (slide 14)
- Joint distributions (slide 17)
- Marginalization (slide 18)
- Conditioning (slide 19)
- Canonical factorization of the joint (slide 21; worked example below)
- Statistical independence (slide 22)
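To make the recap concrete, here is a small worked example (not from the slides) with three binary variables in a chain A → B → C: the joint is built from the factorization p(a, b, c) = p(a) p(b|a) p(c|b), then marginalized and conditioned by summing and renormalizing the resulting array. All probability tables are invented for illustration.

```python
import numpy as np

# Canonical factorization p(a, b, c) = p(a) p(b | a) p(c | b)
# for binary variables A, B, C (values 0/1); all numbers are illustrative.
p_a = np.array([0.6, 0.4])                      # p(a)
p_b_given_a = np.array([[0.9, 0.1],             # p(b | a=0)
                        [0.3, 0.7]])            # p(b | a=1)
p_c_given_b = np.array([[0.8, 0.2],             # p(c | b=0)
                        [0.5, 0.5]])            # p(c | b=1)

# Joint via the factorization: joint[a, b, c]
joint = p_a[:, None, None] * p_b_given_a[:, :, None] * p_c_given_b[None, :, :]
assert np.isclose(joint.sum(), 1.0)

# Marginalization: p(c) = sum_{a, b} p(a, b, c)
p_c = joint.sum(axis=(0, 1))

# Conditioning: p(a | c=1) = p(a, c=1) / p(c=1)
p_a_c1 = joint[:, :, 1].sum(axis=1)
p_a_given_c1 = p_a_c1 / p_a_c1.sum()
print("p(c) =", p_c)
print("p(a | c=1) =", p_a_given_c1)
```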
- Directed graphs
- Graphs (slide 25)
- Representing joints graphically (slide 26)
- Conditional independence (d-separation) (slide 29)
- Markov property (slide 34)
- Inference (slide 38)
- Maximum likelihood (slide 39)
- Bayesian inference (slide 40; example below)
- Variational inference (slide 41)
- General free energy minimization (slide 43)
- Approximate and iterative inference (slide 47)
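A minimal contrast between maximum likelihood and Bayesian inference, using a coin-flip (Bernoulli) likelihood with a conjugate Beta prior; the data and prior parameters are illustrative assumptions, not values from the lecture.

```python
import numpy as np

# Observed coin flips (1 = heads); data is made up for illustration.
data = np.array([1, 1, 0, 1, 1, 0, 1, 1])
n_heads, n_tails = data.sum(), len(data) - data.sum()

# Maximum likelihood: a single point estimate of the heads probability.
theta_ml = n_heads / len(data)

# Bayesian inference with a conjugate Beta(a0, b0) prior:
# the posterior is again a Beta distribution (a whole distribution,
# not a point), with parameters updated by the observed counts.
a0, b0 = 2.0, 2.0
a_post, b_post = a0 + n_heads, b0 + n_tails
theta_post_mean = a_post / (a_post + b_post)

print("ML estimate:        ", theta_ml)
print("Posterior Beta(a,b):", (a_post, b_post))
print("Posterior mean:     ", theta_post_mean)
```

The ML estimate is a single point, whereas the Bayesian posterior is a full distribution over the parameter; variational inference and free energy minimization become relevant when, unlike in this conjugate toy case, the posterior has no closed form.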
- Undirected graphs
- Message passing
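As a sketch of message passing, the following NumPy snippet runs sum-product (belief propagation) on a three-node chain with pairwise potentials and recovers the exact node marginals; the potentials are arbitrary illustrative numbers and the chain structure is chosen only to keep the example short.

```python
import numpy as np

# Chain x1 - x2 - x3 of binary variables with pairwise potentials
# psi[i](x_i, x_{i+1}); numbers are arbitrary, for illustration only.
psi = [np.array([[2.0, 1.0], [1.0, 3.0]]),
       np.array([[1.0, 2.0], [2.0, 1.0]])]
n = len(psi) + 1  # number of nodes

# Forward messages m_f[i](x_i): message arriving at node i from the left.
m_f = [np.ones(2) for _ in range(n)]
for i in range(1, n):
    m_f[i] = psi[i - 1].T @ m_f[i - 1]

# Backward messages m_b[i](x_i): message arriving at node i from the right.
m_b = [np.ones(2) for _ in range(n)]
for i in range(n - 2, -1, -1):
    m_b[i] = psi[i] @ m_b[i + 1]

# Node marginals are proportional to the product of incoming messages.
marginals = [m / m.sum() for m in (m_f[i] * m_b[i] for i in range(n))]

# Brute-force check against the explicitly normalized joint.
joint = psi[0][:, :, None] * psi[1][None, :, :]
joint /= joint.sum()
print(marginals[1], joint.sum(axis=(0, 2)))  # should agree
```

On a tree-structured graph such as this chain the messages give exact marginals; on graphs with loops the same updates only yield approximate (loopy) belief propagation.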
Learning with Attention (slides, pdf, code)
- Recurrent neural networks
- Recap: Learning feed-forward artificial neural networks (slide 4)
- Sequential data (slide 6)
- n-Gram models (slide 7)
- Basic structure of RNNs (slide 8)
- Single-layer RNNs (slide 9; sketch below)
- Multilayer RNNs (slide 10)
- Example applications (slide 11)
- Variants of RNNs (slide 12)
- Exploding and vanishing gradients (slide 13)
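To fix notation for the RNN part, here is a bare-bones forward pass of a single-layer (Elman-style) RNN in NumPy, reusing the same weights at every time step; the sizes, the tanh nonlinearity, and the random weights are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions (illustrative): input size d, hidden size h, sequence length T.
d, h, T = 3, 5, 4
W_xh = 0.1 * rng.standard_normal((h, d))   # input-to-hidden weights
W_hh = 0.1 * rng.standard_normal((h, h))   # hidden-to-hidden (recurrent) weights
b_h = np.zeros(h)

xs = rng.standard_normal((T, d))           # toy input sequence
h_t = np.zeros(h)                          # initial hidden state

# Basic RNN recursion: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h),
# with the same weights applied at every time step.
hidden_states = []
for x_t in xs:
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_t + b_h)
    hidden_states.append(h_t)

print(np.stack(hidden_states).shape)  # (T, h)
```

Backpropagating through this loop multiplies by W_hh (times the tanh derivative) once per time step, which is the source of the exploding and vanishing gradients listed above.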
- Encoder-decoder architectures
- Transformers
- Dot-product attention (slide 26; sketch below)
- Query-key-value mechanism (slide 27)
- Vaswani et al. (2017): "Attention is all you need" (slide 29)
- Self-attention (slide 30)
- Encoder-decoder attention (slide 32)
- Masked self-attention (slide 33)
- Teacher forcing (slide 35)
- Additional technical details about transformers (slide 37)
- Applications to other tasks (slide 39)
- Quadratic costs (slide 40)
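The core operation of the transformer part fits in a few lines. Below is a NumPy sketch of scaled dot-product attention with query-key-value projections and an optional causal mask (masked self-attention); the toy shapes and random weights are illustrative assumptions, and a real transformer would add multiple heads, residual connections, and layer normalization.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, causal=False):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # (n_q, n_k) similarity scores
    if causal:
        # Masked self-attention: position i may only attend to positions <= i.
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    return softmax(scores, axis=-1) @ V

# Toy self-attention: queries, keys and values all come from the same
# sequence of n=4 token embeddings of dimension d_model=6.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 6))
W_q, W_k, W_v = (rng.standard_normal((6, 6)) for _ in range(3))

out = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v, causal=True)
print(out.shape)  # (4, 6)
```

The (n_q, n_k) score matrix also makes the quadratic cost in the sequence length visible.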
- Vision Transformers
- Dosovitskiy et al. (2021): Vision transformer, ViT (slide 44; patch-embedding sketch below)
- Touvron et al. (2021): Data-efficient image transformers, DeiT (slide 46)
- Most useful augmentations for image transformers (slide 47)
- Liu et al. (2021): Swin Transformer (slide 49)
- Hierarchical structure of Swin transformers (slide 50)
- Local windowed self-attention (slide 51)
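As a pointer to how ViT-style models feed images into a transformer, the sketch below performs only the first step, patch embedding: the image is cut into non-overlapping patches, and each patch is flattened and linearly projected to a token vector. Image size, patch size, and embedding dimension are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: a 32x32 RGB image, 8x8 patches, embedding dimension 64.
H = W = 32
P, C, D = 8, 3, 64
image = rng.standard_normal((H, W, C))

# Split the image into non-overlapping P x P patches and flatten each patch.
patches = image.reshape(H // P, P, W // P, P, C)                    # (4, 8, 4, 8, 3)
patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, P * P * C)   # (16, 192)

# Linear projection to D-dimensional patch embeddings (the ViT "tokens").
W_embed = 0.02 * rng.standard_normal((P * P * C, D))
tokens = patches @ W_embed                                          # (16, 64)
print(tokens.shape)
```

The resulting token sequence (plus a class token and position embeddings, omitted here) is then processed by a standard transformer encoder.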