MSAIL Sign-In!
https://forms.gle/6mM42WjKts18Vvta8
Attention and Transformers
MSAIL Reading Group
3/15/2022
Nisreen Bahrainwala
Agenda
What is Machine Translation?
Human Translation
Machine Translation
Where it all started: Recurrent Neural Networks (RNNs)
Encoder
Decoder
Encoder - Decoder Model with RNNs
Encoder: the end result is a single vector that summarizes the information from all the hidden states
Decoder: decomposes the joint probability of the output sequence into ordered conditionals, each conditioned on that vector
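As a sketch of that decomposition (standard notation, not from the slides): with x the source sentence and c the encoder's final summary vector, the decoder models

```latex
p(y_1, \dots, y_T \mid x) \;=\; \prod_{t=1}^{T} p\!\left(y_t \mid y_1, \dots, y_{t-1},\, c\right)
```

so every target word is conditioned on the same single context vector c, which is the bottleneck the next paper addresses.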
Neural Machine Translation by Jointly Learning to Align and Translate
Bahdanau, Cho, Bengio - ICLR 2015
Encoder
Decoder
Encoder and Decoder (new and improved)
- Encoder: same as the basic RNN encoder, but bi-directional (reads the sentence both forward and backward)
- Annotations
  - the annotation h_i contains information about the words surrounding the i-th word
- Context vector
  - key difference: the probability is conditioned on a distinct context vector c_i for each target word y_i (equations below)
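For reference, the equations from Bahdanau et al. (2015): the context vector for target word y_i is a weighted sum of the annotations h_j, with weights given by a softmax over alignment scores from a small learned network a:

```latex
c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j,
\qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})},
\qquad
e_{ij} = a(s_{i-1}, h_j)
```

where s_{i-1} is the previous decoder state and T_x is the source-sentence length.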
General Outline of Attention
Basic Dot-Product Attention
General Intuition
“Given a set of vector values and a vector query, attention is a technique to compute a weighted sum of the values, dependent on the query.”
“Query attends to the values”
In the example:
Each decoder hidden state (query) attends to all the encoder hidden states (values).
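A minimal NumPy sketch of that intuition (the names and shapes are illustrative, not from the slides): one query vector attends to a set of value vectors via dot-product scores and a softmax.

```python
import numpy as np

def dot_product_attention(query, values):
    """One query vector attends to a set of value vectors.

    query:  shape (d,)   -- e.g. a decoder hidden state
    values: shape (n, d) -- e.g. all encoder hidden states
    Returns the attention weights and the weighted sum of the values.
    """
    scores = values @ query                  # (n,) dot-product alignment scores
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    context = weights @ values               # (d,) weighted sum of the values
    return weights, context

# Toy usage: 4 encoder states of dimension 3, one decoder query.
rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(4, 3))  # the "values"
decoder_state = rng.normal(size=3)        # the "query"
w, c = dot_product_attention(decoder_state, encoder_states)
print(w.sum())  # weights sum to 1.0
```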
Attention is All You Need
Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser, Polosukhin
NeurIPS 2017
Main Ideas
Key Idea: No more RNNs; recurrence is replaced entirely by attention
Encoder: a stack of identical layers, each with multi-head self-attention and a position-wise feed-forward network
Decoder: the same, plus masked self-attention over previous outputs and attention over the encoder's output
Self-Attention - Overview
Multi-head Attention
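A hedged NumPy sketch of multi-head self-attention as described in Vaswani et al. (2017); the weight matrices and toy shapes here are hypothetical, and real implementations add masking, dropout, and learned parameters:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Scaled dot-product self-attention with several heads.

    X: (n, d_model) -- one sequence; queries, keys, and values all come from X.
    Wq, Wk, Wv, Wo: (d_model, d_model) projection matrices (toy weights here).
    """
    n, d_model = X.shape
    d_head = d_model // n_heads

    # Project X, then split the feature dimension into heads: (n_heads, n, d_head).
    def project(W):
        return (X @ W).reshape(n, n_heads, d_head).transpose(1, 0, 2)

    Q, K, V = project(Wq), project(Wk), project(Wv)

    # Per head: Attention(Q, K, V) = softmax(Q K^T / sqrt(d_head)) V
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)  # (n_heads, n, n)
    heads = softmax(scores) @ V                          # (n_heads, n, d_head)

    # Concatenate the heads and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(n, d_model)
    return concat @ Wo

# Toy usage: a sequence of 5 tokens with d_model = 8 and 2 heads.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Wq, Wk, Wv, Wo = (rng.normal(size=(8, 8)) * 0.1 for _ in range(4))
print(multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads=2).shape)  # (5, 8)
```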