Encoder-Decoder / Attention / Transformers
10-Apr-23
Today Nov 4
Encoder-Decoder
Simple recurrent neural network illustrated as a feed-forward network
Most significant change: a new set of weights, U, connecting the hidden layer at the previous time step to the current hidden layer
Simple-RNN abstraction
[Figure: simple RNN with outputs y1, y2, y3]
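A minimal sketch of the simple RNN forward pass, assuming the common formulation h_t = tanh(U h_{t-1} + W x_t) and y_t = softmax(V h_t). Only U is named on the slides; the tanh activation, the names W and V, and all toy values are assumptions made for illustration.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    top = max(z)
    exps = [math.exp(s - top) for s in z]
    total = sum(exps)
    return [e / total for e in exps]

def rnn_forward(xs, W, U, V, h0):
    """Run a simple RNN over the input vectors xs.

    h_t = tanh(U h_{t-1} + W x_t);  y_t = softmax(V h_t)
    """
    h, ys = h0, []
    for x in xs:
        pre = [u + w for u, w in zip(matvec(U, h), matvec(W, x))]
        h = [math.tanh(p) for p in pre]   # U carries information across time
        ys.append(softmax(matvec(V, h)))
    return ys, h

# Toy sizes (assumed): 2-dim inputs, 2-dim hidden state, 3 output classes.
W = [[0.5, -0.1], [0.2, 0.3]]
U = [[0.1, 0.0], [0.0, 0.1]]
V = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys, h_last = rnn_forward(inputs, W, U, V, [0.0, 0.0])
```

Each element of ys is a probability distribution over the three toy classes, one per time step.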
RNN Applications
Sentence Completion using an RNN
Extending (autoregressive) generation to Machine Translation
Source: there lived a hobbit
Target: vivait un hobbit
Concatenated with end-of-sentence markers: there lived a hobbit </s> vivait un hobbit </s>
The word generated at each time step is conditioned on the word generated at the previous step.
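A minimal sketch of autoregressive generation applied to translation: the source sentence plus an end-of-sentence marker serves as the prefix, and each generated word is appended to the context before the next one is predicted. The lookup table TOY_MODEL and the helper names are hypothetical stand-ins for a trained network.

```python
EOS = "</s>"

# Hypothetical stand-in for a trained model: maps the full context so far
# to the most likely next word.
TOY_MODEL = {
    ("there", "lived", "a", "hobbit", EOS): "vivait",
    ("there", "lived", "a", "hobbit", EOS, "vivait"): "un",
    ("there", "lived", "a", "hobbit", EOS, "vivait", "un"): "hobbit",
}

def next_word(context):
    """Return the most likely next word; unknown contexts end the sentence."""
    return TOY_MODEL.get(tuple(context), EOS)

def translate(source_words, max_len=20):
    """Greedy autoregressive generation: each output is fed back as input."""
    context = list(source_words) + [EOS]   # the source sentence is the prefix
    output = []
    for _ in range(max_len):
        word = next_word(context)
        if word == EOS:
            break
        output.append(word)
        context.append(word)               # condition the next step on it
    return output

print(translate(["there", "lived", "a", "hobbit"]))
# prints ['vivait', 'un', 'hobbit']
```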
(Simple) Encoder-Decoder Networks
Limiting design choices
General Encoder-Decoder Networks
Abstracting away from these choices
[Figure: encoder hidden states h1, h2, ..., hn; decoder hidden states h1, h2, ..., hm]
Popular architectural choices: Encoder
Widely used encoder design: stacked Bi-LSTMs
Decoder Basic Design
Last hidden state of the encoder
First hidden state of the decoder
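A minimal sketch of the basic decoder design: the encoder's last hidden state serves as the context c, and c is used only to initialize the decoder. The simple-RNN update with tanh, the weight names, and the toy values are assumptions; feeding the decoder state back as the next input is a simplification standing in for embedding the previously chosen word.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def rnn_step(W, U, x, h):
    """One simple-RNN update: tanh(W x + U h)."""
    return [math.tanh(a + b) for a, b in zip(matvec(W, x), matvec(U, h))]

def encode(xs, W_enc, U_enc, hidden_size):
    """Run the encoder over the input and return its last hidden state."""
    h = [0.0] * hidden_size
    for x in xs:
        h = rnn_step(W_enc, U_enc, x, h)
    return h

def decode(c, W_dec, U_dec, start, steps):
    """Basic decoder: the context c is used only as the initial state."""
    h, y_prev, states = c, start, []
    for _ in range(steps):
        h = rnn_step(W_dec, U_dec, y_prev, h)
        states.append(h)
        y_prev = h   # toy simplification: feed the state back as next input
    return states

# Toy 2-dimensional weights (made-up values).
W_enc = [[0.5, -0.3], [0.1, 0.8]]
U_enc = [[0.2, 0.0], [0.0, 0.2]]
W_dec = [[0.3, 0.1], [-0.2, 0.4]]
U_dec = [[0.6, 0.0], [0.0, 0.6]]

c = encode([[1.0, 0.0], [0.0, 1.0]], W_enc, U_enc, hidden_size=2)
states = decode(c, W_dec, U_dec, start=[0.0, 0.0], steps=3)
```

Because c enters only through the initial state, its influence has to survive every later update, which is one motivation for making it available at each decoding step.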
Decoder Design: Enhancement
Context available at each step of decoding
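A minimal sketch of a decoder step that receives the context at every step, assuming the update h_t = tanh(W y_{t-1} + U h_{t-1} + C c). The additive form, the matrix name C, and the toy values are assumptions for illustration.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def decoder_step(W, U, C, y_prev, h_prev, c):
    """h_t = tanh(W y_{t-1} + U h_{t-1} + C c): c enters at every step."""
    terms = zip(matvec(W, y_prev), matvec(U, h_prev), matvec(C, c))
    return [math.tanh(a + b + d) for a, b, d in terms]

# Toy 2-dimensional example (all weights are made-up values).
I2 = [[1.0, 0.0], [0.0, 1.0]]   # identity: passes c through unchanged
Z2 = [[0.0, 0.0], [0.0, 0.0]]   # zero: switches off a term
c = [0.5, -0.5]
h = c                            # the first decoder state is still the context
for _ in range(3):
    h = decoder_step(Z2, Z2, I2, [0.0, 0.0], h, c)
```

With W and U zeroed out in this toy setting, the state still equals tanh(c) after every step, showing that the context no longer depends on being carried through the recurrence.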
Decoder: How output y is chosen
For sequence labeling we used Viterbi – here it is not possible ☹, because each output depends on the entire previously generated prefix
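Since exact search is unavailable, a common approximation is beam search, which keeps only the best few partial outputs at each step; a beam width of 1 reduces to greedy choice. The sketch below is illustrative: the next_probs interface and the toy_model table are hypothetical, scores are not length-normalized, and all probabilities are assumed to be nonzero.

```python
import math

EOS = "</s>"

def beam_search(next_probs, beam_width=2, max_len=10):
    """Keep the beam_width best partial hypotheses at every step.

    next_probs(prefix) is a hypothetical model interface returning a dict
    {word: probability} for the word that follows the tuple `prefix`.
    """
    beams = [((), 0.0)]                 # (prefix, log probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for word, p in next_probs(prefix).items():
                candidates.append((prefix + (word,), score + math.log(p)))
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_width]:
            if prefix[-1] == EOS:
                finished.append((prefix, score))   # complete hypothesis
            else:
                beams.append((prefix, score))      # keep extending
        if not beams:
            break
    finished.extend(beams)              # hypotheses cut off at max_len
    return max(finished, key=lambda item: item[1])

def toy_model(prefix):
    """Made-up distribution in which the greedy first choice is a trap."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.5, "y": 0.5},
        ("b",): {"z": 0.9, "w": 0.1},
    }
    return table.get(prefix, {EOS: 1.0})

greedy, _ = beam_search(toy_model, beam_width=1)
best, best_score = beam_search(toy_model, beam_width=2)
```

In this toy distribution the greedy path starts with "a" (probability 0.6) and ends with total probability 0.3, while the width-2 beam recovers "b z" with probability 0.36.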
Sequence to Sequence Learning
Attention Model
Essentially the context vector consumes t pieces of information:
Flexible context: Attention
Context vector c: a function of the encoder hidden states h1:n that conveys the essence of the input to the decoder.
Flexible?
Attention (1): dynamically derived context
Ideas:
Attention (2): computing ci
Attention (3): computing ci (from scores to weights)
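A minimal sketch of computing ci, assuming dot-product scoring between the previous decoder state and each encoder state: scores are normalized with a softmax into weights, and ci is the weighted sum of the encoder states. The function names and toy vectors are illustrative; other scoring functions are possible.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_context(h_dec_prev, enc_states):
    """Return (weights, c_i) for one decoding step.

    score_j = h_dec_prev . h_j   (assumed dot-product scoring)
    alpha   = softmax(scores)
    c_i     = sum_j alpha_j * h_j
    """
    scores = [dot(h_dec_prev, h_e) for h_e in enc_states]
    weights = softmax(scores)
    dim = len(enc_states[0])
    context = [
        sum(w * h_e[k] for w, h_e in zip(weights, enc_states))
        for k in range(dim)
    ]
    return weights, context

# Toy encoder states and previous decoder state (made-up values).
enc_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, c_i = attention_context([2.0, 0.0], enc_states)
```

Here the first and third encoder states score equally and receive equal weight, while the second state, orthogonal to the decoder state, receives the smallest weight. A new ci is computed at every decoding step.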
Attention: Summary
Encoder
Decoder
Explain Y. Goldberg's different notation
Intro to Encoder-Decoder and Attention (Goldberg’s notation)
Encoder
Decoder