1 of 42

Machine Translation

CSE 447 / 517

Feb 13th, 2025 (Week 6)

2 of 42

Logistics

  • Assignment 4 (A4) is due on Wednesday, 2/19
  • Project Checkpoint 3 is due on Monday, 3/03

3 of 42

Agenda

  • Machine Translation: The Noisy Channel Model
  • The IBM Model

  • Intro to Neural Machine Translation

4 of 42

NLP Task: Machine Translation

Mr President , Noah's ark was filled not with production factors , but with living creatures .

(From Language X)

Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .

(To Language Y)

5 of 42

The Noisy Channel Model

Language X → Language Y

We want to translate Language X into Language Y.

6 of 42

The Noisy Channel Model

Source → Language Y → Channel → Language X (observed)

We want to translate Language X into Language Y.

Imagine there is a source that generates Language Y. But then it is passed through some channel, and we observe Language X on the other side of the channel.

7 of 42

The Noisy Channel Model


ŷ = argmax_y p(y | x) = argmax_y p(x | y) · p(y)

p(y) is the source model, aka a LM for Language Y! This captures the fluency in the target language.

8 of 42

The Noisy Channel Model


ŷ = argmax_y p(y | x) = argmax_y p(x | y) · p(y)

p(x | y) is the channel model; it captures the faithfulness of the translation.
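The argmax over p(x | y) · p(y) can be made concrete with a toy example. Everything below is invented for illustration: a tiny set of candidate Language Y outputs for some fixed input x, a made-up LM score p(y), and a made-up channel score p(x | y).

```python
# Toy noisy-channel decoder: pick the y maximizing p(x | y) * p(y).
# All probabilities below are invented for illustration.

def noisy_channel_decode(candidates, channel, lm):
    """Return the candidate y with the highest p(x|y) * p(y)."""
    return max(candidates, key=lambda y: channel[y] * lm[y])

# Hypothetical candidate translations of some fixed source sentence x.
candidates = ["das Haus", "der Haus", "das Pferd"]
lm = {"das Haus": 0.5, "der Haus": 0.1, "das Pferd": 0.4}        # fluency p(y)
channel = {"das Haus": 0.6, "der Haus": 0.7, "das Pferd": 0.05}  # faithfulness p(x|y)

print(noisy_channel_decode(candidates, channel, lm))  # "das Haus"
```

Note how the two terms trade off: "der Haus" is scored faithful but disfluent, "das Pferd" fluent but unfaithful; the product prefers "das Haus", which is decent on both.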

9 of 42

The Noisy Channel Model

Source → Language Y → Channel → Language X (observed)

Language X → Language Y (what we want)

Refer to this diagram when you lose track of which is which!

10 of 42

IBM Model 1 - Motivation

Mr President , Noah's ark was filled not with production factors , but with living creatures .

Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .

IBM Model 1: What is the mapping from each token in Language X to Language Y?


11 of 42

IBM Model 1 - Alignment

IBM Model 1: What is the mapping from each token in Language X to Language Y?

Let l be the length of y and m be the length of x.

Latent variable a = (a_1, ..., a_m), where each a_i ranges over {0, ..., l} (positions in y; a_i = 0 means x_i is aligned to nothing in y).

a_i = j

12 of 42

IBM Model 1 - Alignment

IBM Model 1: What is the mapping from each token in Language X to Language Y?

Let l be the length of y and m be the length of x.

Latent variable a = a1,...,am, each ai ranging over {0,...,l} (positions in y).


a_i = j means: the ith token in Language X is aligned to the jth token in Language Y.

13 of 42

IBM Model 1


Mr President , Noah's ark was filled not with production factors , but with living creatures .

Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .

IBM Model 1: What is the mapping from each token in Language X to Language Y?

a_i = j

14 of 42

IBM Model 1


Mr President , Noah's ark was filled not with production factors , but with living creatures .

Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .

IBM Model 1: What is the mapping from each token in Language X to Language Y?

a_i = j

 i:   1  2  3  4  ...
a = [0, 0, 0, 1, ???]

15 of 42

IBM Model 1


Mr President , Noah's ark was filled not with production factors , but with living creatures .

Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .

IBM Model 1: What is the mapping from each token in Language X to Language Y?

 i:   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
a = [0, 0, 0, 1, 2, 3, 5, 4, 0, 6, 6, 7, 8, 0, 0, 9, 10]

a_i = j
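The alignment vector on this slide can be read off mechanically: a_i = j pairs the ith token of x with the jth token of y, and a_i = 0 leaves x_i unaligned. A small sketch using the slide's whitespace tokenization:

```python
# Read off word pairs from the slide's alignment: a_i = j means
# x_i aligns to y_j (j = 0 is the special null position).
x = ("Mr President , Noah's ark was filled not with production factors , "
     "but with living creatures .").split()
y = "Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .".split()
a = [0, 0, 0, 1, 2, 3, 5, 4, 0, 6, 6, 7, 8, 0, 0, 9, 10]

# Skip null-aligned tokens; y is 1-indexed in the alignment, so shift by one.
pairs = [(x[i], y[a[i] - 1]) for i in range(len(x)) if a[i] != 0]
print(pairs[:3])  # [("Noah's", 'Noahs'), ('ark', 'Arche'), ('was', 'war')]
```

"Mr", "President", and the function words with no German counterpart all map to position 0, leaving 11 aligned pairs.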

16 of 42

IBM Model 1


Our channel model:

p(x | y) = Σ_a p(x, a | y)

17 of 42

IBM Model 1


Our channel model:

p(x | y) = Σ_a p(x, a | y)

marginalized over all possible alignment vectors a.

18 of 42

IBM Model 1


Our channel model:

p(x | y) = Σ_a p(x, a | y), where

p(x, a | y) = ∏_{i=1}^{m} p(a_i | l, m) · p(x_i | y_{a_i})

19 of 42

IBM Model 1


Our channel model:

p(x, a | y) = ∏_{i=1}^{m} p(a_i | l, m) · p(x_i | y_{a_i})

The product ∏_{i=1}^{m} goes through every position in x.

20 of 42

IBM Model 1


Our channel model:

p(x, a | y) = ∏_{i=1}^{m} p(a_i | l, m) · p(x_i | y_{a_i})

p(a_i | l, m): how likely is the current alignment, without regard to the text?

21 of 42

IBM Model 1


Our channel model:

p(x, a | y) = ∏_{i=1}^{m} p(a_i | l, m) · p(x_i | y_{a_i})

p(x_i | y_{a_i}): how likely is the current alignment, with regard to the text?

22 of 42

IBM Model 1


Our channel model:

p(x, a | y) = ∏_{i=1}^{m} p(a_i | l, m) · p(x_i | y_{a_i})

p(a_i | l, m) = 1 / (l + 1): a uniform distribution (all distortions modelled by a are treated the same).

23 of 42

IBM Model 1


Our channel model:

p(x, a | y) = ∏_{i=1}^{m} p(a_i | l, m) · p(x_i | y_{a_i})

p(x_i | y_{a_i}) = θ(x_i | y_{a_i}): a learned parameter (the translation probability of x_i given the y token it aligns to).
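With a uniform 1/(l+1) alignment term and a table of translation probabilities, the channel model can be evaluated by brute force for toy inputs. The θ values and sentences below are invented for illustration, and "NULL" stands in for alignment position 0.

```python
from itertools import product

# Toy IBM Model 1 channel model: p(x, a | y) = prod_i 1/(l+1) * theta(x_i | y_{a_i}).
# theta values are invented for illustration; "NULL" plays the role of position 0.
theta = {
    ("Haus", "house"): 0.8, ("Haus", "NULL"): 0.1,
    ("das", "the"): 0.7, ("das", "NULL"): 0.2,
}

def p_x_a_given_y(x, a, y):
    l = len(y)
    prob = 1.0
    for i, ai in enumerate(a):
        y_word = "NULL" if ai == 0 else y[ai - 1]
        prob *= (1.0 / (l + 1)) * theta.get((x[i], y_word), 0.0)
    return prob

def p_x_given_y(x, y):
    # Marginalize over all (l+1)^m alignment vectors (only feasible for toy sizes).
    m, l = len(x), len(y)
    return sum(p_x_a_given_y(x, list(a), y) for a in product(range(l + 1), repeat=m))

print(round(p_x_given_y(["das", "Haus"], ["the", "house"]), 4))  # 0.09
```

Because Model 1 treats positions independently, the brute-force sum over all 9 alignments here equals a product of per-position sums: (1/3 · 0.9) · (1/3 · 0.9) = 0.09, which is what makes the model tractable for real sentence lengths.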

24 of 42

IBM Model 1


Mr President , Noah's ark was filled not with production factors , but with living creatures .

Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .

25 of 42

IBM Model 1 - Learning


26 of 42

IBM Model 1 - Learning


How do we estimate this?

27 of 42

IBM Model 1 - Learning


The problem: we don’t know the alignments ahead of time, so we can’t directly apply MLE to find the parameters.

The solution: expectation maximization.
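The EM loop for Model 1 alternates between computing expected alignment counts under the current translation probabilities (E-step) and renormalizing those counts (M-step). A minimal sketch on an invented two-sentence toy corpus, starting from a uniform initialization:

```python
from collections import defaultdict

# EM for IBM Model 1 translation probabilities t(x_word | y_word).
# Toy parallel corpus (alignments are never observed).
corpus = [(["das", "Haus"], ["the", "house"]),
          (["das", "Buch"], ["the", "book"])]

x_vocab = {w for x, _ in corpus for w in x}
t = defaultdict(lambda: 1.0 / len(x_vocab))  # uniform initialization

for _ in range(50):
    count = defaultdict(float)   # expected counts c(x_word, y_word)
    total = defaultdict(float)   # expected counts c(y_word)
    # E-step: in Model 1 the posterior over alignments factorizes per position.
    for x, y in corpus:
        for xw in x:
            z = sum(t[(xw, yw)] for yw in y)  # normalizer for this x word
            for yw in y:
                delta = t[(xw, yw)] / z      # posterior that xw aligns to yw
                count[(xw, yw)] += delta
                total[yw] += delta
    # M-step: renormalize the expected counts.
    for (xw, yw), c in count.items():
        t[(xw, yw)] = c / total[yw]

print(round(t[("Haus", "house")], 2))  # converges toward 1.0
```

Even though no alignment is ever given, the co-occurrence pattern ("das" appears with both "the house" and "the book") is enough for EM to pull t(Haus | house) and t(Buch | book) toward 1.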

28 of 42

Neural Machine Translation (NMT)

  • Based on a new model architecture: sequence-to-sequence (seq2seq), also called encoder-decoder
  • High-level view: the model has two parts:
    • An encoder that takes in the source language sentence f and outputs an encoding of the sentence, encode(f)
    • A decoder that at step j predicts the target language word e_j from the previously output target language words e_<j and encode(f)
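The encode-then-decode data flow can be sketched with a stand-in "model": here the encoder just packages f, and the decoder's next-word choice is a hand-written lookup table. Both are invented for illustration; a real system learns both parts as neural networks.

```python
# Toy encoder-decoder loop: encode(f) produces a context; the decoder
# predicts e_j from (e_<j, context) until it emits </s>.

def encode(f):
    return tuple(f)  # a real encoder would produce vectors, not the tokens themselves

# Hypothetical next-word table keyed by (context, previous word).
NEXT_WORD = {
    (("das", "Haus"), "<s>"): "the",
    (("das", "Haus"), "the"): "house",
    (("das", "Haus"), "house"): "</s>",
}

def decode(context, max_len=10):
    output, prev = [], "<s>"
    for _ in range(max_len):
        word = NEXT_WORD.get((context, prev), "</s>")  # stand-in for greedy argmax
        if word == "</s>":
            break
        output.append(word)
        prev = word
    return output

print(decode(encode(["das", "Haus"])))  # ['the', 'house']
```

The point is the interface, not the table: the decoder only ever sees its own previous outputs plus encode(f), exactly the conditioning described above.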

29 of 42

Slides 29–40: figures from Abigail See (images not captured in this transcript).

41 of 42

Final notes on NMT

  • To decode (get a translated sentence from the MT model), we can use methods discussed for previous sequence labeling tasks: greedy decoding, beam search, etc.
  • We showed how to use the encoder-decoder model for MT, but this is a general setup that works:
    • For many different NLP tasks
    • With different NN architectures (RNNs, Transformers)
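The greedy vs. beam-search distinction can be sketched with a toy next-word scorer. The log-probabilities below are invented; a real system would query the decoder at each step.

```python
# Beam search over a toy autoregressive scorer (log-probs invented for illustration).
def log_p(prefix, word):
    table = {
        ((), "the"): -0.2, ((), "a"): -0.4,
        (("the",), "house"): -1.0, (("the",), "home"): -0.9,
        (("a",), "house"): -0.3, (("a",), "home"): -2.0,
    }
    return table.get((prefix, word), -5.0)

def beam_search(vocab, steps, beam_size):
    beams = [((), 0.0)]  # (prefix, total log-probability)
    for _ in range(steps):
        # Extend every surviving prefix by every word, then keep the top beam_size.
        candidates = [(prefix + (w,), score + log_p(prefix, w))
                      for prefix, score in beams for w in vocab]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams[0][0]

vocab = ["the", "a", "house", "home"]
print(beam_search(vocab, steps=2, beam_size=1))  # ('the', 'home'): greedy
print(beam_search(vocab, steps=2, beam_size=2))  # ('a', 'house'): higher total score
```

With beam_size=1 this reduces to greedy decoding, which commits to "the" and misses the overall-best "a house"; a beam of 2 keeps the weaker first word alive long enough to win.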

42 of 42

Questions?

  • Thank you!