Machine Translation
CSE 447 / 517
Feb 13th, 2025 (Week 6)
Logistics
Agenda
The IBM Model
NLP Task: Machine Translation
Mr President , Noah's ark was filled not with production factors , but with living creatures .
(From Language X)
Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .
(To Language Y)
The Noisy Channel Model

[Figure: a Source generates Language Y, which passes through a Channel; we observe Language X on the other side.]

We want to translate Language X into Language Y.

Imagine there is a source that generates Language Y. But then it is passed through some channel, and we observe Language X on the other side of the channel.

y* = argmax_y p(y | x)
   = argmax_y p(x | y) · p(y)

p(y): the source model, aka a LM for Language Y! This captures the fluency in the target language.
p(x | y): the channel model, which captures the faithfulness of the translation.

Refer to the figure when you get lost which is which!
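The noisy-channel decision rule can be sketched in a few lines of Python. The candidate set, probabilities, and function names below are all made up for illustration; a real system would use a trained language model and channel model:

```python
# Noisy-channel decoding sketch: pick the target sentence y maximizing
# p(x | y) * p(y). All probabilities here are toy values.

def noisy_channel_decode(x, candidates, channel_model, source_model):
    """Return argmax_y channel_model(x, y) * source_model(y)."""
    return max(candidates, key=lambda y: channel_model(x, y) * source_model(y))

# Hypothetical models for a single German input "das Haus".
source_lm = {"the house": 0.6, "house the": 0.1}   # p(y): fluency in Y
channel = {("das Haus", "the house"): 0.5,          # p(x | y): faithfulness
           ("das Haus", "house the"): 0.5}

y_star = noisy_channel_decode(
    "das Haus",
    candidates=["the house", "house the"],
    channel_model=lambda x, y: channel.get((x, y), 0.0),
    source_model=lambda y: source_lm.get(y, 0.0),
)
print(y_star)  # prints "the house"
```

Note how the channel model alone cannot distinguish the two candidates (both have p(x | y) = 0.5); the language model p(y) breaks the tie in favor of the fluent one.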
IBM Model 1 - Motivation
Mr President , Noah's ark was filled not with production factors , but with living creatures .
Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .
IBM Model 1: What is the mapping from each token in Language X to Language Y?
IBM Model 1 - Alignment

IBM Model 1: What is the mapping from each token in Language X to Language Y?

Let l be the length of y and m be the length of x.
Latent variable a = ⟨a1, ..., am⟩, each ai ranging over {0, ..., l} (positions in y).

ai = j means the ith token in Language X is aligned to the jth token in Language Y. (j = 0 denotes a special null token, used for tokens in x with no counterpart in y.)
IBM Model 1

Mr President , Noah's ark was filled not with production factors , but with living creatures .
Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .

IBM Model 1: What is the mapping from each token in Language X to Language Y?

i:   1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17
a = [0, 0, 0, 1, 2, 3, 5, 4, 0, 6, 6, 7, 8, 0, 0, 9, 10]

(For example, a4 = 1: "Noah's" aligns to "Noahs". a9 = 0: the ninth token, "with", is unaligned.)
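The alignment vector from the example can be spelled out directly in code; this just replays the slide's alignment, with j = 0 mapped to a NULL token:

```python
# a[i] = j means token i of x (English) aligns to token j of y (German);
# j = 0 is the special null token. Alignment taken from the slide's example.

x = ("Mr President , Noah's ark was filled not with production "
     "factors , but with living creatures .").split()
y = "Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .".split()

a = [0, 0, 0, 1, 2, 3, 5, 4, 0, 6, 6, 7, 8, 0, 0, 9, 10]  # 1-indexed into y

assert len(a) == len(x)  # one alignment decision per token of x
for i, j in enumerate(a):
    target = y[j - 1] if j > 0 else "NULL"
    print(f"{x[i]:>12} -> {target}")
```

Note that the mapping need not be one-to-one: both "production" and "factors" align to the single German compound "Produktionsfaktoren".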
IBM Model 1

Our channel model:

p(x | y, m) = Σ_a p(x, a | y, m)   (marginalized over all possible a vectors)

where

p(x, a | y, m) = ∏_{i=1..m} p(ai | l, m) · p(xi | y_{ai})

- The product goes through every position in x.
- p(ai | l, m): how likely is the current alignment without regard to the text? In IBM Model 1 this is a uniform distribution, 1 / (l + 1) (all distortions modelled by a are treated the same).
- p(xi | y_{ai}): how likely is the current alignment with regard to the text? This is the learned parameter (a word-translation table).
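For a fixed alignment a, the channel probability p(x, a | y, m) is a simple product. A minimal sketch, using a hypothetical translation table t with made-up values:

```python
# IBM Model 1 channel probability for a fixed alignment:
# p(x, a | y, m) = prod_i 1/(l+1) * t(x_i | y_{a_i}).
# The translation table t below is hypothetical, for illustration only.

def p_x_a_given_y(x, y, a, t):
    l = len(y)
    prob = 1.0
    for i, j in enumerate(a):
        y_word = y[j - 1] if j > 0 else "NULL"
        prob *= (1.0 / (l + 1)) * t.get((x[i], y_word), 0.0)
    return prob

x = ["das", "Haus"]
y = ["the", "house"]
t = {("das", "the"): 0.9, ("Haus", "house"): 0.8}  # toy t-table
a = [1, 2]
print(p_x_a_given_y(x, y, a, t))  # (1/3 * 0.9) * (1/3 * 0.8) = 0.08
```

Summing this quantity over all (l + 1)^m possible alignment vectors gives p(x | y, m).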
IBM Model 1 - Learning

How do we estimate the translation parameters p(xi | y_{ai})?

The problem: we don't know the alignments ahead of time, so we can't apply MLE directly to find the parameters.
The solution: expectation maximization (EM).
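EM for IBM Model 1 alternates between computing expected alignment counts under the current translation table (E-step) and renormalizing those counts into new probabilities (M-step). A minimal sketch on a toy corpus, ignoring the null word for brevity:

```python
# Minimal EM for IBM Model 1 (null word omitted for brevity).
# E-step: expected alignment counts under the current table t.
# M-step: renormalize counts into new probabilities t(x_word | y_word).

from collections import defaultdict
from itertools import product

def train_ibm1(corpus, iterations=10):
    # corpus: list of (x_tokens, y_tokens) sentence pairs.
    vocab_pairs = {(xw, yw) for x, y in corpus for xw, yw in product(x, y)}
    t = {pair: 1.0 for pair in vocab_pairs}  # uniform initialization

    for _ in range(iterations):
        counts = defaultdict(float)   # expected count(x_word, y_word)
        totals = defaultdict(float)   # expected count(y_word)
        for x, y in corpus:
            for xw in x:
                z = sum(t[(xw, yw)] for yw in y)   # normalizer over positions in y
                for yw in y:
                    c = t[(xw, yw)] / z            # posterior p(a_i = j | x, y)
                    counts[(xw, yw)] += c
                    totals[yw] += c
        t = {(xw, yw): counts[(xw, yw)] / totals[yw] for (xw, yw) in counts}
    return t

corpus = [(["das", "Haus"], ["the", "house"]),
          (["das", "Buch"], ["the", "book"])]
t = train_ibm1(corpus)
print(round(t[("das", "the")], 3))  # "das" comes to strongly prefer "the"
```

Because "das" co-occurs with "the" in both sentence pairs while "Haus" and "Buch" each appear only once, EM gradually concentrates probability mass on the consistent pairing, even though no alignments were ever observed.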
Neural Machine Translation (NMT)
Slides from Abigail See
Final notes on NMT
Questions?