# Feb. 1, 2012
# Scribe: Dave Kush
Agenda:
1. Direction of Translation
2. Pseudocode / Recap
3. IBM Models 3-5
4. NLTK
1. Direction of Translation
Application: f → e
-- naively assumed direction: p(e|f) # ← consistent with history, but confusing at the outset
-- alternative:
noisy channel approach: # used by Collins
the foreign speaker intends an ENGLISH SENTENCE, which gets ‘corrupted’ into the foreign signal
EN → F
p(e) # the language model; we have these n-gram models
p(f|e) # the translation model
this characterization allows us to compute things like:
argmax_e p(f|e) · p(e)
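A minimal sketch of that argmax, assuming a fixed candidate list and stand-in probability tables (decode, tm_prob, and lm_prob are made-up names; a real decoder searches a huge hypothesis space rather than enumerating):

def decode(f, candidates, tm_prob, lm_prob):
    """Return argmax_e p(f|e) * p(e) over a fixed candidate list."""
    return max(candidates,
               key=lambda e: tm_prob.get((f, e), 0.0) * lm_prob.get(e, 0.0))

# toy usage with hypothetical probabilities
tm_prob = {("das haus", "the house"): 0.8, ("das haus", "the home"): 0.2}
lm_prob = {"the house": 0.01, "the home": 0.002}
print(decode("das haus", ["the house", "the home"], tm_prob, lm_prob))  # -> "the house"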
2. Pseudocode for Implementing MODEL 1, MODEL 2.
# Model 1 implementation below. Additions for Model 2 (originally written in red) are marked ‘# Model 2 only’.
# initialize t(e|f) uniformly, then repeat the E-step/M-step pair below until convergence
ct(e|f) = 0 ∀ e,f                      # expected translation counts
st(f) = 0 ∀ f                          # totals for normalizing ct
ca(i|j,le,lf) = 0 ∀ i,j,le,lf          # i is foreign, j is english        # Model 2 only
sa(j,le,lf) = 0 ∀ j,le,lf              # totals for normalizing ca         # Model 2 only
α(i|j,le,lf) = 1/(lf + 1)              # uniform; Model 1 keeps this fixed, Model 2 re-estimates it
for (e, f) in corpus:                  # THIS IS THE E-STEP
    le = len(e)
    lf = len(f)
    for j = 1 … le:                    # compute normalization
        s_total[ej] = 0
        for i = 0 … lf:                # i = 0 is the NULL word
            s_total[ej] += t(ej|fi) * α(i|j,le,lf)
    for j = 1 … le:                    # collect fractional counts
        for i = 0 … lf:
            numerator = t(ej|fi) * α(i|j,le,lf)
            denominator = s_total[ej]
            c = numerator / denominator
            ct(ej|fi) += c
            st(fi) += c
            ca(i|j,le,lf) += c         # Model 2 only
            sa(j,le,lf) += c           # Model 2 only
for all e,f:                           # THIS IS THE M-STEP
    t(e|f) = ct(e|f) / st(f)
for all i,j,le,lf:                     # Model 2 only
    α(i|j,le,lf) = ca(i|j,le,lf) / sa(j,le,lf)
####### END OF PSEUDOCODE ! ##########################
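For reference, a runnable Python version of the same E/M loop (the function name, the empty-string encoding of NULL, and the toy corpus are illustrative assumptions, not from the lecture):

from collections import defaultdict

def em_model1_2(corpus, iterations=10, model2=False):
    """EM for IBM Model 1 (and Model 2 when model2=True).
    corpus: list of (english_tokens, foreign_tokens) pairs.
    NULL is encoded as "" and prepended to every foreign sentence."""
    NULL = ""
    f_vocab = {NULL} | {w for _, fs in corpus for w in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))   # t(e|f), uniform start
    a = {}                                        # alpha(i|j,le,lf)

    def alpha(i, j, le, lf):
        return a.get((i, j, le, lf), 1.0 / (lf + 1))   # uniform default

    for _ in range(iterations):
        ct = defaultdict(float); st = defaultdict(float)  # translation counts
        ca = defaultdict(float); sa = defaultdict(float)  # alignment counts (Model 2)
        for e, fs in corpus:                              # E-step
            f = [NULL] + fs
            le, lf = len(e), len(fs)
            for j, ej in enumerate(e, start=1):
                s_total = sum(t[(ej, f[i])] * alpha(i, j, le, lf)
                              for i in range(lf + 1))
                for i in range(lf + 1):
                    c = t[(ej, f[i])] * alpha(i, j, le, lf) / s_total
                    ct[(ej, f[i])] += c
                    st[f[i]] += c
                    if model2:
                        ca[(i, j, le, lf)] += c
                        sa[(j, le, lf)] += c
        for (ew, fw), cnt in ct.items():                  # M-step
            t[(ew, fw)] = cnt / st[fw]
        if model2:
            for (i, j, le, lf), cnt in ca.items():
                a[(i, j, le, lf)] = cnt / sa[(j, le, lf)]
    return t, a

# toy usage
corpus = [("the house".split(), "das haus".split()),
          ("the book".split(), "das buch".split())]
t, _ = em_model1_2(corpus, iterations=5)
print(round(t[("house", "haus")], 2))   # grows toward 1.0 as iterations increase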
3. IBM Models 3-5
3.1 IBM MODEL 3
Add-ins:
Fertility
NULL Insertion
Distortion
Fertility: the number of words in e that fi translates into
“zum” → “to the” n(2|”zum”) ≅1
“ja” → “” n(0|”ja”) ≅1
“haus” → “house” n(1|”haus”) ≅1
NULL Insertion: insert NULL between the words of f with some probability
p0 = probability that NULL is not added
p1 = probability that NULL is added
Distortion: the probability that the word in foreign position i ends up in English position j
#NB: Different direction from alignment!
We rely on the language model to rule out strings like ‘the the’, ‘to to’, ‘the to’, which are possible outputs of the lexical translation step due to fertility.
Notation:
Φ0 = number of English words generated by NULL
Φi = number of English words generated by fi (its fertility)
# Every English word not generated by some foreign word is generated by NULL.
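To make the bookkeeping concrete, a tiny sketch (the alignment dict is a made-up toy; 0 stands for NULL): given an alignment a: j → i, the Φ values are just counts of how often each foreign position is used:

from collections import Counter

# a[j] = i means: English word in position j was generated by foreign
# position i (0 = NULL). Toy alignment for le = 4, lf = 2.
a = {1: 1, 2: 0, 3: 2, 4: 2}
phi = Counter(a.values())
print(phi[0], phi[2])   # -> 1 2   (one NULL-generated word; f2 has fertility 2)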
Probability of a string of NULLs:
p(Φ0 insertions) = C(le - Φ0, Φ0) · p1^Φ0 · p0^(le - 2Φ0)
# (re the ‘!’ question: the factorial shows up as Φi! before n(Φi|fi) in the combined formula below, not here)
Putting it together
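The combined expression was not captured in the notes; a reconstruction following Koehn's standard presentation of Model 3 (the exact notation is an assumption), including the Φi! factor the question above refers to:

$$
p(e, a \mid f) = \binom{l_e - \Phi_0}{\Phi_0}\, p_1^{\Phi_0}\, p_0^{\,l_e - 2\Phi_0}
\prod_{i=1}^{l_f} \Phi_i!\; n(\Phi_i \mid f_i)
\prod_{j=1}^{l_e} t(e_j \mid f_{a(j)})
\prod_{j:\, a(j) \neq 0} d(j \mid a(j), l_e, l_f)
$$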
Problem with Model 3:
Exact EM is intractable: we can no longer efficiently sum over all alignments, so it is difficult to figure out which alignments to use/privilege (in practice, hill-climb from a good Model 2 alignment and sample around it).
3.2 IBM MODEL 4
Previous models don’t represent the correlation between translated words and their positions.
Model 4 introduces the concept of a “cept” to handle this.
Consider the German sentence, and its translation:
Wenn Sie einsteigen möchten, drücken Sie die Taste
Push the button if you want to get on
CEPT: the group of English words generated by a single foreign word.
Motivation: words that are near each other in the output were generally close together in the input, so Model 4 conditions distortion on where the previous cept landed (relative distortion).
Problem:
Need a “cept” (and a position for it) for every fi in a given “translation position”
3.3 IBM MODEL 5
Earlier models allow assignment of positive probability mass to impossible translations (e.g., two words placed in the same output position).
This means some parts of the model don’t end up doing their job.
Model 5 fixes this with extra bookkeeping, using VACANCIES: words may only be placed into output positions that are still open.
############# END OF IBM MODELS #################
4. PHRASE-BASED SMT
Take the following alignment (German “naturlich hat er spass am spiel” ↔ English “of course he has fun with the game”):
Here’s a problem:
“am” literally translates to “on the”
but in the translation above, we have “am” translated as “with the”
p(“with the” | “am”,”spass”) > p(“on the” | “am”,”spass”), but
p(“on the” | “am”) > p(“with the” | “am”)
Knowledge of the noun a preposition is paired with affects how it translates into English.
If we use phrases for translation, we get a way to incorporate this knowledge into our translation scheme.
f̄i = the i-th foreign phrase; the input is segmented into I phrases
# use capital ‘I’ for the number of phrases, by historical convention
With this formalization, we can represent translation as:
p(f̄1..I | ē1..I) = ∏ i=1…I φ(f̄i | ēi) · d(start_i - end_(i-1) - 1)
where,
φ(f̄i | ēi) = phrase translation probability
start_i = position of the 1st English word produced by phrase f̄i
end_(i-1) = position of the last English word produced by phrase f̄i-1
d = distortion
d(x) = α^|x|
Using the example sentence, here are the d values for each phrase:
naturlich --- d = 0
hat --- d = 4 - 2 - 1 = 1 (4= position of “has”, 2=end of “of course”, etc.)
er --- d = 3 - 4 - 1 = -2
spass am --- d = 5 - 3 - 1 = 1
Spiel --- d = 8 - 7 - 1 = 0
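A small sketch that recomputes those d values (the phrase order and 1-indexed English spans are hard-coded from the example above):

# (first, last) = English positions of each foreign phrase's translation
phrases = [("naturlich", (1, 2)),   # "of course"
           ("hat",       (4, 4)),   # "has"
           ("er",        (3, 3)),   # "he"
           ("spass am",  (5, 7)),   # "fun with the"
           ("spiel",     (8, 8))]   # "game"
end_prev = 0                        # end of the previous phrase; 0 before the first
for f_phrase, (start, end) in phrases:
    print(f_phrase, "d =", start - end_prev - 1)
    end_prev = end
# prints d = 0, 1, -2, 1, 0, matching the values above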
How do we figure out the phrases?
- look for frequent, short bigrams
- look for CONSISTENT bigrams across translation environments
- anything consistent constitutes a potential phrase, from singletons to whole sentences
Given some alignment, you want phrase pairs that are consistent with it.
Consistency:
(ē, f̄) is consistent with A ⇔
(1) ∀ ei ∈ ē : (ei, fj) ∈ A → fj ∈ f̄
(2) ∀ fj ∈ f̄ : (ei, fj) ∈ A → ei ∈ ē
(3) ∃ ei ∈ ē, fj ∈ f̄ : (ei, fj) ∈ A
An example (presupposing an alignment provided by, say, Model 1 or 2):

        | michael | geht | davon | aus | , | dass | er | im | haus | bleibt |
michael |    X    |      |       |     |   |      |    |    |      |        |
assumes |         |  X   |   X   |  X  |   |      |    |    |      |        |
that    |         |      |       |     |   |  X   |    |    |      |        |
he      |         |      |       |     |   |      | X  |    |      |        |
will    |         |      |       |     |   |      |    |    |      |   X    |
stay    |         |      |       |     |   |      |    |    |      |   X    |
in      |         |      |       |     |   |      |    | X  |      |        |
the     |         |      |       |     |   |      |    | X  |      |        |
house   |         |      |       |     |   |      |    |    |  X   |        |
“geht davon aus” is aligned with “assumes”
What phrases could we discover?
Contiguous spans on either side that respect the alignment are usable (see the sketch below).
Impossible phrases: e.g., (“he will”, “er im”): “will” is aligned to “bleibt”, which falls outside the foreign phrase, violating condition (1).
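A sketch of the consistency check on this example (alignment points transcribed from the matrix above as 1-indexed (English position, German position) pairs; the function name is illustrative):

A = {(1, 1), (2, 2), (2, 3), (2, 4), (3, 6), (4, 7),
     (5, 10), (6, 10), (7, 8), (8, 8), (9, 9)}

def consistent(e_span, f_span, A):
    """e_span/f_span: inclusive (start, end) position ranges."""
    e_lo, e_hi = e_span
    f_lo, f_hi = f_span
    # every alignment point touching either span...
    touching = [(ei, fj) for (ei, fj) in A
                if e_lo <= ei <= e_hi or f_lo <= fj <= f_hi]
    # ...must lie inside both spans (conditions 1 and 2),
    # and at least one point must exist (condition 3)
    return (len(touching) > 0 and
            all(e_lo <= ei <= e_hi and f_lo <= fj <= f_hi
                for ei, fj in touching))

print(consistent((2, 2), (2, 4), A))   # "assumes" / "geht davon aus" -> True
print(consistent((4, 5), (7, 8), A))   # "he will" / "er im"          -> False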