Compling II - Scribe Notes 2

# Feb. 1, 2012

# Scribe: Dave Kush

Agenda:

1. Direction of Translation

2. Pseudo Code/ Recap

3.  IBM Models 3 -5

4. NLTK

1. Direction of Translation

Application: f → e

        -- naively assumed direction: p(e|f)     # ← consistent with history, but confusing at outset

        -- alternative:

                noisy channel approach:         #  used by Collins

                foreign speaker intends ENGLISH SENTENCE, which gets ‘corrupted’ into foreign signal

                EN → F

p(e)         # we have these n-gram models

p(f|e)       # the translation model

this characterization allows us to compute things like:

argmax_e p(f|e) p(e)

        

2. Pseudocode for Implementing Model 1, Model 2.

# Base algorithm: Model 1. The alignment-probability lines (α, ca, sa) are the Model 2 additions (marked in red in the original notes; color lost here).

         

        ct(e|f) = 0                         ∀ e,f
        st(f) = 0                           ∀ f
        ca(i|j,le,lf) = 0                   ∀ i,j,le,lf   # i indexes foreign words, j English words
        sa(j,le,lf) = 0                     ∀ j,le,lf
        α(i|j,le,lf) = 1/(lf + 1)           # uniform initialization

        for e,f in corpus:                  # THIS IS THE E-STEP
                le = len(e)
                lf = len(f)
                for j = 1 … le:             # compute normalization totals
                        s_total[ej] = 0
                        for i = 0 … lf:
                                s_total[ej] += t(ej|fi) * α(i|j,le,lf)
                for j = 1 … le:             # collect expected counts
                        for i = 0 … lf:
                                c = t(ej|fi) * α(i|j,le,lf) / s_total[ej]
                                ct(ej|fi) += c
                                st(fi) += c
                                ca(i|j,le,lf) += c
                                sa(j,le,lf) += c

        for all e,f:                        # THIS IS THE M-STEP
                t(e|f) = ct(e|f) / st(f)
        for all i,j,le,lf:
                α(i|j,le,lf) = ca(i|j,le,lf) / sa(j,le,lf)

####### END OF PSEUDOCODE ! ##########################
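A minimal runnable sketch of the pseudocode above in Python (the two-sentence toy corpus is invented; `align_prob` falls back to uniform before the first M-step, and foreign position 0 is NULL):

```python
from collections import defaultdict

# Toy parallel corpus of (English, Foreign) sentence pairs.
corpus = [(["the", "house"], ["das", "haus"]),
          (["the", "book"], ["das", "buch"])]

e_vocab = {e for es, _ in corpus for e in es}

t = defaultdict(lambda: 1.0 / len(e_vocab))   # t(e|f), uniform init
a = {}                                        # α(i|j,le,lf): the Model 2 addition

def align_prob(i, j, le, lf):
    # Uniform over foreign positions (incl. NULL) before the first M-step.
    return a.get((i, j, le, lf), 1.0 / (lf + 1))

for _ in range(10):
    ct = defaultdict(float); st = defaultdict(float)   # expected-count tables
    ca = defaultdict(float); sa = defaultdict(float)
    for es, fs in corpus:                              # E-STEP
        fs = ["NULL"] + fs                             # foreign position 0 is NULL
        le, lf = len(es), len(fs) - 1
        for j, e in enumerate(es, start=1):
            s_total = sum(t[(e, fs[i])] * align_prob(i, j, le, lf)
                          for i in range(lf + 1))
            for i in range(lf + 1):
                c = t[(e, fs[i])] * align_prob(i, j, le, lf) / s_total
                ct[(e, fs[i])] += c
                st[fs[i]] += c
                ca[(i, j, le, lf)] += c
                sa[(j, le, lf)] += c
    for (e, f), v in ct.items():                       # M-STEP
        t[(e, f)] = v / st[f]
    for (i, j, le, lf), v in ca.items():
        a[(i, j, le, lf)] = v / sa[(j, le, lf)]

print(round(t[("the", "das")], 3))   # well above the uniform 1/3
```

On this corpus, "das" co-occurs with "the" in both sentence pairs, so t("the"|"das") climbs with each iteration while t("the"|"haus") falls.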

3. IBM Models 3-5

3.1 IBM MODEL 3

        Add-ins:

                Fertility

                NULL Insertion

                Distortion

        Fertility:   the number of English words (in e) that fi translates into

                “zum” → “to the”        n(2|“zum”)   ≅ 1

                “ja” → “”               n(0|“ja”)    ≅ 1

                “haus” → “house”        n(1|“haus”)  ≅ 1

        NULL Insertion: insert NULL between words in f with some probability

                p0 = probability NULL is not added

                p1 = probability NULL is added

Distortion:        d(j | i, le, lf), the probability that the word in foreign position i ends up in English position j

                                        #NB: Different direction from alignment, which was α(i | j, le, lf)!

We rely on the language model to rule out strings like ‘the the’, ‘to to’, ‘the to’, which are potential outputs of the lexical translation step due to fertility.
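The generative story so far (fertility → NULL insertion → lexical translation) can be sketched as follows. All probability tables are invented toy numbers, and distortion is omitted; note how translating each fertility copy independently can produce outputs like ‘to to’ that only the language model rules out:

```python
import random

# Toy Model 3 generative story: fertility -> NULL insertion -> translation.
fertility = {"zum": {2: 0.9, 1: 0.1},   # n(phi | f), invented values
             "ja": {0: 1.0},
             "haus": {1: 1.0}}
t = {"zum": ["to", "the"],              # t(e|f), uniform over the listed words
     "haus": ["house"],
     "NULL": ["do"]}
p1 = 0.1                                # probability of inserting NULL

def generate(f_sentence, rng):
    expanded = []
    for f in f_sentence:                # 1. each foreign word spawns phi copies
        phis = list(fertility[f])
        weights = [fertility[f][phi] for phi in phis]
        phi = rng.choices(phis, weights=weights)[0]
        expanded += [f] * phi
    with_null = []
    for f in expanded:                  # 2. NULL inserted with probability p1
        with_null.append(f)
        if rng.random() < p1:
            with_null.append("NULL")
    # 3. translate each copy independently -- this can emit "to to" /
    #    "the the"; the language model must filter such strings later.
    return [rng.choice(t[f]) for f in with_null]

rng = random.Random(0)
print(generate(["zum", "haus"], rng))
```

Run repeatedly, this produces outputs of varying length (“zum” usually spawns two words, and NULL occasionally inserts “do”).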

Notation:

Φ0 = No. of English words generated by NULL

        Φi = No. of English words generated by fi (i.e. fi’s fertility)

   # Every English word not generated by a foreign word is generated by NULL.

Probability of a string of NULLs:

                # SHOULD THERE BE A ‘!’ before n(....)?

Putting it together
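The two formulas lost from these notes are presumably the standard Model 3 expressions (after Brown et al. 1993, written here in this class’s f → e direction) — which also answers the question above: yes, a factorial φᵢ! accompanies n(φᵢ|fᵢ):

```latex
% NULL insertion: probability of phi_0 NULL-generated words among the
% l_e English words
p(\Phi_0 = \phi_0) = \binom{l_e - \phi_0}{\phi_0}\, p_0^{\,l_e - 2\phi_0}\, p_1^{\,\phi_0}

% Putting it together (Model 3):
p(e, a \mid f) = \binom{l_e - \phi_0}{\phi_0}\, p_0^{\,l_e - 2\phi_0}\, p_1^{\,\phi_0}
   \prod_{i=1}^{l_f} \phi_i!\; n(\phi_i \mid f_i)
   \prod_{j=1}^{l_e} t(e_j \mid f_{a(j)})\; d(j \mid a(j), l_e, l_f)
```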

Problem with Model 3:

        Can’t run EM exactly: there is no efficient way to sum over all alignments, so it’s difficult to figure out which alignments to use/privilege (in practice, only a subset of promising alignments is considered).

3.2 IBM MODEL 4

Previous models don’t represent correlation between translated words and their positions.

Model 4 introduces the concept of a “cept” to handle this.

Consider the German sentence, and its translation:

Wenn Sie einsteigen möchten, drücken Sie die Taste

Push the button if you want to get on

CEPT:  

words that are near each other in the output were generally close in the input

Problem:

        Need a “cept” for every fi in a given “translation position”

3.3 IBM MODEL 5

Earlier models allow assignment of positive probability mass to impossible translations.

This means some parts of the model don’t end up doing their job.

Model 5 fixes this with extra bookkeeping: it tracks VACANCIES (output positions not yet filled) so that probability mass goes only to placements that are actually possible.

############# END OF IBM MODELS #################

4. PHRASE-BASED SMT

Take the following alignment:

Here’s a problem:

“am” literally translates to “on the”

but in the translation above, we have “am” translated as “with the”

p(“with the” | “am”,”spass”) > p(“on the” | “am”,”spass”), but

p(“on the” | “am”) > p(“with the” | “am”)

Knowledge of which noun a preposition is paired with affects its translation into English.

If we use phrases for translation, we get a way to incorporate this knowledge into our translation scheme.

f̄i  = foreign phrase i, for i = 1 … I

  # use capital ‘I’ for the number of phrases, by historical convention

With this formalization, we can represent translation as such:

        p(f̄1 … f̄I | ē1 … ēI)  =  Π(i=1…I)  φ(f̄i | ēi) · d(start_i − end_(i−1) − 1)

where,

        φ(f̄i | ēi)  = phrase translation prob

        start_i  = position of the 1st word of phrase f̄i after translation

        end_(i−1)  = position of the last word of phrase f̄(i−1) after translation

        d  = distortion

        d(x) = α^|x|

Using the example sentence (naturlich hat er spass am spiel → of course he has fun with the game), here are the d values for each phrase:

naturlich         ---         d = 0                

hat                 ---        d = 4 - 2 - 1 = 1        (4= position of “has”, 2=end of “of course”, etc.)

er                ---        d = 3 - 4 - 1 = -2

spass am        ---        d = 5 - 3 - 1 = 1

Spiel                ---        d = 8 - 7 - 1 = 0
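The arithmetic above can be checked with a short script; the English spans are assumed from the example (“of course” = 1–2, “has” = 4, “he” = 3, “fun with the” = 5–7, “game” = 8), and d_i = start_i − end_(i−1) − 1 with end_0 = 0:

```python
# Each foreign phrase with the span its translation occupies in the output.
phrases = [("naturlich", 1, 2),   # "of course"
           ("hat",       4, 4),   # "has"
           ("er",        3, 3),   # "he"
           ("spass am",  5, 7),   # "fun with the"
           ("spiel",     8, 8)]   # "game"

ds = []
prev_end = 0
for name, start, end in phrases:
    ds.append(start - prev_end - 1)   # d_i = start_i - end_{i-1} - 1
    prev_end = end

print(ds)   # -> [0, 1, -2, 1, 0], matching the values above
```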

How do we figure out ?

        - look for frequent, short bigrams

        - look for CONSISTENT bigrams across translation environments

           - anything consistent constitutes a potential phrase from singletons to whole sentences

        

Given some alignment, you want phrases that are consistent with it.

Consistency:

        (ē, f̄) is consistent with A ⇔

                (1) ∀ ei ∈ ē : (ei, fj) ∈ A → fj ∈ f̄

                (2) ∀ fj ∈ f̄ : (ei, fj) ∈ A → ei ∈ ē

                (3) ∃ ei ∈ ē, fj ∈ f̄ : (ei, fj) ∈ A
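The three conditions translate directly into code. A minimal sketch, where a phrase is a set of word positions and `A` is a set of (English pos, foreign pos) alignment points:

```python
# Check whether a phrase pair (e_span, f_span) is consistent with
# alignment A, per conditions (1)-(3) above.
def consistent(e_span, f_span, A):
    cond1 = all(j in f_span for (i, j) in A if i in e_span)  # no English link escapes
    cond2 = all(i in e_span for (i, j) in A if j in f_span)  # no foreign link escapes
    cond3 = any(i in e_span and j in f_span for (i, j) in A) # at least one link inside
    return cond1 and cond2 and cond3

# e.g. "assumes" (English pos 2) aligned to "geht davon aus" (foreign 2-4):
A = {(1, 1), (2, 2), (2, 3), (2, 4)}
print(consistent({2}, {2, 3, 4}, A))   # -> True: all of "assumes"'s links stay inside
print(consistent({2}, {2, 3}, A))      # -> False: the link to position 4 escapes
```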

An example (presupposing an alignment provided by, say, Model 1 or 2)

         | michael | geht | davon | aus | , | dass | er | im | haus | bleibt
michael  |    X    |      |       |     |   |      |    |    |      |
assumes  |         |  X   |   X   |  X  |   |      |    |    |      |
that     |         |      |       |     |   |  X   |    |    |      |
he       |         |      |       |     |   |      | X  |    |      |
will     |         |      |       |     |   |      |    |    |      |   X
stay     |         |      |       |     |   |      |    |    |      |   X
in       |         |      |       |     |   |      |    | X  |      |
the      |         |      |       |     |   |      |    | X  |      |
house    |         |      |       |     |   |      |    |    |  X   |

“geht davon aus” is aligned with “assumes”

What are the phrases that we could discover?

Contiguous phrases on either side of the alignment are usable.

Impossible phrases: