Compling II - Scribe Notes 2

# Feb. 1, 2012

# Scribe: Dave Kush

Agenda:

1. Direction of Translation

2. Pseudo Code/ Recap

3.  IBM Models 3 -5

4. NLTK

1. Direction of Translation

Application: f → e

        -- naively assumed direction: p(e|f)     # ← consistent with history, but confusing at outset

        -- alternative:

                noisy channel approach:         #  used by Collins

                foreign speaker intends ENGLISH SENTENCE, which gets ‘corrupted’ into foreign signal

                EN → F

p(e)         # we have these n-gram models

p(f|e)       # the translation model

this characterization allows us to compute things like:

argmax_e p(f|e) p(e)

        

2. Pseudocode for Implementing Model 1, Model 2.

# Base algorithm: Model 1. The alignment-probability lines (α, ca, sa) are the Model 2 additions (marked in red in the original notes; color lost here).

         

        ct(e|f) = 0                         ∀ e,f
        st(f) = 0                           ∀ f
        ca(i|j,le,lf) = 0                   ∀ i,j,le,lf   # i indexes foreign words, j English words
        sa(j,le,lf) = 0                     ∀ j,le,lf
        α(i|j,le,lf) = 1/(lf + 1)           # uniform initialization

        for e,f in corpus:                  # THIS IS THE E-STEP
                le = len(e)
                lf = len(f)
                for j = 1 … le:             # compute normalization totals
                        s_total[ej] = 0
                        for i = 0 … lf:
                                s_total[ej] += t(ej|fi) * α(i|j,le,lf)
                for j = 1 … le:             # collect expected counts
                        for i = 0 … lf:
                                c = t(ej|fi) * α(i|j,le,lf) / s_total[ej]
                                ct(ej|fi) += c
                                st(fi) += c
                                ca(i|j,le,lf) += c
                                sa(j,le,lf) += c

        for all e,f:                        # THIS IS THE M-STEP
                t(e|f) = ct(e|f) / st(f)
        for all i,j,le,lf:
                α(i|j,le,lf) = ca(i|j,le,lf) / sa(j,le,lf)

####### END OF PSEUDOCODE ! ##########################
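A minimal runnable sketch of the pseudocode above in Python (the two-sentence toy corpus is invented; `align_prob` falls back to uniform before the first M-step, and foreign position 0 is NULL):

```python
from collections import defaultdict

# Toy parallel corpus of (English, Foreign) sentence pairs.
corpus = [(["the", "house"], ["das", "haus"]),
          (["the", "book"], ["das", "buch"])]

e_vocab = {e for es, _ in corpus for e in es}

t = defaultdict(lambda: 1.0 / len(e_vocab))   # t(e|f), uniform init
a = {}                                        # α(i|j,le,lf): the Model 2 addition

def align_prob(i, j, le, lf):
    # Uniform over foreign positions (incl. NULL) before the first M-step.
    return a.get((i, j, le, lf), 1.0 / (lf + 1))

for _ in range(10):
    ct = defaultdict(float); st = defaultdict(float)   # expected-count tables
    ca = defaultdict(float); sa = defaultdict(float)
    for es, fs in corpus:                              # E-STEP
        fs = ["NULL"] + fs                             # foreign position 0 is NULL
        le, lf = len(es), len(fs) - 1
        for j, e in enumerate(es, start=1):
            s_total = sum(t[(e, fs[i])] * align_prob(i, j, le, lf)
                          for i in range(lf + 1))
            for i in range(lf + 1):
                c = t[(e, fs[i])] * align_prob(i, j, le, lf) / s_total
                ct[(e, fs[i])] += c
                st[fs[i]] += c
                ca[(i, j, le, lf)] += c
                sa[(j, le, lf)] += c
    for (e, f), v in ct.items():                       # M-STEP
        t[(e, f)] = v / st[f]
    for (i, j, le, lf), v in ca.items():
        a[(i, j, le, lf)] = v / sa[(j, le, lf)]

print(round(t[("the", "das")], 3))   # well above the uniform 1/3
```

On this corpus, "das" co-occurs with "the" in both sentence pairs, so t("the"|"das") climbs with each iteration while t("the"|"haus") falls.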

3. IBM Models 3-5

3.1 IBM MODEL 3

        Add-ins:

                Fertility

                NULL Insertion

                Distortion

        Fertility:   the number of English words (in e) that fi translates into

                “zum” → “to the”        n(2|“zum”)   ≅ 1

                “ja” → “”               n(0|“ja”)    ≅ 1

                “haus” → “house”        n(1|“haus”)  ≅ 1

        NULL Insertion: insert NULL between words in f with some probability

                p0 = probability NULL is not added

                p1 = probability NULL is added

Distortion:        d(j | i, le, lf), the probability that the word in foreign position i ends up in English position j

                                        #NB: Different direction from alignment, which was α(i | j, le, lf)!

We rely on the language model to rule out strings like ‘the the’, ‘to to’, ‘the to’, which are potential outputs of the lexical translation step due to fertility.
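The generative story so far (fertility → NULL insertion → lexical translation) can be sketched as follows. All probability tables are invented toy numbers, and distortion is omitted; note how translating each fertility copy independently can produce outputs like ‘to to’ that only the language model rules out:

```python
import random

# Toy Model 3 generative story: fertility -> NULL insertion -> translation.
fertility = {"zum": {2: 0.9, 1: 0.1},   # n(phi | f), invented values
             "ja": {0: 1.0},
             "haus": {1: 1.0}}
t = {"zum": ["to", "the"],              # t(e|f), uniform over the listed words
     "haus": ["house"],
     "NULL": ["do"]}
p1 = 0.1                                # probability of inserting NULL

def generate(f_sentence, rng):
    expanded = []
    for f in f_sentence:                # 1. each foreign word spawns phi copies
        phis = list(fertility[f])
        weights = [fertility[f][phi] for phi in phis]
        phi = rng.choices(phis, weights=weights)[0]
        expanded += [f] * phi
    with_null = []
    for f in expanded:                  # 2. NULL inserted with probability p1
        with_null.append(f)
        if rng.random() < p1:
            with_null.append("NULL")
    # 3. translate each copy independently -- this can emit "to to" /
    #    "the the"; the language model must filter such strings later.
    return [rng.choice(t[f]) for f in with_null]

rng = random.Random(0)
print(generate(["zum", "haus"], rng))
```

Run repeatedly, this produces outputs of varying length (“zum” usually spawns two words, and NULL occasionally inserts “do”).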

Notation:

Φ0 = No. of English words generated by NULL

        Φi = No. of English words generated by fi (i.e. fi’s fertility)

   # Every English word not generated by a foreign word is generated by NULL.

Probability of a string of NULLs:

                # SHOULD THERE BE A ‘!’ before n(....)?

Putting it together
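The two formulas lost from these notes are presumably the standard Model 3 expressions (after Brown et al. 1993, written here in this class’s f → e direction) — which also answers the question above: yes, a factorial φᵢ! accompanies n(φᵢ|fᵢ):

```latex
% NULL insertion: probability of phi_0 NULL-generated words among the
% l_e English words
p(\Phi_0 = \phi_0) = \binom{l_e - \phi_0}{\phi_0}\, p_0^{\,l_e - 2\phi_0}\, p_1^{\,\phi_0}

% Putting it together (Model 3):
p(e, a \mid f) = \binom{l_e - \phi_0}{\phi_0}\, p_0^{\,l_e - 2\phi_0}\, p_1^{\,\phi_0}
   \prod_{i=1}^{l_f} \phi_i!\; n(\phi_i \mid f_i)
   \prod_{j=1}^{l_e} t(e_j \mid f_{a(j)})\; d(j \mid a(j), l_e, l_f)
```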

Problem with Model 3:

        Can’t run EM exactly: there is no efficient way to sum over all alignments, so it’s difficult to figure out which alignments to use/privilege (in practice, only a subset of promising alignments is considered).

3.2 IBM MODEL 4

Previous models don’t represent correlation between translated words and their positions.

Model 4 introduces the concept of a “cept” to handle this.

Consider the German sentence, and its translation:

Wenn Sie einsteigen möchten, drücken Sie die Taste

Push the button if you want to get on

CEPT:  

words that are near each other in the output were generally close in the input

Problem:

        Need a “cept” for every fi in a given “translation position”

3.3 IBM MODEL 5

Earlier models allow assignment of positive probability mass to impossible translations.

This means some parts of the model don’t end up doing their job.

Model 5 fixes this with extra bookkeeping: it tracks VACANCIES (output positions not yet filled) so that probability mass goes only to placements that are actually possible.

############# END OF IBM MODELS #################

4. PHRASE-BASED SMT

Take the following alignment:

Here’s a problem:

“am” literally translates to “on the”

but in the translation above, we have “am” translated as “with the”

p(“with the” | “am”,”spass”) > p(“on the” | “am”,”spass”), but

p(“on the” | “am”) > p(“with the” | “am”)

Knowledge of which noun a preposition is paired with affects its translation into English.

If we use phrases for translation, we get a way to incorporate this knowledge into our translation scheme.

f̄i  = foreign phrase i, for i = 1 … I

  # use capital ‘I’ for the number of phrases, by historical convention

With this formalization, we can represent translation as such:

        p(f̄1 … f̄I | ē1 … ēI)  =  Π(i=1…I)  φ(f̄i | ēi) · d(start_i − end_(i−1) − 1)

where,

        φ(f̄i | ēi)  = phrase translation prob

        start_i  = position of the 1st word of phrase f̄i after translation

        end_(i−1)  = position of the last word of phrase f̄(i−1) after translation

        d  = distortion

        d(x) = α^|x|

Using the example sentence (naturlich hat er spass am spiel → of course he has fun with the game), here are the d values for each phrase:

naturlich         ---         d = 0                

hat                 ---        d = 4 - 2 - 1 = 1        (4= position of “has”, 2=end of “of course”, etc.)

er                ---        d = 3 - 4 - 1 = -2

spass am        ---        d = 5 - 3 - 1 = 1

Spiel                ---        d = 8 - 7 - 1 = 0
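The arithmetic above can be checked with a short script; the English spans are assumed from the example (“of course” = 1–2, “has” = 4, “he” = 3, “fun with the” = 5–7, “game” = 8), and d_i = start_i − end_(i−1) − 1 with end_0 = 0:

```python
# Each foreign phrase with the span its translation occupies in the output.
phrases = [("naturlich", 1, 2),   # "of course"
           ("hat",       4, 4),   # "has"
           ("er",        3, 3),   # "he"
           ("spass am",  5, 7),   # "fun with the"
           ("spiel",     8, 8)]   # "game"

ds = []
prev_end = 0
for name, start, end in phrases:
    ds.append(start - prev_end - 1)   # d_i = start_i - end_{i-1} - 1
    prev_end = end

print(ds)   # -> [0, 1, -2, 1, 0], matching the values above
```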

How do we figure out ?

        - look for frequent, short bigrams

        - look for CONSISTENT bigrams across translation environments

           - anything consistent constitutes a potential phrase from singletons to whole sentences

        

Given some alignment, you want phrases that are consistent with it.

Consistency:

        (ē, f̄) is consistent with A ⇔

                (1) ∀ ei ∈ ē : (ei, fj) ∈ A → fj ∈ f̄

                (2) ∀ fj ∈ f̄ : (ei, fj) ∈ A → ei ∈ ē

                (3) ∃ ei ∈ ē, fj ∈ f̄ : (ei, fj) ∈ A
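The three conditions translate directly into code. A minimal sketch, where a phrase is a set of word positions and `A` is a set of (English pos, foreign pos) alignment points:

```python
# Check whether a phrase pair (e_span, f_span) is consistent with
# alignment A, per conditions (1)-(3) above.
def consistent(e_span, f_span, A):
    cond1 = all(j in f_span for (i, j) in A if i in e_span)  # no English link escapes
    cond2 = all(i in e_span for (i, j) in A if j in f_span)  # no foreign link escapes
    cond3 = any(i in e_span and j in f_span for (i, j) in A) # at least one link inside
    return cond1 and cond2 and cond3

# e.g. "assumes" (English pos 2) aligned to "geht davon aus" (foreign 2-4):
A = {(1, 1), (2, 2), (2, 3), (2, 4)}
print(consistent({2}, {2, 3, 4}, A))   # -> True: all of "assumes"'s links stay inside
print(consistent({2}, {2, 3}, A))      # -> False: the link to position 4 escapes
```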

An example (presupposing an alignment provided by, say, Model 1 or 2)

         | michael | geht | davon | aus | , | dass | er | im | haus | bleibt
michael  |    X    |      |       |     |   |      |    |    |      |
assumes  |         |  X   |   X   |  X  |   |      |    |    |      |
that     |         |      |       |     |   |  X   |    |    |      |
he       |         |      |       |     |   |      | X  |    |      |
will     |         |      |       |     |   |      |    |    |      |   X
stay     |         |      |       |     |   |      |    |    |      |   X
in       |         |      |       |     |   |      |    | X  |      |
the      |         |      |       |     |   |      |    | X  |      |
house    |         |      |       |     |   |      |    |    |  X   |

“geht davon aus” is aligned with “assumes”

What are the phrases that we could discover?

Contiguous phrases on either side of the alignment are usable.

Impossible phrases: