NGramLinearInterpolationEM
Based on an example from Chris Manning.
We've trained 1-, 2-, and 3-gram language models on a large training corpus.
We wish to find good interpolation weights for a mixture of the three models.
We'll use a development corpus (separate from the training corpus) to optimize the weights.
There are many possible approaches to this optimization problem.
This spreadsheet illustrates the use of the EM algorithm for this purpose.
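As a concrete sketch of what the mixture looks like (a minimal example; the component probabilities and weights below are placeholders, not values from the spreadsheet): the interpolated probability of a word in a given context is a weighted sum of the three fixed n-gram models, with non-negative weights that sum to 1.

def interpolated_prob(p_unigram, p_bigram, p_trigram, lam):
    # P(word | context) = lam[0]*P1(word) + lam[1]*P2(word | last word)
    #                   + lam[2]*P3(word | last two words)
    return lam[0] * p_unigram + lam[1] * p_bigram + lam[2] * p_trigram

# Example: P("as" | "comes across") with hypothetical component probabilities
# and uniform weights.
print(interpolated_prob(p_unigram=0.02, p_bigram=0.10, p_trigram=0.80,
                        lam=(1/3, 1/3, 1/3)))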
We'll focus solely on tokens occurring in the context "comes across ______".
In the training corpus, this context was observed 10 times.
Of those 10 occurrences, 8 were followed by "as", 1 by "the", and 1 by "three".
In the development corpus, this context was observed 5 times.
Of those 5 occurrences, 3 were followed by "as", 1 by "to", and 1 by "a".
The individual n-gram models have been trained using MLE on the training corpus.
We take their parameters as fixed.
We will vary only the interpolation weights.
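For the trigram model, the MLE estimates in this context follow directly from the training counts quoted above. A minimal sketch (the counts are taken from the text; the other models' parameters live in the spreadsheet and are not shown here):

# 10 training occurrences of "comes across ___": 8 "as", 1 "the", 1 "three".
train_counts = {"as": 8, "the": 1, "three": 1}
total = sum(train_counts.values())  # 10

# MLE: relative frequency in the training corpus.
p_trigram = {word: count / total for word, count in train_counts.items()}
print(p_trigram)  # {'as': 0.8, 'the': 0.1, 'three': 0.1}

Note that the MLE trigram assigns probability 0 to any word not seen in this context during training, including the development-corpus words "to" and "a".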
We are essentially postulating a generative story which goes like this:
Every event is a pair (x, y), where x is observed and y is hidden.
First, we select an n-gram model y ∈ {1, 2, 3} according to P(y).
Then, we select a word x according to P(x | y).
The conditional distributions P(x | y) are the n-gram models. These parameters are fixed.
The marginal distribution P(y) represents the mixture weights to be learned via EM.
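Here is a minimal EM sketch for learning P(y) on the development tokens. The trigram row uses the MLE estimates derived from the training counts above; the unigram and bigram rows are illustrative placeholders, since their actual values live in the spreadsheet and are not reproduced in this text.

# P(x | y) restricted to the context "comes across ______".
# Model 3 (trigram) comes from the training counts above; models 1 and 2
# are PLACEHOLDER values for illustration only.
p_x_given_y = {
    1: {"as": 0.02, "the": 0.05, "three": 0.01, "to": 0.03, "a": 0.04},  # unigram (placeholder)
    2: {"as": 0.10, "the": 0.15, "three": 0.02, "to": 0.05, "a": 0.08},  # bigram (placeholder)
    3: {"as": 0.80, "the": 0.10, "three": 0.10},                         # trigram (from counts)
}

# Development tokens observed after "comes across" (from the text above).
dev_tokens = ["as", "as", "as", "to", "a"]

# Initial interpolation weights P(y); any positive values summing to 1 work.
lam = {1: 1/3, 2: 1/3, 3: 1/3}

for _ in range(20):
    # E-step: posterior responsibility P(y | x) proportional to lam[y] * P(x | y).
    expected = {y: 0.0 for y in lam}
    for x in dev_tokens:
        joint = {y: lam[y] * p_x_given_y[y].get(x, 0.0) for y in lam}
        z = sum(joint.values())
        for y in lam:
            expected[y] += joint[y] / z
    # M-step: the new weight for each model is its average responsibility.
    lam = {y: expected[y] / len(dev_tokens) for y in lam}

print(lam)  # learned interpolation weights P(y)

Because the MLE trigram gives zero probability to "to" and "a", EM is forced to put some weight on the lower-order models; otherwise the development log-likelihood would be negative infinity.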
Sheets in this spreadsheet: About, EM, Summary, LogL graph, Things to try.