CS365: Foundations of Data Science
Lecture 9 (2/17)
Charalampos E. Tsourakakis (ctsourak@bu.edu)
MLE vs MAP
In some cases, solving analytically for θ is hard.
Today's agenda: the EM algorithm, an iterative approach to hard parameter-learning problems.
Two coin problem
Process
In each round, we pick one of the two coins (A or B) at random and flip it 10 times, recording the outcomes. We repeat the process k times.
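To make the setup concrete, here is a minimal sketch of this generative process in Python; the true biases 0.8 and 0.4 in the example call are hypothetical, not values from the lecture.

```python
import random

def simulate_rounds(theta_A, theta_B, k=5, flips_per_round=10, seed=0):
    """Simulate the two-coin process: each round, pick a coin uniformly
    at random (the latent variable) and flip it flips_per_round times."""
    rng = random.Random(seed)
    rounds = []
    for _ in range(k):
        coin = rng.choice("AB")                      # hidden coin identity
        theta = theta_A if coin == "A" else theta_B  # bias of the chosen coin
        heads = sum(rng.random() < theta for _ in range(flips_per_round))
        rounds.append((coin, heads, flips_per_round - heads))
    return rounds

# Example with hypothetical true biases:
print(simulate_rounds(theta_A=0.8, theta_B=0.4))
```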
Two coin problem
Comprehension question: What are the MLEs for θ_A, θ_B in this case?
Two coin problem
Maximum likelihood estimates for the unknown parameters θ_A, θ_B
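If the coin used in each round were observed (the complete-data case), the MLEs are just the empirical head frequencies; a standard reconstruction of the formulas the slide refers to:

```latex
\hat{\theta}_A = \frac{\#\text{heads in rounds using coin } A}{\#\text{flips in rounds using coin } A},
\qquad
\hat{\theta}_B = \frac{\#\text{heads in rounds using coin } B}{\#\text{flips in rounds using coin } B}
```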
The Expectation-Maximization algorithm
Let's see the key idea in the context of the two-coin problem before the full description.
Two coin problem
Suppose our initial guesses for the unknown parameters are θ_A = 0.6 and θ_B = 0.5.
E-step: what is the probability that round i comes from coin A / coin B?
Why?
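By Bayes' rule, assuming each round's coin is chosen uniformly at random, the round-1 posterior (5H, 5T) is computed as follows; the numbers match the 0.45/0.55 entries on the next slide. (The 1/2 priors and the binomial coefficients cancel in the ratio.)

```latex
P(A \mid 5H,5T)
= \frac{\theta_A^{5}(1-\theta_A)^{5}}{\theta_A^{5}(1-\theta_A)^{5} + \theta_B^{5}(1-\theta_B)^{5}}
= \frac{0.6^{5}\,0.4^{5}}{0.6^{5}\,0.4^{5} + 0.5^{10}} \approx 0.45,
\qquad
P(B \mid 5H,5T) \approx 0.55.
```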
Two coin problem
M-step: In order to learn θ_A, θ_B, we first need to estimate the expected number of heads/tails from coins A/B, given our estimates of the latent variables.
Two coin problem
Round 1: 5H, 5T

|       | Coin A | Coin B |
| Heads | 0.45×5 | 0.55×5 |
| Tails | 0.45×5 | 0.55×5 |

We repeat this for all five rounds.
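A short Python sketch of this bookkeeping; only rounds 1 and 2 are shown on the slides, so the data list below contains just those two (the remaining three rounds would be appended in the same format):

```python
def posterior_A(heads, tails, theta_A, theta_B):
    """P(coin A | round data), with a uniform prior over the two coins.
    Binomial coefficients cancel in the ratio, so they are omitted."""
    like_A = theta_A**heads * (1 - theta_A)**tails
    like_B = theta_B**heads * (1 - theta_B)**tails
    return like_A / (like_A + like_B)

rounds = [(5, 5), (9, 1)]    # (heads, tails) for rounds 1 and 2
theta_A, theta_B = 0.6, 0.5  # initial guesses

for h, t in rounds:
    pA = posterior_A(h, t, theta_A, theta_B)
    print(f"{h}H,{t}T: P(A)={pA:.2f}, "
          f"E[heads from A]={pA * h:.2f}, E[tails from A]={pA * t:.2f}")
```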
One more round
Round 2: 9H, 1T. From the E-step we have:

|       | Coin A | Coin B |
| Heads | 0.8×9  | 0.2×9  |
| Tails | 0.8×1  | 0.2×1  |
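Summing these expected counts over all rounds, the M-step re-estimates the parameters. The update itself is not written out on the slides; a standard reconstruction, with h_i, t_i the heads/tails of round i:

```latex
\theta_A^{\text{new}}
= \frac{\sum_i P(A \mid \text{round } i)\, h_i}
       {\sum_i P(A \mid \text{round } i)\,(h_i + t_i)},
\qquad
\theta_B^{\text{new}}
= \frac{\sum_i P(B \mid \text{round } i)\, h_i}
       {\sum_i P(B \mid \text{round } i)\,(h_i + t_i)}.
```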
EM algorithm
Define the "expected log" Q(θ|θ′), where θ′ is the current estimate of θ:

Q(θ|θ′) = E_{z ~ p(z | x, θ′)}[ log p(x, z | θ) ]

The EM algorithm is an iterative method consisting of two steps.
1. E-step: Find Q(θ|θ′) in terms of the latent variables z
2. M-step: Find θ* maximizing Q(θ|θ′); θ* becomes the new estimate θ′, and the two steps repeat until convergence.
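Putting the two steps together for the two-coin problem, here is a minimal runnable sketch in Python; the data list holds only the two rounds shown on the slides, and n_iters = 20 is an arbitrary choice:

```python
def em_two_coins(rounds, theta_A, theta_B, n_iters=20):
    """EM for the two-coin problem.

    rounds: list of (heads, tails) per round; the coin used is latent.
    theta_A, theta_B: initial guesses for the coin biases.
    """
    for _ in range(n_iters):
        heads_A = tails_A = heads_B = tails_B = 0.0
        for h, t in rounds:
            # E-step: posterior that this round used coin A,
            # with a uniform prior over the two coins.
            like_A = theta_A**h * (1 - theta_A)**t
            like_B = theta_B**h * (1 - theta_B)**t
            pA = like_A / (like_A + like_B)
            # Accumulate expected counts for the M-step.
            heads_A += pA * h
            tails_A += pA * t
            heads_B += (1 - pA) * h
            tails_B += (1 - pA) * t
        # M-step: re-estimate the biases from the expected counts.
        theta_A = heads_A / (heads_A + tails_A)
        theta_B = heads_B / (heads_B + tails_B)
    return theta_A, theta_B

# Illustrative run on the two rounds from the slides:
print(em_two_coins([(5, 5), (9, 1)], theta_A=0.6, theta_B=0.5))
```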
EM algorithm
Some remarks
1. The quantity Q typically is not computed explicitly, but it is useful for proving the convergence of the EM algorithm.
2. What we really need is the distribution of z given x and our current guess θ′; frequently, knowing the expectation of z under this distribution suffices, as we saw in our example.
3. The M-step maximizes Q over all possible values of θ. Ideally, knowing x and z makes this maximization easy.