PGMs - 1
Overview
@AishFenton @NetflixResearch
Why PGMs?
Example / Motivation
Layout of tutorial
Probability Reminders
Basic Identities
Sum rule: p(X) = Σ_Y p(X, Y)
Product rule: p(X, Y) = p(Y | X) p(X)
Or the other way around: p(X, Y) = p(X | Y) p(Y)
Joint distribution p(food, day), with the marginals in the last row and column:

|        | Tues | Wed | Thur | p(food) |
| 🍏     | 0.2  | 0.1 | 0.1  | 0.4     |
| 🍗     | 0.1  | 0.1 | 0.1  | 0.3     |
| 🧀     | 0.1  | 0.1 | 0.1  | 0.3     |
| p(day) | 0.4  | 0.3 | 0.3  |         |
Dividing each column by its marginal p(day) gives the conditional p(food | day):

|        | Tues | Wed  | Thur | p(food) |
| 🍏     | 0.50 | 0.33 | 0.33 | 0.4     |
| 🍗     | 0.25 | 0.33 | 0.33 | 0.3     |
| 🧀     | 0.25 | 0.33 | 0.33 | 0.3     |
| p(day) | 0.4  | 0.3  | 0.3  |         |
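A minimal numpy sketch of both rules on the lunch table above (the array values come from the table; the variable names are mine):

    import numpy as np

    # joint p(food, day): rows = (🍏, 🍗, 🧀), cols = (Tues, Wed, Thur)
    joint = np.array([[0.2, 0.1, 0.1],
                      [0.1, 0.1, 0.1],
                      [0.1, 0.1, 0.1]])

    p_food = joint.sum(axis=1)   # sum rule: marginalize out day  -> [0.4, 0.3, 0.3]
    p_day  = joint.sum(axis=0)   # sum rule: marginalize out food -> [0.4, 0.3, 0.3]

    # product rule rearranged: p(food | day) = p(food, day) / p(day)
    p_food_given_day = joint / p_day     # each column now sums to 1
    print(p_food_given_day[:, 0])        # [0.5, 0.25, 0.25], as in the second table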
Bayes Rule
Want this: p(event | obs)
But we have this: p(obs | event) — e.g. the conditional table above.
Bayes rule: p(event | obs) = p(obs | event) p(event) / p(obs)
Useful for swapping from p(obs | event) to p(event | obs).
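As a quick check, Bayes rule recovers p(day | food) from the conditional table and the marginals (a sketch continuing the arrays above):

    # Bayes rule: p(day | food) = p(food | day) p(day) / p(food)
    p_day_given_food = (p_food_given_day * p_day) / p_food[:, None]
    print(p_day_given_food[0])   # p(day | 🍏) = [0.5, 0.25, 0.25]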
Entropy
The information content of an event x is −log p(x); entropy is its expectation: H(X) = −Σ_x p(x) log p(x)
More uncertainty = more information
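A small sketch computing H(X) in bits, for the lunch marginal p(food) above and for a nearly deterministic distribution (the function name is mine):

    import numpy as np

    def entropy(p):
        p = np.asarray(p)
        return -np.sum(p * np.log2(p))   # H(X) = -Σ_x p(x) log2 p(x)

    print(entropy([0.4, 0.3, 0.3]))      # ≈ 1.57 bits: quite uncertain
    print(entropy([0.98, 0.01, 0.01]))   # ≈ 0.16 bits: nearly deterministic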
Kullback–Leibler Divergence
KL(P ‖ Q) = Σ_x P(x) log (P(x) / Q(x)) — the "divergence" of Q from P.
NB: the definition isn't symmetric: in general KL(P ‖ Q) ≠ KL(Q ‖ P).
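A sketch making the asymmetry concrete (the two distributions are made up for illustration):

    import numpy as np

    def kl(p, q):
        p, q = np.asarray(p), np.asarray(q)
        return np.sum(p * np.log(p / q))   # KL(P ‖ Q) = Σ_x P(x) log(P(x)/Q(x))

    P = [0.5, 0.4, 0.1]
    Q = [1/3, 1/3, 1/3]
    print(kl(P, Q))   # ≈ 0.155
    print(kl(Q, P))   # ≈ 0.205 — the two directions differ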
Distributions we’ll need
Bernoulli
x ∈ {0, 1}; p(x | θ) = θ^x (1 − θ)^(1−x)
Multinomial (1 draw)
1-of-K encoding: x is a one-hot vector, so p(x | θ) = ∏_k θk^xk
(Figure: bar chart of the category probabilities θA, θB, θC; the one-hot vector [0, 1, 0] picks out category B, so p([0,1,0]) = θB.)
Multinomial (n draws)
p(x | θ, N) = N! / (x1! ⋯ xK!) ∏_k θk^xk
N! ways to draw; dividing by the xk! accounts for repeats.
Smooth factorial: Γ(N + 1) = N! extends the factorial beyond the integers.
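A sketch of the N-draw pmf via scipy, checked against the formula above (the counts and probabilities are made up):

    from math import factorial
    import numpy as np
    from scipy.stats import multinomial
    from scipy.special import gamma

    theta = [0.5, 0.3, 0.2]
    x = [3, 1, 1]                          # counts over K=3 categories, N=5 draws
    print(multinomial(5, theta).pmf(x))    # 0.15

    # the same by hand: N!/(x1!…xK!) ∏ θk^xk
    coef = factorial(5) / (factorial(3) * factorial(1) * factorial(1))
    print(coef * np.prod(np.power(theta, x)))   # 0.15

    print(gamma(6), factorial(5))          # "smooth factorial": Γ(N+1) = N!, both 120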
Beta / Dirichlet Distributions
Beta: p(θ | α, β) ∝ θ^(α−1) (1 − θ)^(β−1) — α − 1 successes, β − 1 failures
Dirichlet: p(θ | α) ∝ ∏_k θk^(αk−1) — generalizes the Beta to k > 2 states
Bernoulli-Beta Conjugacy
Bernoulli likelihood × Beta prior: observing one success (x = 1),
p(θ | x) ∝ θ^x (1 − θ)^(1−x) · θ^(α−1) (1 − θ)^(β−1) = Beta(α+1, β)
Multinomial-Dirichlet Conjugacy
Multinomial likelihood × Dirichlet prior:
p(θ | x) ∝ ∏_k θk^xk · ∏_k θk^(αk−1) = Dir([α1+x1, ..., αK+xK])
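Both updates just add counts; a sketch with scipy posteriors (the hyperparameter values are made up):

    import numpy as np
    from scipy.stats import beta, dirichlet

    # Bernoulli-Beta: prior Beta(a, b); observe x ∈ {0, 1}
    a, b = 2.0, 2.0
    x = 1
    posterior = beta(a + x, b + (1 - x))    # a success bumps α, a failure bumps β
    print(posterior.mean())                 # 0.6

    # Multinomial-Dirichlet: prior Dir(α); observe counts x
    alpha = np.array([1.0, 1.0, 1.0])
    counts = np.array([3, 1, 1])
    post = dirichlet(alpha + counts)        # Dir([α1+x1, ..., αK+xK])
    print(post.mean())                      # [0.5, 0.25, 0.25]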
Bayes Nets
Directed Graphical Models
(Figures: three example DAGs over nodes A, B, C, D, each with a different edge structure and hence a different factorization of the joint p(A, B, C, D).)
Directed graphical models factorize as a product of each node given its parents: p(x) = ∏_i p(xi | pa(xi))
Plate Notation
(Plate diagram of the running model: hyperparams α and β; per-conference R.V. A; per-paper R.V.s B and C inside a plate of size P; topic distributions D inside a plate of size 2.)
Legend: circle = random variable; shaded circle = observed R.V.; a plate means repeat K times.
Plate Notation
(Non-standard, but useful)
(Same diagram, annotated: α and β are fixed params, not R.V.s; the output of B is a "switch" that selects which copy of D to use.)
Plate Notation
(Same diagram, annotated: the draw of one R.V. becomes the param of the next R.V.; the special case is B, used to index into D.)
What's missing? The generative story, next.
Generative Story
for k = 1 to 2:
    Dk ~ Dir(β)              // topic list
foreach ML conference, i:
    Ai ~ Beta(α)             // bias!
    foreach paper, j:
        Bij ~ Bern(Ai)       // index D
        Cij ~ Mult(D_Bij)    // draw topic
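The same story as a runnable numpy sketch (the plate sizes, number of topics, and hyperparameter values are all hypothetical):

    import numpy as np

    rng = np.random.default_rng(0)
    n_topics, n_confs, n_papers = 5, 3, 10    # hypothetical sizes
    alpha = (1.0, 1.0)                        # Beta hyperparams (made up)
    beta_ = np.ones(n_topics)                 # Dirichlet hyperparam (made up)

    D = rng.dirichlet(beta_, size=2)          # Dk ~ Dir(β), k = 1..2: two topic lists
    for i in range(n_confs):
        A_i = rng.beta(*alpha)                # Ai ~ Beta(α): per-conference bias
        for j in range(n_papers):
            B_ij = rng.binomial(1, A_i)       # Bij ~ Bern(Ai): pick a topic list
            C_ij = rng.choice(n_topics, p=D[B_ij])   # Cij ~ Mult(D_Bij): draw topic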
What we observe...
(Plate diagram again, now with the observed R.V.s shaded.)
Observe: the per-paper topic draws C.
What we observe...
(Same diagram.)
Can infer: the latent variables A, B, and D.
Conditional Independence
A ⫫̸ C | ∅   (not independent)
(Figure: graph over A, B, C, illustrated with 🌧, ☂️, 📺 — with B unobserved, dependence flows between A and C.)
A ⫫ C | B
(Same graph: observing B blocks the path, so A and C become independent.)
A ⫫ C | B
(Figure: another graph over A, B, C, illustrated with 🌬, ⛄, ❄️ — again, observing B makes A and C independent.)
A ⫫ C | ∅
(Figure: v-structure over A, B, C, illustrated with 🌧, 💦, 🚿 — two independent causes with a common effect; with B unobserved, A and C are independent.)
A ⫫̸ C | B   (not independent)
Explained away: observing the common effect B couples its causes — one cause being true "explains away" the other.
(Same v-structure figure.)
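A quick simulation of explaining away (the structure and probabilities are made up): rain and shower are independent causes of wet, and conditioning on wet induces a negative correlation between them.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000
    rain   = rng.random(n) < 0.3         # A ~ Bern(0.3)
    shower = rng.random(n) < 0.3         # C ~ Bern(0.3), independent of A
    wet    = rain | shower               # B = A or C (common effect)

    print(np.corrcoef(rain, shower)[0, 1])            # ≈ 0: marginally independent
    print(np.corrcoef(rain[wet], shower[wet])[0, 1])  # < 0: dependent given B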
Markov Blanket
The Markov blanket of a node xi is its parents, its children, and its children's other parents (co-parents). The co-parents must be included because of the explaining-away problem: observing a shared child couples xi to them.
Bayesian Modeling
Maximum Likelihood (MLE)
Optimize params to best fit the data: θ̂ = argmax_θ p(X | θ)
Example: MLE of Multinomial
Maximize the log-likelihood Σ_k xk log θk subject to "must sum to 1": Σ_k θk = 1.
Lagrange multiplier: L(θ, λ) = Σ_k xk log θk + λ(Σ_k θk − 1) ⟹ θ̂k = xk / N
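A numerical sanity check that the normalized counts are the maximizer (the counts and the alternative θ are made up):

    import numpy as np

    x = np.array([6, 3, 1])                  # observed counts, N = 10
    theta_mle = x / x.sum()                  # [0.6, 0.3, 0.1]

    def loglik(theta):
        return np.sum(x * np.log(theta))     # log ∏_k θk^xk

    other = np.array([0.5, 0.3, 0.2])        # any other valid θ
    print(loglik(theta_mle), loglik(other))  # ≈ -8.98 vs ≈ -9.38: the MLE wins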
But… with little data the MLE overfits — e.g. a category never observed gets θ̂k = 0.
Going Bayesian…
Apply Bayes rule: p(θ | X) = p(X | θ) p(θ) / p(X)
Maybe a complex distribution now
Now a full distribution over θ, not a point estimate
Anatomy of a model
p(θ | X) = p(X | θ) p(θ) / ∫ p(X | θ) p(θ) dθ
Posterior ∝ Likelihood × Prior
Ouch! — the normalizing integral in the denominator is usually intractable.
Conjugacy revisited
Multinomial likelihood × Dirichlet prior (likewise Bernoulli × Beta).
We know this. Why? The prior has the same functional form in θ as the likelihood, so the posterior stays in the prior's family.
For example: p(θ | x) ∝ ∏_k θk^xk · ∏_k θk^(αk−1) = Dir([α1+x1, ..., αK+xK])
Pseudo counts: the αk act like counts observed before seeing the data.
Why go Bayesian?
Posterior Predictive Distribution
p(x* | X) = ∫ p(x* | θ) p(θ | X) dθ — marginalize over θ
We have this one now: the posterior p(θ | X)
x* is the new data
Monte Carlo Estimate
Average over samples from the posterior: p(x* | X) ≈ (1/S) Σ_s p(x* | θ⁽ˢ⁾), with θ⁽ˢ⁾ ~ p(θ | X)
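A sketch for the Bernoulli-Beta case (the prior and data are made up): the Monte Carlo average over posterior samples matches the closed-form predictive.

    import numpy as np

    rng = np.random.default_rng(0)
    a, b = 2.0, 2.0                          # Beta prior (made up)
    heads, tails = 7, 3                      # observed data (made up)

    # posterior is Beta(a + heads, b + tails); draw θ⁽ˢ⁾ samples from it
    thetas = rng.beta(a + heads, b + tails, size=100_000)

    # p(x* = 1 | X) ≈ (1/S) Σ_s p(x* = 1 | θ⁽ˢ⁾) = mean of the samples
    print(thetas.mean())                          # ≈ 0.643
    print((a + heads) / (a + b + heads + tails))  # exact: 9/14 ≈ 0.643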
Modeling tips
Markov Random Fields
(Undirected Graphical Models)