Infinite Mixtures of Trees
Sergey Kirshner
AICML, Department of Computing Science, University of Alberta, Canada
Padhraic Smyth
Department of Computer Science,
University of California Irvine, USA
June 21, 2007
Generative Modeling of Multivariate Data
Infinite Mixtures of Trees, ICML 2007
2
June 21, 2007
Learning Multivariate Distributions
Infinite Mixtures of Trees, ICML 2007
3
June 21, 2007
x2
x3
x4
x5
x6
x1
x2
x3
x4
x5
x6
x1
x2
x3
x4
x5
x6
x1
x2
x3
x4
x5
x6
x1
Learning from Data
Infinite Mixtures of Trees, ICML 2007
4
June 21, 2007
Contributions
Infinite Mixtures of Trees, ICML 2007
5
June 21, 2007
Alternative View
Infinite Mixtures of Trees, ICML 2007
6
June 21, 2007
Tree-Structured Distributions
Infinite Mixtures of Trees, ICML 2007
7
June 21, 2007
x2
x3
x4
x5
x6
x1
x2
x3
x4
x5
x6
x1
θ1
θ2
θ4
θ3
θ5
θ6
θ26
θ14
θ24
θ34
θ35
Chow-Liu Tree Estimation
Infinite Mixtures of Trees, ICML 2007
8
June 21, 2007
[Chow and Liu 68]
Mixtures of Trees
Infinite Mixtures of Trees, ICML 2007
9
June 21, 2007
x
S
x2
x4
x3
x2
x4
x3
x2
x4
x3
S
S=1
S=2
S=3
=
x1
x1
x1
[Meilă and Jordan 00; Grim 84]
Generative View of Finite Mixtures
Infinite Mixtures of Trees, ICML 2007
10
June 21, 2007
x2
x4
x3
x1
x2
x4
x3
x1
x2
x4
x3
x1
θ1,θ2,θ3,θ4
θ12,θ24,θ34
θ1,θ2,θ3,θ4
θ13,θ23,θ34
θ1,θ2,θ3,θ4
θ13,θ14,θ24
1011
0101
1010
0011
0000
Why Bayesian Non-parametric Approach?
Infinite Mixtures of Trees, ICML 2007
11
June 21, 2007
1011
0101
1010
0011
0000
Generative View of Finite Mixtures
Infinite Mixtures of Trees, ICML 2007
12
June 21, 2007
x2
x4
x3
x1
x2
x4
x3
x1
x2
x4
x3
x1
θ1,θ2,θ3,θ4
θ12,θ24,θ34
θ1,θ2,θ3,θ4
θ13,θ23,θ34
θ1,θ2,θ3,θ4
θ13,θ14,θ24
Distribution over Spanning Trees
Infinite Mixtures of Trees, ICML 2007
13
June 21, 2007
x1
x4
x2
x3
β34
β23
β14
β12
β13
β24
x1
x4
x2
x3
β34
β23
β14
β12
β13
β24
x1
x4
x2
x3
β34
β23
β14
β12
β13
β24
[Meilă and Jaakkola 00, 06]
Generating Random Mazes
Infinite Mixtures of Trees, ICML 2007
14
June 21, 2007
[Propp and Wilson 98]
Silver Stater from Knossos, Crete. c. 425 BCE, picture from
http://www.uark.edu/campus-resources/dlevine/PeaceBibliog.html
Sampling of Spanning Trees
Infinite Mixtures of Trees, ICML 2007
15
June 21, 2007
Broder-Aldous Algorithm
Infinite Mixtures of Trees, ICML 2007
16
June 21, 2007
x1
x4
x2
x3
x1
x4
x2
x3
x1
x4
x2
x3
x1
x4
x2
x3
x1
x4
x2
x3
x1
x4
x2
x3
x1
x4
x2
x3
x1
x4
x2
x3
x1
x4
x2
x3
x1
x4
x2
x3
Distribution over the Parameters
Infinite Mixtures of Trees, ICML 2007
17
June 21, 2007
x1
x4
x2
x3
N’34
N’23
N’14
N’12
N’13
N’24
N’1
N’2
N’3
N’4
x1
x4
x2
x3
N’34
N’23
N’14
N’12
N’13
N’24
N’1
N’2
N’3
N’4
[Meilă and Jaakkola 00, 06]
Ψ
Bayesian Finite Mixture of Trees
Infinite Mixtures of Trees, ICML 2007
18
June 21, 2007
N
K
α
β
Ψ
π
θiT
θn
xn
x2
x4
x3
x1
x2
x4
x3
x1
x2
x4
x3
x1
θ1,θ2,θ3,θ4
θ12,θ24,θ34
θ1,θ2,θ3,θ4
θ13,θ23,θ34
θ1,θ2,θ3,θ4
θ13,θ14,θ24
π
1011
0101
1010
0011
0000
Estimating Bayesian Finite Mixtures
Infinite Mixtures of Trees, ICML 2007
19
June 21, 2007
Sampling Bayesian Finite Mixtures of Trees
Infinite Mixtures of Trees, ICML 2007
20
June 21, 2007
x2
x4
x3
x1
θ1,θ2,θ3,θ4
θ12,θ24,θ34
x2
x4
x3
x1
θ1,θ2,θ3,θ4
θ13,θ23,θ34
x2
x4
x3
x1
θ1,θ2,θ3,θ4
θ13,θ14,θ24
π
1011
0101
1010
0011
0000
N
K
α
β
Ψ
π
θiT
θn
xn
x2
x4
x3
x1
θ1,θ2,θ3,θ4
θ12,θ13,θ24
x2
x4
x3
x1
θ1,θ2,θ3,θ4
θ13,θ23,θ34
x2
x4
x3
x1
θ1,θ2,θ3,θ4
θ13,θ23,θ24
Sampling Without π
Infinite Mixtures of Trees, ICML 2007
21
June 21, 2007
x2
x4
x3
x1
θ1,θ2,θ3,θ4
θ12,θ24,θ34
x2
x4
x3
x1
θ1,θ2,θ3,θ4
θ13,θ23,θ34
x2
x4
x3
x1
θ1,θ2,θ3,θ4
θ13,θ14,θ24
1011
0101
1010
0011
0000
N
K
α
β
Ψ
π
θiT
θn
xn
Dirichlet Process Mixture of Trees
Infinite Mixtures of Trees, ICML 2007
22
June 21, 2007
x2
x4
x3
x1
θ1,θ2,θ3,θ4
θ13,θ23,θ34
N
α
β
Ψ
π
θiT
θn
xn
x2
x4
x3
x1
θ1,θ2,θ3,θ4
θ12,θ24,θ34
x2
x4
x3
x1
θ1,θ2,θ3,θ4
θ13,θ14,θ24
1011
0101
1010
0011
0000
∞
a
b
Collapsed Gibbs Sampling for DPMT
Infinite Mixtures of Trees, ICML 2007
23
June 21, 2007
x2
x4
x3
x1
θ1,θ2,θ3,θ4
θ13,θ23,θ34
N
α
β
Ψ
π
θiT
θn
xn
x2
x4
x3
x1
θ1,θ2,θ3,θ4
θ12,θ24,θ34
x2
x4
x3
x1
θ1,θ2,θ3,θ4
θ13,θ14,θ24
1011
0101
1010
0011
0000
∞
a
b
x2
x4
x3
x1
θ1,θ2,θ3,θ4
θ12,θ24,θ34
Simulated data
Infinite Mixtures of Trees, ICML 2007
24
June 21, 2007
Northeastern Brazil
Infinite Mixtures of Trees, ICML 2007
25
June 21, 2007
Northeast Brazil
1975-2002 (except 1976, 78, 84, and 86)
24 seasons
90 days
10 stations
Number of Components (Brazil)
Infinite Mixtures of Trees, ICML 2007
26
June 21, 2007
Most Prominent Components (Brazil)
Infinite Mixtures of Trees, ICML 2007
27
June 21, 2007
Cross-validated Log-likelihood (Brazil)
Infinite Mixtures of Trees, ICML 2007
28
June 21, 2007
Queensland (Northeastern Australia)
Infinite Mixtures of Trees, ICML 2007
29
June 21, 2007
1958-1998
October-April
40 seasons
197 days
11 stations
Number of Components (Queensland)
Infinite Mixtures of Trees, ICML 2007
30
June 21, 2007
Cross-validated Log-likelihood (Queensland)
Infinite Mixtures of Trees, ICML 2007
31
June 21, 2007
Southwestern Australia
Infinite Mixtures of Trees, ICML 2007
32
June 21, 2007
1978-1992
May-October
15 seasons
184 days
30 stations
Cross-validated Log-likelihood (SW Australia)
Infinite Mixtures of Trees, ICML 2007
33
June 21, 2007
Summary
Infinite Mixtures of Trees, ICML 2007
34
June 21, 2007
Future
Infinite Mixtures of Trees, ICML 2007
35
June 21, 2007
Software
Infinite Mixtures of Trees, ICML 2007
36
June 21, 2007
MultiVariate Non-homogeneous Hidden Markov Models (MVNHMM) Toolbox
http://www.cs.ualberta.ca/~sergey/MVNHMM
Acknowledgements
Infinite Mixtures of Trees, ICML 2007
37
June 21, 2007