1 of 37

Infinite Mixtures of Trees

Sergey Kirshner

AICML, Department of Computing Science, University of Alberta, Canada

Padhraic Smyth

Department of Computer Science,

University of California Irvine, USA

June 21, 2007

2 of 37

Generative Modeling of Multivariate Data

Infinite Mixtures of Trees, ICML 2007

2

June 21, 2007

3 of 37

Learning Multivariate Distributions

Infinite Mixtures of Trees, ICML 2007

3

June 21, 2007

x2

x3

x4

x5

x6

x1

x2

x3

x4

x5

x6

x1

x2

x3

x4

x5

x6

x1

x2

x3

x4

x5

x6

x1

4 of 37

Learning from Data

  • Point estimate approach:
    • Maximize

  • Bayesian approach:
    • Integrate
    • Sample
    • Average

Infinite Mixtures of Trees, ICML 2007

4

June 21, 2007

5 of 37

Contributions

  • Bayesian generalization of mixtures of trees for complete data [Meilă and Jordan 00; Grim 84]
    • Combining mixtures of trees with Dirichlet processes
    • Using conjugate prior over tree-structured distributions
  • Effective
    • Removes the need to select the number of mixture components (model selection)
    • Provides a posterior distribution over the number of mixture components
    • Outperforms point estimate solutions by averaging over the unknown parameters and dependence structure
  • Efficient
    • Resampling time complexity proportional to data set size, dimensionality, and the number of mixture components

Infinite Mixtures of Trees, ICML 2007

5

June 21, 2007

6 of 37

Alternative View

  • Two types of variables to model
    • Dependence structure (categorical)
    • Parameters over cliques (real-valued)
  • Fully Bayesian modeling of multivariate categorical data
    • Structures
      • No good flexible priors over arbitrary graph structures
    • Parameters
      • Dirichlet for Bayesian networks
      • No equivalent for MRFs
  • Mixtures of trees as an alternative to BNs or MRFs for Bayesian modeling of multivariate data

Infinite Mixtures of Trees, ICML 2007

6

June 21, 2007

7 of 37

Tree-Structured Distributions

Infinite Mixtures of Trees, ICML 2007

7

June 21, 2007

x2

x3

x4

x5

x6

x1

x2

x3

x4

x5

x6

x1

θ1

θ2

θ4

θ3

θ5

θ6

θ26

θ14

θ24

θ34

θ35

8 of 37

Chow-Liu Tree Estimation

  • Maximizing log-likelihood solving maximum spanning tree (MST) problem
    • Can find both the tree structure and the parameters in one swoop!

    • Finding MST is quadratic in the number of nodes [Kruskal 59]

    • Edge weights are pairwise mutual information values – measure of conditional independence

    • O(Nd2)

Infinite Mixtures of Trees, ICML 2007

8

June 21, 2007

[Chow and Liu 68]

9 of 37

Mixtures of Trees

  • Parameters learned with EM
    • O(NKd2) per iteration
  • Effective for many problems

Infinite Mixtures of Trees, ICML 2007

9

June 21, 2007

x

S

x2

x4

x3

x2

x4

x3

x2

x4

x3

S

S=1

S=2

S=3

=

x1

x1

x1

[Meilă and Jordan 00; Grim 84]

10 of 37

Generative View of Finite Mixtures

Infinite Mixtures of Trees, ICML 2007

10

June 21, 2007

x2

x4

x3

x1

x2

x4

x3

x1

x2

x4

x3

x1

θ1,θ2,θ3,θ4

θ12,θ24,θ34

θ1,θ2,θ3,θ4

θ13,θ23,θ34

θ1,θ2,θ3,θ4

θ13,θ14,θ24

1011

0101

1010

0011

0000

11 of 37

Why Bayesian Non-parametric Approach?

  • Not enough samples to estimate component parameters

  • How many components to use?
    • Model selection

  • Mixtures of trees good at approximating
    • Let the number of parameters grow with the amount of data

Infinite Mixtures of Trees, ICML 2007

11

June 21, 2007

1011

0101

1010

0011

0000

12 of 37

Generative View of Finite Mixtures

Infinite Mixtures of Trees, ICML 2007

12

June 21, 2007

x2

x4

x3

x1

x2

x4

x3

x1

x2

x4

x3

x1

θ1,θ2,θ3,θ4

θ12,θ24,θ34

θ1,θ2,θ3,θ4

θ13,θ23,θ34

θ1,θ2,θ3,θ4

θ13,θ14,θ24

13 of 37

Distribution over Spanning Trees

Infinite Mixtures of Trees, ICML 2007

13

June 21, 2007

x1

x4

x2

x3

β34

β23

β14

β12

β13

β24

x1

x4

x2

x3

β34

β23

β14

β12

β13

β24

x1

x4

x2

x3

β34

β23

β14

β12

β13

β24

[Meilă and Jaakkola 00, 06]

14 of 37

Generating Random Mazes

Infinite Mixtures of Trees, ICML 2007

14

June 21, 2007

[Propp and Wilson 98]

Silver Stater from Knossos, Crete. c. 425 BCE, picture from

http://www.uark.edu/campus-resources/dlevine/PeaceBibliog.html

15 of 37

Sampling of Spanning Trees

  • Deterministic algorithm proportional to product of two d×d matrices [Colbourn et al 96]
  • Random walk on vertices (O(d*log(d)) [Broder 89; Aldous 90]
  • Loop erased walk on vertices (O(d*log(d)) [Propp and Wilson 98]
  • For arbitrary weights need O(d2) to compute transitions

Infinite Mixtures of Trees, ICML 2007

15

June 21, 2007

16 of 37

Broder-Aldous Algorithm

Infinite Mixtures of Trees, ICML 2007

16

June 21, 2007

x1

x4

x2

x3

x1

x4

x2

x3

x1

x4

x2

x3

x1

x4

x2

x3

x1

x4

x2

x3

x1

x4

x2

x3

x1

x4

x2

x3

x1

x4

x2

x3

x1

x4

x2

x3

x1

x4

x2

x3

17 of 37

Distribution over the Parameters

Infinite Mixtures of Trees, ICML 2007

17

June 21, 2007

x1

x4

x2

x3

N’34

N’23

N’14

N’12

N’13

N’24

N’1

N’2

N’3

N’4

x1

x4

x2

x3

N’34

N’23

N’14

N’12

N’13

N’24

N’1

N’2

N’3

N’4

[Meilă and Jaakkola 00, 06]

Ψ

18 of 37

Bayesian Finite Mixture of Trees

Infinite Mixtures of Trees, ICML 2007

18

June 21, 2007

N

K

α

β

Ψ

π

θiT

θn

xn

x2

x4

x3

x1

x2

x4

x3

x1

x2

x4

x3

x1

θ1,θ2,θ3,θ4

θ12,θ24,θ34

θ1,θ2,θ3,θ4

θ13,θ23,θ34

θ1,θ2,θ3,θ4

θ13,θ14,θ24

π

1011

0101

1010

0011

0000

19 of 37

Estimating Bayesian Finite Mixtures

  • Cannot compute posterior in closed form
  • Sampling
    • Gibbs (MCMC)

Infinite Mixtures of Trees, ICML 2007

19

June 21, 2007

20 of 37

Sampling Bayesian Finite Mixtures of Trees

Infinite Mixtures of Trees, ICML 2007

20

June 21, 2007

x2

x4

x3

x1

θ1,θ2,θ3,θ4

θ12,θ24,θ34

x2

x4

x3

x1

θ1,θ2,θ3,θ4

θ13,θ23,θ34

x2

x4

x3

x1

θ1,θ2,θ3,θ4

θ13,θ14,θ24

π

1011

0101

1010

0011

0000

N

K

α

β

Ψ

π

θiT

θn

xn

x2

x4

x3

x1

θ1,θ2,θ3,θ4

θ12,θ13,θ24

x2

x4

x3

x1

θ1,θ2,θ3,θ4

θ13,θ23,θ34

x2

x4

x3

x1

θ1,θ2,θ3,θ4

θ13,θ23,θ24

21 of 37

Sampling Without π

Infinite Mixtures of Trees, ICML 2007

21

June 21, 2007

x2

x4

x3

x1

θ1,θ2,θ3,θ4

θ12,θ24,θ34

x2

x4

x3

x1

θ1,θ2,θ3,θ4

θ13,θ23,θ34

x2

x4

x3

x1

θ1,θ2,θ3,θ4

θ13,θ14,θ24

1011

0101

1010

0011

0000

N

K

α

β

Ψ

π

θiT

θn

xn

22 of 37

Dirichlet Process Mixture of Trees

Infinite Mixtures of Trees, ICML 2007

22

June 21, 2007

x2

x4

x3

x1

θ1,θ2,θ3,θ4

θ13,θ23,θ34

N

α

β

Ψ

π

θiT

θn

xn

x2

x4

x3

x1

θ1,θ2,θ3,θ4

θ12,θ24,θ34

x2

x4

x3

x1

θ1,θ2,θ3,θ4

θ13,θ14,θ24

1011

0101

1010

0011

0000

a

b

23 of 37

Collapsed Gibbs Sampling for DPMT

Infinite Mixtures of Trees, ICML 2007

23

June 21, 2007

x2

x4

x3

x1

θ1,θ2,θ3,θ4

θ13,θ23,θ34

N

α

β

Ψ

π

θiT

θn

xn

x2

x4

x3

x1

θ1,θ2,θ3,θ4

θ12,θ24,θ34

x2

x4

x3

x1

θ1,θ2,θ3,θ4

θ13,θ14,θ24

1011

0101

1010

0011

0000

a

b

x2

x4

x3

x1

θ1,θ2,θ3,θ4

θ12,θ24,θ34

24 of 37

Simulated data

  • 6-state mixture of trees over 30-bit vectors

  • 10 models: 100, 200, 500, 1000, 2000, 5000 vectors training set, 10000 vectors test set (each)

Infinite Mixtures of Trees, ICML 2007

24

June 21, 2007

25 of 37

Northeastern Brazil

Infinite Mixtures of Trees, ICML 2007

25

June 21, 2007

Northeast Brazil

1975-2002 (except 1976, 78, 84, and 86)

24 seasons

90 days

10 stations

26 of 37

Number of Components (Brazil)

Infinite Mixtures of Trees, ICML 2007

26

June 21, 2007

27 of 37

Most Prominent Components (Brazil)

Infinite Mixtures of Trees, ICML 2007

27

June 21, 2007

28 of 37

Cross-validated Log-likelihood (Brazil)

Infinite Mixtures of Trees, ICML 2007

28

June 21, 2007

29 of 37

Queensland (Northeastern Australia)

Infinite Mixtures of Trees, ICML 2007

29

June 21, 2007

1958-1998

October-April

40 seasons

197 days

11 stations

30 of 37

Number of Components (Queensland)

Infinite Mixtures of Trees, ICML 2007

30

June 21, 2007

31 of 37

Cross-validated Log-likelihood (Queensland)

Infinite Mixtures of Trees, ICML 2007

31

June 21, 2007

32 of 37

Southwestern Australia

Infinite Mixtures of Trees, ICML 2007

32

June 21, 2007

1978-1992

May-October

15 seasons

184 days

30 stations

33 of 37

Cross-validated Log-likelihood (SW Australia)

Infinite Mixtures of Trees, ICML 2007

33

June 21, 2007

34 of 37

Summary

  • Bayesian extension to mixture of trees
  • Can obtain posterior over number of components
    • Or integrate parameters AND the number of mixture components
  • Outperforms point estimates for the correct model
  • Outperforms mixtures of trees on real-world data sets by averaging out parameters and number of mixture components

Infinite Mixtures of Trees, ICML 2007

34

June 21, 2007

35 of 37

Future

  • Extend DP to hierarchical DP (infinite-state extension to HMM)
  • Can potentially handle missing data
  • Need new (better?) priors for structures and parameters of graphical models
  • Extensions to real-valued distributions
    • Gaussian tree-structured distributions

Infinite Mixtures of Trees, ICML 2007

35

June 21, 2007

36 of 37

Software

Infinite Mixtures of Trees, ICML 2007

36

June 21, 2007

MultiVariate Non-homogeneous Hidden Markov Models (MVNHMM) Toolbox

http://www.cs.ualberta.ca/~sergey/MVNHMM

37 of 37

Acknowledgements

  • Andrew W. Robertson (IRI)
  • Stephen P. Charles (CSIRO)
  • Russell Greiner (U of A)
  • Dale Schuurmans (U of A)

Infinite Mixtures of Trees, ICML 2007

37

June 21, 2007