
Conditional Chow-Liu Tree Structures for Modeling Discrete-Valued Vector Time Series

Sergey Kirshner, UC Irvine

Padhraic Smyth, UC Irvine

Andrew Robertson, IRI

July 10, 2004

UAI-2004 © Sergey Kirshner, UC Irvine


Overview

  • Data and its modeling aspects
  • Model description
    • General Approach
      • Hidden Markov models
    • Capturing data properties
      • Chow-Liu trees
      • Conditional Chow-Liu trees
  • Inference and Learning
  • Experimental Results
  • Summary and Future Extensions


Snapshot of the Data

[Figure: snapshot of the data — a grid of observations over N stations and T time steps]


Data Aspects

  • Correlation
    • Spatial dependence

  • Temporal structure
    • First order dependence

  • Variability of individual series
    • Interannual variability


Modeling Precipitation Occurrence


Southwestern Australia, 1978-92

Western US, 1952-90


A Bit of Notation

  • Vector time series R
    • R1:T = (R1, …, RT)
  • Vector observation of R at time t
    • Rt = (At, Bt, …, Zt)

[Figure: at each time step t = 1, …, T, the vector Rt groups the station variables (At, Bt, …, Zt)]


Weather Generator

[Figure: weather generator — each station variable evolves independently over time]

  • Does not take spatial correlation between stations into account


Hidden Markov Model

[Figure: hidden Markov model — a Markov chain of hidden states S1, …, ST, with state St emitting observation Rt]


HMM-Conditional-Independence

[Figure: HMM-CI — given St, the emission Rt factorizes into conditionally independent station variables At, Bt, …, Zt]


HMM-CI: Is It Sufficient?

  • Simple yet effective
  • Requires a large number of values for St to capture spatial structure
  • Emissions can be made to capture more spatial dependence


Chow-Liu Trees

  • Approximation of a joint distribution with a tree-structured distribution [Chow and Liu 68]


Illustration of CL-Tree Learning

[Figure: Chow-Liu tree learning on four variables A, B, C, D]

Pair  Pairwise marginal           MI
AB    (0.56, 0.11, 0.02, 0.31)    0.3126
AC    (0.51, 0.17, 0.17, 0.15)    0.0229
AD    (0.53, 0.15, 0.19, 0.13)    0.0172
BC    (0.44, 0.14, 0.23, 0.19)    0.0230
BD    (0.46, 0.12, 0.26, 0.16)    0.0183
CD    (0.64, 0.04, 0.08, 0.24)    0.2603

The maximum spanning tree over these MI weights keeps the edges AB (0.3126), CD (0.2603), and BC (0.0230).


Chow-Liu Trees

  • Approximation of a joint distribution with a tree-structured distribution [Chow and Liu 68]
  • Learning the structure and the probabilities
    • Compute individual and pairwise marginal distributions for all pairs of variables
    • Compute mutual information (MI) for each pair of variables
    • Build a maximum spanning tree of the complete graph with the variables as nodes and the MIs as edge weights
  • Properties
    • Efficient: O(#samples × (#variables)² × (#values per variable)²)
    • Optimal: maximizes the likelihood over all tree-structured distributions
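The learning steps above can be sketched in plain Python. This is a minimal illustration, not the authors' code; the function names are mine, and the maximum spanning tree is built with Kruskal's algorithm:

```python
from collections import Counter
from itertools import combinations
from math import log

def mutual_information(samples, i, j):
    """Empirical mutual information between variables i and j."""
    n = len(samples)
    pi = Counter(s[i] for s in samples)
    pj = Counter(s[j] for s in samples)
    pij = Counter((s[i], s[j]) for s in samples)
    mi = 0.0
    for (a, b), c in pij.items():
        p_ab = c / n
        mi += p_ab * log(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi

def chow_liu_tree(samples, n_vars):
    """Edges of the maximum spanning tree under MI edge weights (Kruskal)."""
    weights = {(i, j): mutual_information(samples, i, j)
               for i, j in combinations(range(n_vars), 2)}
    parent = list(range(n_vars))           # union-find over variables

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = []
    for (i, j), _ in sorted(weights.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:                       # keep edge unless it closes a cycle
            parent[ri] = rj
            edges.append((i, j))
    return edges
```

For samples in which one variable always copies another, that pair carries maximal MI, so its edge is always kept in the tree.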


HMM-Chow-Liu

[Figure: HMM-CL — the emission distribution for state St = k is a state-specific Chow-Liu tree Tk(Rt)]


Improving on Chow-Liu Trees

  • Tree edges with low MI add little to the approximation.
  • Observations from the previous time point can be more relevant than those from the current one.

  • Idea: build a Chow-Liu tree that may include variables from both the current and the previous time point.


Conditional Chow-Liu Forests

  • Extension of Chow-Liu trees to conditional distributions
    • Approximates a conditional multivariate distribution with a tree-structured distribution
    • Uses MI to build a maximum spanning forest
      • Variables of two consecutive time points as nodes
      • All nodes of the earlier time point are treated as already connected before tree construction, so edges are added only to reach current-time variables
    • Same asymptotic complexity as Chow-Liu trees: O(#samples × (#variables)² × (#values per variable)²)
    • Optimal among tree-structured conditional distributions
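Relative to plain Chow-Liu learning, only the initialization of the spanning-tree construction changes: all previous-time nodes start out in one connected component, so Kruskal's algorithm never adds an edge between two of them. A minimal sketch, assuming the pairwise MI weights are already computed (the function name and node numbering are mine):

```python
def conditional_chow_liu_forest(mi, m):
    """mi: dict mapping node pairs (i, j), i < j, to mutual information,
    where nodes 0..m-1 are previous-time variables and m..2m-1 are
    current-time variables.  Returns the edges of the conditional forest."""
    parent = list(range(2 * m))            # union-find over both time slices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(1, m):                  # previous-time nodes start connected
        parent[find(i)] = find(0)

    edges = []
    for (i, j), _ in sorted(mi.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:                       # never links two previous-time nodes
            parent[ri] = rj
            edges.append((i, j))
    return edges
```

With the twelve MI weights from the CCL-forest learning example (numbering A' = 0, B' = 1, C' = 2, A = 3, B = 4, C = 5), this keeps the edges AB, B'B, and C'C.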


Example of CCL-Forest Learning

[Figure: conditional Chow-Liu forest learning over current variables A, B, C and previous-time variables A', B', C']

Pair  Pairwise marginal           MI
AB    (0.56, 0.11, 0.02, 0.31)    0.3126
AC    (0.51, 0.17, 0.17, 0.15)    0.0229
BC    (0.44, 0.14, 0.23, 0.19)    0.0230
A'A   (0.57, 0.11, 0.11, 0.21)    0.1207
A'B   (0.51, 0.17, 0.07, 0.25)    0.1253
A'C   (0.54, 0.14, 0.14, 0.18)    0.0623
B'A   (0.52, 0.07, 0.16, 0.25)    0.1392
B'B   (0.48, 0.10, 0.11, 0.31)    0.1700
B'C   (0.47, 0.11, 0.21, 0.21)    0.0559
C'A   (0.48, 0.20, 0.20, 0.12)    0.0033
C'B   (0.41, 0.26, 0.17, 0.16)    0.0030
C'C   (0.53, 0.14, 0.14, 0.19)    0.0625

With A', B', C' treated as pre-connected, the maximum spanning forest keeps the edges AB (0.3126), B'B (0.1700), and C'C (0.0625).


AR-HMM

[Figure: AR-HMM — Rt depends on both the hidden state St and the previous observation Rt-1]


HMM-Conditional-Chow-Liu

[Figure: HMM-CCL — for each state St = k, the emission is a state-specific conditional Chow-Liu forest linking variables of Rt-1 and Rt]


Inference and Learning for HMM-CL and HMM-CCL

  • Inference (computing P(S|R,Θ))
    • Recursively compute P(R1:t, St | Θ) and P(Rt+1:T | St, Θ) (forward-backward)

  • Learning (Baum-Welch, i.e., EM)
    • E-step: compute P(S|R,Θ)
      • Forward-backward recursions
      • Yield the posteriors P(St | R, Θ) and P(St, St+1 | R, Θ)
    • M-step:
      • Maximize E_{P(S|R,Θ)}[log P(S, R | Θ′)] over Θ′
      • Structure and parameter updates are similar to mixtures of Chow-Liu trees
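For reference, the forward-backward recursions of the E-step can be sketched for a plain HMM with a tabular emission distribution; in HMM-CL and HMM-CCL the emission term B[k][obs[t]] would instead be the state-k tree (or conditional forest) likelihood of Rt. This unscaled illustration is mine (a practical implementation rescales alpha and beta at every step to avoid underflow):

```python
def forward_backward(pi, A, B, obs):
    """pi: length-K initial distribution, A: K x K transition matrix,
    B: K x V emission table, obs: list of observed symbols.
    Returns gamma, where gamma[t][k] = P(S_t = k | obs)."""
    T, K = len(obs), len(pi)
    alpha = [[0.0] * K for _ in range(T)]  # alpha[t][k] = P(obs[0..t], S_t = k)
    beta = [[0.0] * K for _ in range(T)]   # beta[t][k] = P(obs[t+1..] | S_t = k)
    for k in range(K):
        alpha[0][k] = pi[k] * B[k][obs[0]]
    for t in range(1, T):
        for k in range(K):
            alpha[t][k] = B[k][obs[t]] * sum(alpha[t-1][j] * A[j][k]
                                             for j in range(K))
    for k in range(K):
        beta[T-1][k] = 1.0
    for t in range(T - 2, -1, -1):
        for k in range(K):
            beta[t][k] = sum(A[k][j] * B[j][obs[t+1]] * beta[t+1][j]
                             for j in range(K))
    gamma = []
    for t in range(T):                     # normalize alpha * beta per step
        w = [alpha[t][k] * beta[t][k] for k in range(K)]
        z = sum(w)
        gamma.append([x / z for x in w])
    return gamma
```

The pairwise posteriors P(St, St+1 | R, Θ) needed for the transition update follow from the same alpha and beta quantities.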


Chain Chow-Liu Forest (CCLF)

[Figure: chain Chow-Liu forest — the observations R1, …, RT form a chain, with P(Rt | Rt-1) modeled by a single conditional Chow-Liu forest and no hidden states]


Complexity Analysis

Model     # params             Time (per iteration)   Space
HMM-CI    K² + MK(V-1)         O(NTK(K+M))            O(NTK(K+M))
HMM-CL    K² + K(M-1)(V²-1)    O(NTK(K+M²V²))         O(NTK(K+M) + KM²V²)
HMM-CCL   K² + KM(V²-1)        O(NTK(K+M²V²))         O(NTK(K+M) + KM²V²)

N – number of sequences
T – length of each sequence
K – number of hidden states
M – dimensionality of each vector
V – number of possible values for each vector component


Experimental Setup

  • Data
    • Australia
      • 15 seasons, 184 days each, 30 stations
    • Western U.S.
      • 39 seasons, 90 days each, 8 stations

  • Measuring predictive performance
    • Choose K (number of states)
    • Leave-one-out cross-validation
    • Log-likelihood
    • Error for prediction of a single entry given the rest


Australia (log-likelihood)


Australia (predictive error)


Deeper Look at Weather States


Western U.S. (log-likelihood)


Western U.S. (predictive error)


Summary

  • Efficient approximation for finite-valued conditional distributions
    • Conditional Chow-Liu forests
  • New models for spatio-temporal finite-valued data
    • HMM with Chow-Liu trees
    • HMM with conditional Chow-Liu forests
    • Chain Chow-Liu forests
  • Applied to precipitation modeling


Future Work

  • Extension to real-valued data
  • Priors on tree structure and parameters [Jaakkola and Meila 00]
    • Locations of the stations
  • Interannual variability
    • Atmospheric variables as inputs to non-homogeneous HMM [Robertson et al 04]
  • Other approximations for finite-valued multivariate data
    • Maximum Entropy
    • Multivariate probit models (binary)


Acknowledgements

  • DOE (DE-FG02-02ER63413)

  • NSF (SCI-0225642)

  • Dr. Stephen Charles of CSIRO, Australia

  • Datalab @ UCI (http://www.datalab.uci.edu)





Mixture of Chow-Liu Trees

  • Chow-Liu trees inside a mixture model [Meila and Jordan 00]
  • Parameters and structures learned by Expectation-Maximization

[Figure: given mixture component St = k, Rt is distributed as the component-specific Chow-Liu tree Tk(Rt)]




Complexity Analysis

Model     # params             Time
HMM-CI    K² + MK(V-1)         O(TK²M)
HMM-CL    K² + K(M-1)(V²-1)    O(TK²M)
HMM-CCL   K² + KM(V²-1)        O(TK²M)
CCLF      M(V²-1)              O(TM)
