
Conditional Chow-Liu Tree Structures for Modeling Discrete-Valued Vector Time Series

Sergey Kirshner, UC Irvine

Padhraic Smyth, UC Irvine

Andrew Robertson, IRI

July 10, 2004

UAI-2004 © Sergey Kirshner, UC Irvine


Overview

  • Data and its modeling aspects
  • Model description
    • General Approach
      • Hidden Markov models
    • Capturing data properties
      • Chow-Liu trees
      • Conditional Chow-Liu trees
  • Inference and Learning
  • Experimental Results
  • Summary and Future Extensions


Snapshot of the Data

[Figure: snapshot of the data — a grid of observations over N stations and T time steps]


Data Aspects

  • Correlation
    • Spatial dependence

  • Temporal structure
    • First order dependence

  • Variability of individual series
    • Interannual variability


Modeling Precipitation Occurrence


Southwestern Australia, 1978-92

Western US, 1952-90


A Bit of Notation

  • Vector time series R
    • R1:T = (R1, …, RT)
  • Vector observation of R at time t
    • Rt = (At, Bt, …, Zt)

[Figure: at each time step t = 1, …, T, the vector Rt groups the station variables (At, Bt, …, Zt)]


Weather Generator

[Figure: weather generator — each station variable evolves independently over time]

  • Does not take spatial correlation between stations into account


Hidden Markov Model

[Figure: hidden Markov model — a Markov chain of hidden states S1, …, ST, with state St emitting observation Rt]


HMM-Conditional-Independence

[Figure: HMM-CI — given St, the emission Rt factorizes into conditionally independent station variables At, Bt, …, Zt]


HMM-CI: Is It Sufficient?

  • Simple yet effective
  • Requires a large number of values for St to capture spatial structure
  • Emissions can be made to capture more spatial dependence


Chow-Liu Trees

  • Approximation of a joint distribution with a tree-structured distribution [Chow and Liu 68]


Illustration of CL-Tree Learning

[Figure: Chow-Liu tree learning on four variables A, B, C, D]

Pair  Pairwise marginal           MI
AB    (0.56, 0.11, 0.02, 0.31)    0.3126
AC    (0.51, 0.17, 0.17, 0.15)    0.0229
AD    (0.53, 0.15, 0.19, 0.13)    0.0172
BC    (0.44, 0.14, 0.23, 0.19)    0.0230
BD    (0.46, 0.12, 0.26, 0.16)    0.0183
CD    (0.64, 0.04, 0.08, 0.24)    0.2603

The maximum spanning tree over these MI weights keeps the edges AB (0.3126), CD (0.2603), and BC (0.0230).


Chow-Liu Trees

  • Approximation of a joint distribution with a tree-structured distribution [Chow and Liu 68]
  • Learning the structure and the probabilities
    • Compute individual and pairwise marginal distributions for all pairs of variables
    • Compute mutual information (MI) for each pair of variables
    • Build a maximum spanning tree of the complete graph with the variables as nodes and the MIs as edge weights
  • Properties
    • Efficient: O(#samples × (#variables)² × (#values per variable)²)
    • Optimal: maximizes the likelihood over all tree-structured distributions
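The learning steps above can be sketched in plain Python. This is a minimal illustration, not the authors' code; the function names are mine, and the maximum spanning tree is built with Kruskal's algorithm:

```python
from collections import Counter
from itertools import combinations
from math import log

def mutual_information(samples, i, j):
    """Empirical mutual information between variables i and j."""
    n = len(samples)
    pi = Counter(s[i] for s in samples)
    pj = Counter(s[j] for s in samples)
    pij = Counter((s[i], s[j]) for s in samples)
    mi = 0.0
    for (a, b), c in pij.items():
        p_ab = c / n
        mi += p_ab * log(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi

def chow_liu_tree(samples, n_vars):
    """Edges of the maximum spanning tree under MI edge weights (Kruskal)."""
    weights = {(i, j): mutual_information(samples, i, j)
               for i, j in combinations(range(n_vars), 2)}
    parent = list(range(n_vars))           # union-find over variables

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = []
    for (i, j), _ in sorted(weights.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:                       # keep edge unless it closes a cycle
            parent[ri] = rj
            edges.append((i, j))
    return edges
```

For samples in which one variable always copies another, that pair carries maximal MI, so its edge is always kept in the tree.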


HMM-Chow-Liu

[Figure: HMM-CL — the emission distribution for state St = k is a state-specific Chow-Liu tree Tk(Rt)]


Improving on Chow-Liu Trees

  • Tree edges with low MI add little to the approximation.
  • Observations from the previous time point can be more relevant than those from the current one.

  • Idea: build a Chow-Liu tree that may include variables from both the current and the previous time point.


Conditional Chow-Liu Forests

  • Extension of Chow-Liu trees to conditional distributions
    • Approximates a conditional multivariate distribution with a tree-structured distribution
    • Uses MI to build a maximum spanning forest
      • Variables of two consecutive time points as nodes
      • All nodes of the earlier time point are treated as already connected before tree construction, so edges are added only to reach current-time variables
    • Same asymptotic complexity as Chow-Liu trees: O(#samples × (#variables)² × (#values per variable)²)
    • Optimal among tree-structured conditional distributions
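Relative to plain Chow-Liu learning, only the initialization of the spanning-tree construction changes: all previous-time nodes start out in one connected component, so Kruskal's algorithm never adds an edge between two of them. A minimal sketch, assuming the pairwise MI weights are already computed (the function name and node numbering are mine):

```python
def conditional_chow_liu_forest(mi, m):
    """mi: dict mapping node pairs (i, j), i < j, to mutual information,
    where nodes 0..m-1 are previous-time variables and m..2m-1 are
    current-time variables.  Returns the edges of the conditional forest."""
    parent = list(range(2 * m))            # union-find over both time slices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(1, m):                  # previous-time nodes start connected
        parent[find(i)] = find(0)

    edges = []
    for (i, j), _ in sorted(mi.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:                       # never links two previous-time nodes
            parent[ri] = rj
            edges.append((i, j))
    return edges
```

With the twelve MI weights from the CCL-forest learning example (numbering A' = 0, B' = 1, C' = 2, A = 3, B = 4, C = 5), this keeps the edges AB, B'B, and C'C.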


Example of CCL-Forest Learning

[Figure: conditional Chow-Liu forest learning over current variables A, B, C and previous-time variables A', B', C']

Pair  Pairwise marginal           MI
AB    (0.56, 0.11, 0.02, 0.31)    0.3126
AC    (0.51, 0.17, 0.17, 0.15)    0.0229
BC    (0.44, 0.14, 0.23, 0.19)    0.0230
A'A   (0.57, 0.11, 0.11, 0.21)    0.1207
A'B   (0.51, 0.17, 0.07, 0.25)    0.1253
A'C   (0.54, 0.14, 0.14, 0.18)    0.0623
B'A   (0.52, 0.07, 0.16, 0.25)    0.1392
B'B   (0.48, 0.10, 0.11, 0.31)    0.1700
B'C   (0.47, 0.11, 0.21, 0.21)    0.0559
C'A   (0.48, 0.20, 0.20, 0.12)    0.0033
C'B   (0.41, 0.26, 0.17, 0.16)    0.0030
C'C   (0.53, 0.14, 0.14, 0.19)    0.0625

With A', B', C' treated as pre-connected, the maximum spanning forest keeps the edges AB (0.3126), B'B (0.1700), and C'C (0.0625).


AR-HMM

[Figure: AR-HMM — Rt depends on both the hidden state St and the previous observation Rt-1]


HMM-Conditional-Chow-Liu

[Figure: HMM-CCL — for each state St = k, the emission is a state-specific conditional Chow-Liu forest linking variables of Rt-1 and Rt]


Inference and Learning for HMM-CL and HMM-CCL

  • Inference (computing P(S|R,Θ))
    • Recursively compute P(R1:t, St | Θ) and P(Rt+1:T | St, Θ) (forward-backward)

  • Learning (Baum-Welch, i.e., EM)
    • E-step: compute P(S|R,Θ)
      • Forward-backward recursions
      • Yield the posteriors P(St | R, Θ) and P(St, St+1 | R, Θ)
    • M-step:
      • Maximize E_{P(S|R,Θ)}[log P(S, R | Θ′)] over Θ′
      • Structure and parameter updates are similar to mixtures of Chow-Liu trees
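For reference, the forward-backward recursions of the E-step can be sketched for a plain HMM with a tabular emission distribution; in HMM-CL and HMM-CCL the emission term B[k][obs[t]] would instead be the state-k tree (or conditional forest) likelihood of Rt. This unscaled illustration is mine (a practical implementation rescales alpha and beta at every step to avoid underflow):

```python
def forward_backward(pi, A, B, obs):
    """pi: length-K initial distribution, A: K x K transition matrix,
    B: K x V emission table, obs: list of observed symbols.
    Returns gamma, where gamma[t][k] = P(S_t = k | obs)."""
    T, K = len(obs), len(pi)
    alpha = [[0.0] * K for _ in range(T)]  # alpha[t][k] = P(obs[0..t], S_t = k)
    beta = [[0.0] * K for _ in range(T)]   # beta[t][k] = P(obs[t+1..] | S_t = k)
    for k in range(K):
        alpha[0][k] = pi[k] * B[k][obs[0]]
    for t in range(1, T):
        for k in range(K):
            alpha[t][k] = B[k][obs[t]] * sum(alpha[t-1][j] * A[j][k]
                                             for j in range(K))
    for k in range(K):
        beta[T-1][k] = 1.0
    for t in range(T - 2, -1, -1):
        for k in range(K):
            beta[t][k] = sum(A[k][j] * B[j][obs[t+1]] * beta[t+1][j]
                             for j in range(K))
    gamma = []
    for t in range(T):                     # normalize alpha * beta per step
        w = [alpha[t][k] * beta[t][k] for k in range(K)]
        z = sum(w)
        gamma.append([x / z for x in w])
    return gamma
```

The pairwise posteriors P(St, St+1 | R, Θ) needed for the transition update follow from the same alpha and beta quantities.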


Chain Chow-Liu Forest (CCLF)

[Figure: chain Chow-Liu forest — the observations R1, …, RT form a chain, with P(Rt | Rt-1) modeled by a single conditional Chow-Liu forest and no hidden states]


Complexity Analysis

Model     # params             Time (per iteration)   Space
HMM-CI    K² + MK(V-1)         O(NTK(K+M))            O(NTK(K+M))
HMM-CL    K² + K(M-1)(V²-1)    O(NTK(K+M²V²))         O(NTK(K+M) + KM²V²)
HMM-CCL   K² + KM(V²-1)        O(NTK(K+M²V²))         O(NTK(K+M) + KM²V²)

N – number of sequences
T – length of each sequence
K – number of hidden states
M – dimensionality of each vector
V – number of possible values for each vector component


Experimental Setup

  • Data
    • Australia
      • 15 seasons, 184 days each, 30 stations
    • Western U.S.
      • 39 seasons, 90 days each, 8 stations

  • Measuring predictive performance
    • Choose K (number of states)
    • Leave-one-out cross-validation
    • Log-likelihood
    • Error for prediction of a single entry given the rest


Australia (log-likelihood)


Australia (predictive error)


Deeper Look at Weather States


Western U.S. (log-likelihood)


Western U.S. (predictive error)


Summary

  • Efficient approximation for finite-valued conditional distributions
    • Conditional Chow-Liu forests
  • New models for spatio-temporal finite-valued data
    • HMM with Chow-Liu trees
    • HMM with conditional Chow-Liu forests
    • Chain Chow-Liu forests
  • Applied to precipitation modeling


Future Work

  • Extension to real-valued data
  • Priors on tree structure and parameters [Jaakkola and Meila 00]
    • Locations of the stations
  • Interannual variability
    • Atmospheric variables as inputs to non-homogeneous HMM [Robertson et al 04]
  • Other approximations for finite-valued multivariate data
    • Maximum Entropy
    • Multivariate probit models (binary)


Acknowledgements

  • DOE (DE-FG02-02ER63413)

  • NSF (SCI-0225642)

  • Dr. Stephen Charles of CSIRO, Australia

  • Datalab @ UCI (http://www.datalab.uci.edu)





Mixture of Chow-Liu Trees

  • Chow-Liu trees inside a mixture model [Meila and Jordan 00]
  • Parameters and structures learned by Expectation-Maximization

[Figure: given mixture component St = k, Rt is distributed as the component-specific Chow-Liu tree Tk(Rt)]




Complexity Analysis

Model     # params             Time
HMM-CI    K² + MK(V-1)         O(TK²M)
HMM-CL    K² + K(M-1)(V²-1)    O(TK²M)
HMM-CCL   K² + KM(V²-1)        O(TK²M)
CCLF      M(V²-1)              O(TM)
