Local Exchangeability

Trevor Campbell, Saifuddin Syed, Chiao-Yu Yang, Michael Jordan, Tamara Broderick

University of British Columbia; University of California, Berkeley; MIT

Exchangeability

The order in which data are observed is often unimportant

aether, black hole, quantum, dark matter, ...

de Finetti: data are exchangeable iff they are conditionally i.i.d.

- justifies Bayesian (nonparametric) statistics
- simple & efficient inference algorithms
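The "conditionally i.i.d." side of de Finetti's equivalence can be checked by hand on a small case. The sketch below is illustrative only and is not taken from the slides: it assumes a Beta-Bernoulli model (a Pólya urn), and the function name `polya_sequence_prob` and its parameters `a`, `b` are hypothetical. It computes the exact probability of a binary sequence and confirms that every reordering has the same probability.

```python
from fractions import Fraction
from itertools import permutations

def polya_sequence_prob(xs, a=1, b=1):
    """Exact probability of the binary sequence xs when
    theta ~ Beta(a, b) and the xs are i.i.d. Bernoulli(theta) given theta.
    Computed with the sequential (Polya urn) predictive rule."""
    prob = Fraction(1)
    ones, zeros = 0, 0
    for x in xs:
        p_one = Fraction(a + ones, a + b + ones + zeros)  # P(next = 1 | past)
        prob *= p_one if x == 1 else 1 - p_one
        ones += x
        zeros += 1 - x
    return prob

seq = (1, 0, 0, 1, 0)
# Collect the probability of every reordering of seq.
probs = {polya_sequence_prob(p) for p in permutations(seq)}
print(probs)  # {Fraction(1, 60)}: one value, so the order does not matter
```

Exact rational arithmetic is used so that the equality across orderings is exact rather than subject to floating-point rounding.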

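e.g. document topics: the reordered sequence

quantum, aether, black hole, dark matter, ...

has the same distribution as the original ordering

data sequence $X_1, X_2, \dots$ is exchangeable iff $X_n \mid G \overset{\text{iid}}{\sim} G$ for some random probability measure $G$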

Exchangeability

Rainfall estimation

[Guestrin+ 05]

real data usually aren’t exchangeable, because observations come with covariates

Modelling text over time

[Blei+ 06]

[Figure: word usage over time, from the 1700s to the 1900s: luminiferous aether, black hole, dark matter, quantum]

Learning control system dynamics

[Deisenroth+ 15]

but intuitively, data with similar covariates are “nearly exchangeable”

This work

A new theory of local exchangeability for covariate-dependent data

- swapping nearby covariates yields bounded change in distribution
- generalizes classical and partial exchangeability
- de-Finetti-like representation theorem
- regularity (smoothness) properties
- approximate sufficiency & data binning
- lots of examples (see the paper)

GP regression

[Rasmussen+ 06]

Kernel BP feature allocation

[Ren+ 11]

DDP temporal clustering

[MacEachern 99]

Dynamic topic modelling

[Blei+ 06]

Background

“we must take up the case where we still encounter ‘analogies’ among the events under consideration, but without their attaining the limiting case of exchangeability.” [de Finetti 38]

Bruno de Finetti

this is an 80-year-old problem

Setup

stochastic process $X = (X_t)_{t \in \mathcal{T}}$ with index set $\mathcal{T}$

pseudometric $d$ on $\mathcal{T}$ captures proximity of covariates

e.g. exchangeable sequence

- sequence of data
- all covariates identical, so $d \equiv 0$
- data is exchangeable

e.g. Gaussian process regression

- each index pairs an observation index with a spatial location covariate
- distance only considers spatial location, not index
- observations at the same location are exchangeable
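A minimal sketch of a pseudometric that looks only at the covariate: it assumes each index is a pair `(n, x)` of an observation index and a one-dimensional location. That encoding and the name `covariate_distance` are illustrative assumptions, not notation from the slides.

```python
def covariate_distance(s, t):
    """Pseudometric on indices (n, x): compares the locations x only
    and ignores the observation index n."""
    (_, x_s), (_, x_t) = s, t
    return abs(x_s - x_t)

a, b, c = (1, 0.0), (2, 0.0), (3, 1.5)
print(covariate_distance(a, b))  # 0.0: distinct indices, same location
print(covariate_distance(a, c))  # 1.5
```

Distinct indices can be at distance zero, which is why this is a pseudometric rather than a metric.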

Partial exchangeability [de Finetti 38, Lauritzen 74, Diaconis+ 78, etc]

can only swap within these groups

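Defn: for any finite set of covariates $T$ and permutation $\pi$ of $T$,

$(X_t)_{t \in T} \overset{d}{=} (X_{\pi(t)})_{t \in T}$ as long as $\pi$ only swaps identical covariates (distance zero between $t$ and $\pi(t)$ for all $t \in T$)

the distribution is invariant to swapping observations with identical covariates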
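The restriction on permutations can be made concrete with a small check. This is a sketch under assumed conventions: covariates are stored in a list, a permutation is a list where `perm[i]` is the position whose observation moves into position `i`, and the function name is hypothetical.

```python
def swaps_only_identical_covariates(covariates, perm):
    """Return True if perm only moves observations between positions
    whose covariates are identical."""
    if sorted(perm) != list(range(len(covariates))):
        raise ValueError("perm must be a permutation of 0..n-1")
    return all(covariates[i] == covariates[j] for i, j in enumerate(perm))

covariates = ["1700s", "1700s", "1900s", "1900s"]
print(swaps_only_identical_covariates(covariates, [1, 0, 3, 2]))  # True: swaps within groups
print(swaps_only_identical_covariates(covariates, [2, 1, 0, 3]))  # False: swaps across groups
```

Partial exchangeability constrains the distribution only under permutations for which this check returns True.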

Theorem (sketch) [de Finetti]: There is a unique stochastic process of distributions $G = (G_t)$ s.t. $X_t \mid G \overset{\text{ind}}{\sim} G_t$

mean of

Partial exchangeability [de Finetti 38, Lauritzen 74, Diaconis+ 78, etc]

problem 1: this doesn’t address our problem of “nearby” covariates

successfully captures related populations in BNP (e.g. HDP mixture), but:

problem 2: theory requires infinite data in each equivalence class of covariates

problem 3: can be arbitrarily complex, not so useful for modelling

Local exchangeability

Key Idea: add a bit of wiggle room to exchangeability

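Defn: $X$ is $f$-locally exchangeable if for any finite set of covariates $T$ and permutation $\pi$ of $T$,

$d_{\mathrm{TV}}\big((X_t)_{t \in T}, (X_{\pi(t)})_{t \in T}\big) \le \sum_{t \in T} f\big(d(t, \pi(t))\big)$, where $d$ is the pseudometric on covariates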
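A toy numerical check of the "wiggle room" idea. This is a sketch under stated assumptions: the bound is read as total variation at most the sum over covariates of `f(d(t, pi(t)))`, and the binary model, the latent values `THETAS`, the function `p_one`, and the choice `f(x) = min(1, 0.1 x)` are all hypothetical. It computes the exact total variation distance between a pair of observations and the swapped pair.

```python
import math
from itertools import product

THETAS = (0.2, 0.5, 0.8)  # hypothetical latent values, equally likely

def p_one(theta, t):
    """P(X_t = 1 | theta); 0.1-Lipschitz in the covariate t."""
    return theta + 0.1 * math.sin(t)

def joint_pmf(covariates):
    """Exact pmf of (X_t for t in covariates): the observations are
    independent given theta, and theta is then averaged out."""
    pmf = {}
    for xs in product((0, 1), repeat=len(covariates)):
        total = 0.0
        for theta in THETAS:
            lik = 1.0
            for x, t in zip(xs, covariates):
                p = p_one(theta, t)
                lik *= p if x == 1 else 1.0 - p
            total += lik / len(THETAS)
        pmf[xs] = total
    return pmf

def tv(p, q):
    """Total variation distance between two pmfs on the same support."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)

def f(x):
    """Assumed modulus: f(0) = 0 and f(x) -> 0 as x -> 0."""
    return min(1.0, 0.1 * x)

s, t = 0.3, 0.7
swap_tv = tv(joint_pmf((s, t)), joint_pmf((t, s)))
bound = 2 * f(abs(s - t))  # f(d(s, t)) + f(d(t, s)) for a single swap
print(swap_tv, bound)
```

In this toy model the swap changes the joint distribution by a small but nonzero amount that stays below the bound, and the change shrinks to zero as the two covariates approach each other.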

Why total variation?

Others have drawbacks:

- depend on a particular metric (Wasserstein, Prokhorov)
- are asymmetric (KL, Rényi, chi divergence)
- are stronger than TV and require domination conditions (symmetrized divergences)
- or are equivalent to TV (Hellinger)
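Two of the points above can be illustrated on a pair of discrete distributions: KL is asymmetric, while Hellinger distance $H$ and total variation control each other through the standard inequalities $H^2 \le \mathrm{TV} \le \sqrt{2}\,H$. The example distributions `p` and `q` below are arbitrary choices for illustration.

```python
import math

def total_variation(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def hellinger(p, q):
    # Normalized so that 0 <= H <= 1.
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))

def kl(p, q):
    # Assumes q > 0 wherever p > 0 (a domination condition).
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]

tv_pq = total_variation(p, q)
h_pq = hellinger(p, q)
print("TV:", tv_pq, "Hellinger:", h_pq)
print("KL(p||q):", kl(p, q), "KL(q||p):", kl(q, p))
```

The two KL values differ, whereas total variation returns the same value in either direction and needs no domination condition to be finite.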

Is this sane? Does it ever hold?

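Yes, if $X$ is independently generated from a latent measure process $G$ and $G$ is smooth enough (“Lipschitz on average”)

Proposition: $X$ is $f$-locally exchangeable if $X_t \mid G \overset{\text{ind}}{\sim} G_t$ and either:

1) (strong) $\mathbb{E}\, d_{\mathrm{TV}}(G_t, G_{t'}) \le f(d(t, t'))$

2) (weak, more general) $\sup_A \mathbb{E}\,|G_t(A) - G_{t'}(A)| \le f(d(t, t'))$

(1 implies 2)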

Main Theorem (de Finetti representation)

Any $f$-locally exchangeable process satisfies the above (weak) condition
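For intuition on why a "Lipschitz on average" latent process gives local exchangeability, here is a sketch of the simplest case, swapping two observations. It assumes (as a reading of the slides, not a quotation) that $X_s, X_t$ are independent given $G$ with $X_t \mid G \sim G_t$, that the strong condition reads $\mathbb{E}\, d_{\mathrm{TV}}(G_s, G_t) \le f(d(s,t))$, and that $d$ is symmetric.

```latex
\begin{align*}
d_{\mathrm{TV}}\big((X_s, X_t), (X_t, X_s)\big)
  &= \sup_A \big| \mathbb{E}\big[(G_s \otimes G_t)(A) - (G_t \otimes G_s)(A)\big] \big| \\
  &\le \mathbb{E}\, d_{\mathrm{TV}}(G_s \otimes G_t,\; G_t \otimes G_s) \\
  &\le \mathbb{E}\big[ d_{\mathrm{TV}}(G_s \otimes G_t,\; G_t \otimes G_t)
        + d_{\mathrm{TV}}(G_t \otimes G_t,\; G_t \otimes G_s) \big] \\
  &= 2\, \mathbb{E}\, d_{\mathrm{TV}}(G_s, G_t) \\
  &\le 2 f(d(s,t)) = f(d(s,t)) + f(d(t,s)).
\end{align*}
```

The third line is the triangle inequality, and the fourth uses the fact that total variation between product measures sharing one factor equals the total variation between the differing factors. The final expression is one term per swapped covariate, which matches a bound of the form "sum over covariates of $f$ of the distance moved".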

Regularity

Is the underlying measure for an f-locally exchangeable process always continuous?

No:

[Figure: example process on the interval from 0 to 1 whose underlying measure is not continuous]

This is f-locally exchangeable with respect to Euclidean distance

this is OK: many BNP processes are intuitively locally exchangeable & aren’t sample-continuous

but we do provide sufficient conditions for continuity, stationarity, etc.

Thm: the faster $f(x)$ decays as $x \to 0$, the smoother the process

Approximate sufficiency & binned data

we often don’t observe exact covariates, but rather “binned” / “coarse” versions

Local Exch: the binned empirical measures are “approximately sufficient”

Theorem: Suppose we partition the covariate space into bins and know only which bin each observation falls in. Then the difference between conditioning on the binned empirical measures and conditioning on the unbinned quantity is bounded (see the paper for the exact bound)
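A minimal sketch of what "binned empirical measures" could look like in practice, assuming one-dimensional covariates, equal-width bins, and discrete observations. The function name `binned_empirical_measures` and the `bin_width` parameter are hypothetical, and the sketch only builds the summaries; it does not compute the theorem's bound.

```python
import math
from collections import Counter, defaultdict

def binned_empirical_measures(covariates, observations, bin_width):
    """Group observations by covariate bin and return, for each bin,
    the empirical distribution of the observations that fell in it."""
    counts = defaultdict(Counter)
    for t, x in zip(covariates, observations):
        counts[math.floor(t / bin_width)][x] += 1
    measures = {}
    for b, counter in counts.items():
        n = sum(counter.values())
        measures[b] = {x: c / n for x, c in counter.items()}
    return measures

covariates = [0.05, 0.10, 0.20, 0.30, 0.60]
observations = ["a", "b", "a", "a", "b"]
measures = binned_empirical_measures(covariates, observations, bin_width=0.25)
for b in sorted(measures):
    print(b, measures[b])
```

Within a bin the order and exact covariates of the observations are discarded, which is the information the approximate-sufficiency result says can be dropped at a controlled cost.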

Conclusion

A new theory of local exchangeability for covariate-dependent data:

- generalizes classical and partial exchangeability
- de-Finetti-like representation theorem
- regularity (smoothness) properties
- approximate sufficiency & “data binning”

http://arxiv.org/abs/1906.09507

Regularity

but there are cases where there is some additional smoothness

the faster $f(x)$ decays as $x \to 0$, the smoother the process

Theorem: Suppose as . Then:

is constant

is exchangeable

is weakly-continuous & weak-sense stationary

is stationary

(can’t say anything)