1 of 23

Local Exchangeability

Trevor Campbell Saifuddin Syed Chiao-Yu Yang Michael Jordan Tamara Broderick� University of British Columbia University of California, Berkeley MIT

2 of 23

Exchangeability

The order in which data are observed is often unimportant

aether, black hole, quantum, dark matter, ...

quantum, aether, black hole, dark matter, ...

e.g. document topics

data sequence is exchangeable

iff

de Finetti: data are exchangeable iff they are conditionally i.i.d.

  • justifies Bayesian (nonparametric) statistics
  • simple & efficient inference algorithms

quantum

aether

black hole

dark matter

...

3 of 23

Exchangeability

Rainfall estimation� [Guestrin+ 05]

real data isn’t usually exchangeable due to the availability of covariates

Modelling text over time � [Blei+ 06]

quantum

black�hole

dark matter

luminiferous

aether

1700s

1900s

quantum

black

hole

dark matter

luminiferous �aether

1700s

1900s

Learning control system dynamics �[Deisenroth+ 15]

but intuitively, data with similar covariates are “nearly exchangeable”

4 of 23

This work

A new theory of local exchangeability for covariate-dependent data�

  • swapping nearby covariates yields bounded change in distribution�
  • generalizes classical and partial exchangeability
  • de-Finetti-like representation theorem
  • regularity (smoothness) properties�
  • approximate sufficiency & data binning�
  • lots of examples (see the paper)

GP regression �[Rasmussen+ 06]

Kernel BP feature allocation � [Ren+ 11]

DDP temporal clustering �[MacEachern 99]

Dynamic topic modelling �[Blei+ 06]

5 of 23

History

“we must take up the case where we still encounter ‘analogies’ among the events under consideration, but without their attaining the limiting case of exchangeability.” [de Finetti 38]

Bruno de Finetti

this is an 80-year-old problem

6 of 23

Setup

stochastic process with index set

pseudometric: distance considers only spatial location�observations at the same location are exchangeable

observation index

observation spatial location covariate

heights of uniformly random people

sequence of data

pseudometric: indices aren’t informative

data are exchangeable

Gaussian process regression

pseudometric captures proximity of covariates

separable, no isolated points

7 of 23

Exchangeability [de Finetti]

can swap indices of all observations

Defn: for any finite set of covariates , permutation

the distribution is invariant to swapping any finite collection of observations

Theorem (sketch) [de Finetti]: There is a unique random distribution s.t.

8 of 23

Partial exchangeability [de Finetti 38, Lauritzen 74, Diaconis+ 78, etc]

can only swap within these groups

Defn: for any finite set of covariates , permutation

as long as

the distribution is invariant to swapping observations with identical covariates

Theorem (sketch) [de Finetti]: There is a unique stochastic process of distributions s.t.

9 of 23

Partial exchangeability [de Finetti 38, Lauritzen 74, Diaconis+ 78, etc]

problem 1: this doesn’t address our problem of “nearby” covariates

��

successfully captures related populations in BNP (e.g. HDP mixture), but:

problem 2: theory requires infinite data in each equivalence class of covariates

problem 3: can be arbitrarily complex, not so useful for modelling

10 of 23

Local exchangeability

Key Idea: add a bit of (total variation) wiggle room to exchangeability�

Defn: is -locally exchangeable if for any finite set of � covariates and permutation

11 of 23

Why total variation?

Others have drawbacks:

  • depend on a particular metric (Wasserstein, Prokhorov)
  • are asymmetric (KL, Renyi, chi divergence)
  • are stronger than TV, require domination conditions (symmetrized divergences)

or are equivalent (Hellinger)

12 of 23

Is this a generalization?

Yes: generalizes (partial) exchangeability

Let be -locally exchangeable under pseudometric

  • If identically, then is exchangeable
  • If for indices in the same class, then is partially exchangeable

so

Why:

13 of 23

Does it hold in nontrivial cases?

Yes: if is independently generated from latent measure process

is smooth enough on average

relativity

electron

luminiferous aether

black hole

i.i.d. papers

i.i.d. papers

i.i.d. papers

i.i.d. papers

1700

1800

2000

1900

time

topic�probability

1

e.g. Dynamic Topic Model

0

Proposition: is -locally exchangeable if and

14 of 23

Gaussian process example

e.g.: show stationary GP regression models are locally exchangeable

{

15 of 23

de Finetti representation

Main Theorem (de Finetti representation)� is an -locally exchangeable process iff

2) for any finite set of covariates , permutation

1) it is independently generated from a latent measure process

16 of 23

Regularity

Is the underlying measure for an f-locally exchangeable process always continuous?

No:

relativity

electron

luminiferous aether

black hole

1700

1800

2000

1900

time

topic�probability

1

e.g. Dynamic Topic Model

0

...but this is OK�many BNP processes are intuitively locally exchangeable�but aren’t sample-continuous

17 of 23

Regularity

Is the underlying measure for an f-locally exchangeable process always continuous?

This is f-locally exchangeable with

Euclidean distance and

0

1

No:

18 of 23

Are there cases with more regularity?

Yes: the faster decays as , the smoother the process

Theorem: Suppose as . Then:

is constant�is exchangeable

is weak-continuous & weak-sense stationary�is stationary

(can’t say anything)

19 of 23

Sufficiency

If is exchangeable, then the indices contain no information about

empirical measure

bounded function

is independent of given

(“ is sufficient”)

20 of 23

Approximate sufficiency & binned data

Theorem: Suppose we partition into bins and form the binned observations . Then

we often don’t observe exact indices, but rather “binned” / “coarse” versions

Local Exch: the binned empirical measures are “approximately sufficient” for

Insight: equivalent to forming empirical measures in each bin

see the paper for a more powerful sufficiency result

21 of 23

Conclusion

A new theory of local exchangeability for covariate-dependent data:�

  • generalizes classical and partial exchangeability�
  • de-Finetti-like representation theorem�
  • regularity (smoothness) properties�
  • approximate sufficiency & “data binning”

http://arxiv.org/abs/1906.09507

22 of 23

Proof sketch

Classical de Finetti [see Aldous 85]

Local de Finetti

select a dense countable subset of

do essentially the same thing�

  • careful index swapping scheme
  • equality convergence in distribution �

23 of 23

Conclusion

A new theory of local exchangeability for covariate-dependent data:�

  • generalizes classical and partial exchangeability�
  • de-Finetti-like representation theorem�
  • regularity (smoothness) properties�
  • approximate sufficiency & “data binning”

http://arxiv.org/abs/1906.09507