On Variational Bounds of Mutual Information
Ben Poole, Sherjil Ozair, Aäron van den Oord, Alexander A. Alemi, George Tucker
ICML 2019
Alex Wang and Xiaohui Zeng, sta4273 course presentation.
Mutual Information
Define:
I(X;Y) = \mathbb{E}_{p(x,y)}\left[\log \frac{p(x,y)}{p(x)\,p(y)}\right] = D_{\mathrm{KL}}\big(p(x,y)\,\|\,p(x)p(y)\big)
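For two unit-variance Gaussians with correlation rho, this definition has the closed form I(X;Y) = -0.5 log(1 - rho^2). A quick Monte Carlo sanity check of the definition (a toy setup of our own, not from the paper):

```python
import numpy as np

# Toy check of I(X;Y) = E_{p(x,y)}[log p(x,y) / (p(x)p(y))] for a
# bivariate Gaussian with unit variances and correlation rho, where
# the closed form is I = -0.5 * log(1 - rho^2).
rho = 0.8
true_mi = -0.5 * np.log(1 - rho**2)

rng = np.random.default_rng(0)
n = 200_000
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

# log joint and log product-of-marginals densities at the samples
log_joint = (-0.5 * (x**2 - 2 * rho * x * y + y**2) / (1 - rho**2)
             - 0.5 * np.log((2 * np.pi)**2 * (1 - rho**2)))
log_marginals = -0.5 * (x**2 + y**2) - np.log(2 * np.pi)

mi_mc = np.mean(log_joint - log_marginals)  # Monte Carlo estimate of the KL
```

With 200k samples the estimate lands within a few hundredths of a nat of the closed form.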
Contribution
Lower Bound of I(X;Y)
Barber & Agakov (2003): replace the intractable posterior p(x|y) with a tractable variational distribution q(x|y):
I(X;Y) \ge \mathbb{E}_{p(x,y)}[\log q(x|y)] + h(X) \triangleq I_{BA} \quad (1)
Challenge: requires a tractable q(x|y), which is hard to obtain when x is high-dimensional.
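With Gaussian data the bound can be checked directly, since the exact posterior p(x|y) = N(rho*y, 1 - rho^2) is available: plugging it in as q makes I_BA tight, while a misspecified q stays a strict lower bound (again our own toy sketch, not the paper's experiment):

```python
import numpy as np

rng = np.random.default_rng(1)
rho, n = 0.8, 200_000
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

true_mi = -0.5 * np.log(1 - rho**2)
h_x = 0.5 * np.log(2 * np.pi * np.e)  # differential entropy of X ~ N(0, 1)

def i_ba(mean, var):
    """Barber-Agakov bound E_{p(x,y)}[log q(x|y)] + h(X) for q(x|y)=N(mean, var)."""
    log_q = -0.5 * (x - mean) ** 2 / var - 0.5 * np.log(2 * np.pi * var)
    return np.mean(log_q) + h_x

ba_tight = i_ba(rho * y, 1 - rho**2)  # q = exact posterior p(x|y): bound is tight
ba_loose = i_ba(rho * y, 1.0)         # misspecified variance: strictly smaller
```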
Lower Bound of I(X;Y)
Choose an energy-based variational family with a tractable critic f(x,y) and an intractable partition function Z(y):
q(x|y) = \frac{p(x)\, e^{f(x,y)}}{Z(y)}, \qquad Z(y) = \mathbb{E}_{p(x)}\big[e^{f(x,y)}\big] \quad (2)
Substituting into (1) gives the (still intractable) bound
I(X;Y) \ge \mathbb{E}_{p(x,y)}[f(x,y)] - \mathbb{E}_{p(y)}[\log Z(y)] \triangleq I_{UBA} \quad (3)
Lower Bound of I(X;Y)
Replace the intractable \log Z(y) in (3) using \log z \le z/a + \log a - 1 (tight at a = z), for any baseline a(y) > 0:
I(X;Y) \ge \mathbb{E}_{p(x,y)}[f(x,y)] - \mathbb{E}_{p(y)}\left[\frac{\mathbb{E}_{p(x)}[e^{f(x,y)}]}{a(y)} + \log a(y) - 1\right] \triangleq I_{TUBA} \quad (4)
Closer look at TUBA
Two components learned by neural networks: the critic f(x,y) and the baseline a(y). MINE (Belghazi et al., 2018) corresponds to an exponential-moving-average baseline.
*In addition, NWJ (Nguyen et al., 2010) fixes a(y) = e, i.e. a self-normalized critic:
I_{NWJ} = \mathbb{E}_{p(x,y)}[f(x,y)] - e^{-1}\,\mathbb{E}_{p(y)}\mathbb{E}_{p(x)}\big[e^{f(x,y)}\big] \quad (5)
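Using the known density ratio as an oracle critic (f* = 1 + log ratio is the optimal NWJ critic), the estimator can be checked on the same Gaussian toy problem (a sketch; in practice f is a learned network and the ratio is unknown):

```python
import numpy as np

rng = np.random.default_rng(2)
rho, n = 0.8, 200_000
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)
y_shuffled = rng.permutation(y)  # approximate samples from p(x)p(y)

def log_ratio(x, y):
    """log p(x,y)/(p(x)p(y)) for unit-variance Gaussians with correlation rho."""
    return (-0.5 * (rho**2 * (x**2 + y**2) - 2 * rho * x * y) / (1 - rho**2)
            - 0.5 * np.log(1 - rho**2))

# I_NWJ = E_{p(x,y)}[f] - e^{-1} E_{p(x)p(y)}[e^f], with oracle critic f* = 1 + log ratio
f_joint = 1.0 + log_ratio(x, y)
f_marginal = 1.0 + log_ratio(x, y_shuffled)
i_nwj = np.mean(f_joint) - np.exp(-1) * np.mean(np.exp(f_marginal))

true_mi = -0.5 * np.log(1 - rho**2)
```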
Recap - Lower Bound of I(X;Y)
(1) I_BA: needs a tractable q(x|y). (3) I_UBA: needs the intractable \log Z(y). (5) I_TUBA / I_NWJ: tractable, but high-variance.
*In addition, NWJ (Nguyen et al., 2010) uses a self-normalized critic; MINE (Belghazi et al., 2018) an EMA baseline. InfoNCE (van den Oord et al., 2018) is a multi-sample, low-variance alternative:
I_{NCE} = \mathbb{E}\left[\frac{1}{K}\sum_{i=1}^{K} \log \frac{e^{f(x_i, y_i)}}{\frac{1}{K}\sum_{j=1}^{K} e^{f(x_j, y_i)}}\right]
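I_NCE can be checked the same way: score a batch of K paired samples with the oracle critic, with positives on the diagonal of the score matrix. By construction each term, and hence the estimate, can never exceed log K (our own toy setup, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(3)
rho, K, n_batches = 0.8, 128, 200
true_mi = -0.5 * np.log(1 - rho**2)

def log_ratio(x, y):
    """log p(x,y)/(p(x)p(y)) for unit-variance Gaussians with correlation rho."""
    return (-0.5 * (rho**2 * (x**2 + y**2) - 2 * rho * x * y) / (1 - rho**2)
            - 0.5 * np.log(1 - rho**2))

estimates = []
for _ in range(n_batches):
    x = rng.standard_normal(K)
    y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(K)
    scores = log_ratio(x[:, None], y[None, :])  # scores[j, i] = f(x_j, y_i)
    # per-example: f(x_i, y_i) - log (1/K) sum_j e^{f(x_j, y_i)}
    log_mean_exp = np.log(np.mean(np.exp(scores), axis=0))
    estimates.append(np.mean(np.diag(scores) - log_mean_exp))

i_nce = np.mean(estimates)  # <= log K always; close to true MI when MI << log K
```

Here true MI (~0.51 nats) is far below log K (~4.85), so the bias is negligible.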
Proposed bound
Existing lower bounds are either high-bias or high-variance: I_NCE has low variance but is upper-bounded by \log K, while I_NWJ is low-bias but high-variance.
Proposed: interpolate between the NWJ estimator and the NCE estimator with a parameter \alpha \in [0,1], trading off bias against variance.
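The tradeoff motivating the interpolation can be seen numerically by making the true MI exceed log K: with the oracle critic, the InfoNCE estimate saturates below log K, while per-batch NWJ estimates are unbiased but far noisier. This is our own toy illustration of the motivation, not the paper's interpolated estimator I_alpha (which interpolates inside the partition-function estimate):

```python
import numpy as np

rng = np.random.default_rng(4)
true_mi = 3.0
rho = np.sqrt(1 - np.exp(-2 * true_mi))  # chosen so -0.5*log(1-rho^2) = 3
K, n_batches = 8, 2000                   # log K ~= 2.08 < true MI

def log_ratio(x, y):
    """log p(x,y)/(p(x)p(y)) for unit-variance Gaussians with correlation rho."""
    return (-0.5 * (rho**2 * (x**2 + y**2) - 2 * rho * x * y) / (1 - rho**2)
            - 0.5 * np.log(1 - rho**2))

nce_batches, nwj_batches = [], []
for _ in range(n_batches):
    x = rng.standard_normal(K)
    y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(K)
    # InfoNCE on the batch: bounded above by log K, so it must saturate
    scores = log_ratio(x[:, None], y[None, :])
    nce_batches.append(np.mean(np.diag(scores)
                               - np.log(np.mean(np.exp(scores), axis=0))))
    # NWJ on the same batch: unbiased for the true MI, but heavy-tailed
    f_joint = 1.0 + log_ratio(x, y)
    f_marg = 1.0 + log_ratio(x, rng.permutation(y))
    nwj_batches.append(np.mean(f_joint) - np.exp(-1) * np.mean(np.exp(f_marg)))

i_nce = np.mean(nce_batches)  # stuck below log K ~= 2.08, far from MI = 3
std_ratio = np.std(nwj_batches) / np.std(nce_batches)  # NWJ is much noisier
```

Note that any convex combination alpha * I_NWJ + (1 - alpha) * I_NCE is itself a valid lower bound; the paper's I_alpha realizes the same tradeoff more effectively.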
Experiments
Notebook
Experiments with Gaussian data
dSprites Representation Learning
Key Ideas & Discussion
Related Work & Follow-up
Contrastive Predictive Coding (InfoNCE): https://arxiv.org/pdf/1807.03748.pdf
On Mutual Information Maximization for Representation Learning: https://arxiv.org/pdf/1907.13625.pdf
Learn the gradient directly: https://openreview.net/pdf?id=ByxaUgrFvH
Original VMI paper: https://arxiv.org/abs/1905.06922