Generative Adversarial Networks
Amnon Geifman & Adam Yaari
Table of content
First lecture:
Second lecture:
Supervised Vs. Unsupervised Learning
| | Supervised | Unsupervised |
| Input Data | Data (x), Labels (y) | Data (x) |
| Goal | Learn y = f(x) | Learn the underlying hidden structure of the data |
| Examples | Classification, regression, object detection, semantic segmentation, etc. | Clustering, feature learning, density estimation, dimensionality reduction, etc. |
Supervised Vs. Unsupervised Learning
Clustering:
Supervised Vs. Unsupervised Learning
Feature mapping:
Supervised Vs. Unsupervised Learning
Dimensionality reduction:
What does that have to do with generative models?
Meaning: given training data, generate new samples from the same distribution.
* Modified versions of GANs are also used for semi-supervised and reinforcement learning.
What do Generative Models do?
Generative models attempt to estimate the density function of the training data.
Taxonomy of Generative Models
Figure copyright from Ian Goodfellow, Tutorial on GAN, NIPS 2016
van den Oord et al. 2016
PixelRNN (van den Oord et al. 2016)
Explicit tractable density function:
Van Der Oord et al. "Pixel Recurrent Neural Networks", 2016.
PixelRNN (van den Oord et al. 2016)
Explicit tractable density function:
Use the chain rule to decompose the likelihood of an image x into a product of 1-d conditional distributions, and train by maximizing the likelihood of the training data.
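Written out, the decomposition used in the paper, with pixels ordered in a raster scan, is:

$$p(x) = \prod_{i=1}^{n^2} p\big(x_i \mid x_1, \ldots, x_{i-1}\big)$$

Each factor is a 1-d distribution over the 256 possible values of pixel $x_i$, conditioned on all previously generated pixels.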
But how do we model these 1-d conditional distributions?
PixelRNN (van den Oord et al. 2016)
Generate pixels sequentially, starting from the top-left corner; each conditional distribution is modeled by an RNN (LSTM).
Pros: explicitly computes a tractable likelihood p(x), which gives a clean training objective and evaluation metric; produces good samples.
Cons: generation is sequential, pixel by pixel, so sampling is very slow.
PixelCNN (van den Oord et al. 2016)
An improvement over PixelRNN. Each pixel has an explicit distribution over values 0-255.
The conditional distributions are modeled with a masked CNN over the already-generated context region instead of an RNN.
Pros: training is much faster than PixelRNN, since the convolutions over the known context can be parallelized.
Cons: generation is still sequential, so sampling remains slow.
PixelCNN (van den Oord et al. 2016)
Some results:
Taxonomy of Generative Models
Figure copyright from Ian Goodfellow, Tutorial on GAN, NIPS 2016
van den Oord et al. 2016
Kingma & Welling, 2013
Variational Autoencoder (Kingma & Welling, ICLR 2014)
Explicit intractable density function:
Kingma and Welling. "Auto-Encoding Variational Bayes", 2013.
Variational Autoencoder (Kingma & Welling, ICLR 2014)
Explicit intractable density function:
Graphical model: a latent variable z generates the observed data x.
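To make "explicit but intractable" concrete, here is the standard statement of the VAE setup (not verbatim from the slide): the model density marginalizes over z, and training maximizes a variational lower bound instead:

$$p_\theta(x) = \int p_\theta(z)\, p_\theta(x \mid z)\, dz \quad \text{(intractable to evaluate)}$$

$$\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \;-\; D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\Vert\,p_\theta(z)\big)$$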
Variational Autoencoder (Kingma & Welling, ICLR 2014)
Some results:
Variational Autoencoder (Kingma & Welling, ICLR 2014)
Pros: a principled probabilistic approach; the inference network q(z|x) gives a useful latent representation for other tasks.
Cons: only a lower bound on the likelihood is optimized, and samples tend to be blurrier and of lower quality than GAN samples.
Taxonomy of Generative Models
Figure copyright from Ian Goodfellow, Tutorial on GAN, NIPS 2016
Generative Stochastic Networks
Taxonomy of Generative Models
Figure copyright from Ian Goodfellow, Tutorial on GAN, NIPS 2016
van den Oord et al. 2016
Kingma & Welling, 2013
Goodfellow et al. 2014
Table of content
First lecture:
Second lecture:
Finally… what are GANs?
Let’s give up on explicitly modeling the density, and just gain the ability to sample.
Generative Adversarial Networks:
learn to approximate the data's distribution instead of exposing it explicitly.
Goodfellow et al. "Generative Adversarial Networks", 2014.
Some motivation...
https://hothardware.com/news/nvidia-neural-network-generates-photorealistic-faces-disturbing-results
Some motivation...
Bedrooms generation
Some motivation...
Vector Arithmetic
Some motivation...
Super resolution
Some motivation...
3D reconstruction
Some motivation...
Face compilation
Some motivation...
Assisted painting
Some motivation...
Style transfer/coloring
GAN’s Architecture
An adversarial game between two differentiable players:
Discriminator: trained to distinguish between samples from pdata and pmodel.
Generator: tries to fool the discriminator with its generated samples, i.e. to make pmodel an approximation of pdata.
Goodfellow et al. "Generative Adversarial Networks", 2014.
GAN’s Architecture
Architecture diagram: noise z → Generator → fake sample; real data x and fake samples → Discriminator → real/fake decision.
Goodfellow et al. "Generative Adversarial Networks", 2014.
Table of content
First lecture:
Second lecture:
Deep Convolutional GAN (Radford et al. 2016)
Goodfellow's original paper used a multi-layer perceptron for both the Generator and the Discriminator; DCGAN replaces both with deep convolutional networks.
Radford et al. "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks", 2016.
Deep Convolutional GAN (Radford et al. 2016)
Original GAN
DCGAN
Radford et al. "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks", 2016.
Table of content
First lecture:
Second lecture:
GAN’s Objective
Given a training set of real examples x, random noise z sampled from a Gaussian distribution, a discriminator D, and a generator G:
Play a zero-sum (minimax) game between the generator and the discriminator.
The Discriminator tries to maximize the objective while the Generator tries to minimize it (written out below).
Discriminator score on a real image: D(x)
Discriminator score on a fake image: D(G(z))
Goodfellow et al. "Generative Adversarial Networks", 2014.
D(real image) -> 1, D(fake image) -> 0
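Written out (Goodfellow et al. 2014), the value function of the game, whose two terms are the discriminator scores annotated above, is:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]$$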
GAN’s Algorithm
k discriminator updates for every generator update, i.e. m·k discriminator iterations for m generator iterations in total (see the sketch below).
Goodfellow et al. "Generative Adversarial Networks", 2014.
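A minimal PyTorch sketch of this alternating loop (k discriminator steps per generator step). Network sizes, optimizer settings and the data placeholder are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 100, 784   # assumed sizes (e.g. flattened 28x28 images)

G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def sample_real(n):
    # Placeholder for real training data; replace with an actual data loader.
    return torch.rand(n, data_dim) * 2 - 1

k, batch_size = 1, 64             # k discriminator steps per generator step
for step in range(10_000):
    # k discriminator updates: push D(real) -> 1 and D(fake) -> 0
    for _ in range(k):
        x = sample_real(batch_size)
        z = torch.randn(batch_size, latent_dim)
        fake = G(z).detach()                       # do not backprop into G here
        loss_D = bce(D(x), torch.ones(batch_size, 1)) + \
                 bce(D(fake), torch.zeros(batch_size, 1))
        opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # one generator update: fool D, i.e. push D(G(z)) -> 1 (non-saturating form)
    z = torch.randn(batch_size, latent_dim)
    loss_G = bce(D(G(z)), torch.ones(batch_size, 1))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```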
GAN’s Learning
Goodfellow et al. (2014) showed that solving the minimax game over D and G amounts to minimizing the Jensen-Shannon divergence between the data and model distributions*.
* Given infinite capacity for D and G.
Always read the fine print!
Goodfellow et al. "Generative Adversarial Networks", 2014.
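The precise statement: for a fixed generator, the optimal discriminator and the resulting generator criterion are

$$D^*_G(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}, \qquad C(G) = -\log 4 + 2\,\mathrm{JSD}\big(p_{\text{data}} \,\Vert\, p_g\big)$$

so, with an optimal discriminator, minimizing over G minimizes the Jensen-Shannon divergence.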
GAN’s Objective In Practice
Goodfellow et al. "Generative Adversarial Networks", 2014.
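This most likely refers to the non-saturating generator loss proposed in the same paper: early in training, $\log(1 - D(G(z)))$ saturates when D confidently rejects the fakes, so in practice the generator instead maximizes

$$\mathbb{E}_{z \sim p_z}\big[\log D(G(z))\big]$$

which has the same fixed point but much stronger gradients early in training.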
Nash-Equilibrium in GANs
Zhao, LeCun et al. (2016) showed that, given a slightly different objective function (the losses are written out below):
m - a positive margin
D - a non-negative energy function
G(z) - a generated image
[.]+ - the ReLU function max(0, .)
There exists a Nash-Equilibrium in which G produces samples that are indistinguishable from the real data*.
* Given infinite capacity for D and G.
Zhao, LeCun et al. "Energy-based Generative Adversarial Network", 2016.
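For reference, the EBGAN losses this slide refers to (Zhao et al. 2016), with D a non-negative energy, m the margin and $[\cdot]^+ = \max(0, \cdot)$:

$$\mathcal{L}_D(x, z) = D(x) + \big[m - D(G(z))\big]^+, \qquad \mathcal{L}_G(z) = D\big(G(z)\big)$$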
Energy Based GAN
Nash-Equilibrium in GANs
Our criticism: a Nash equilibrium may exist in theory, but EBGAN (the Energy-Based GAN) does not appear to find it in practice.
Zhao, LeCun et al. "Energy-based Generative Adversarial Network", 2017.
Semi-Supervised Learning
GANs are now used with an extremely wide variety of objectives, tasks, architectures, etc.
So, it all seems great!
Table of content
First lecture:
Second lecture:
Do GANs really learn the distribution?
What we know so far:
The Birthday Paradox
Suppose there are k people in a room. How large must k be before we have a high likelihood of having two people with the same birthday?
Let p(k) be the probability that no two of the k people share a birthday; it is a product of k factors:
p(23) = (365/365) × (364/365) × … × (343/365) ≈ 0.493
p(70) = (365/365) × (364/365) × … × (296/365) < 0.001
Overall, for a discrete distribution with support size N, a sample of about √N draws is likely to contain duplicates.
Arora, Zhang "Do GANs actually learn the distribution? An empirical study”, 2017.
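A quick numerical check of this rule of thumb (a small illustrative script, not from the paper): for a uniform distribution over N items, the probability that s draws are all distinct is $\prod_{i=0}^{s-1}(1 - i/N) \approx e^{-s^2/2N}$, which crosses 50% around $s \approx 1.2\sqrt{N}$.

```python
def p_all_distinct(s: int, N: int) -> float:
    """Probability that s uniform draws from a support of size N contain no duplicate."""
    p = 1.0
    for i in range(s):
        p *= (N - i) / N
    return p

print(p_all_distinct(23, 365))   # ~0.493 (classic birthday setting)
print(p_all_distinct(70, 365))   # < 0.001
# For support N = 1,000,000, duplicates become likely once s is a small multiple of sqrt(N):
for s in (500, 1000, 1200, 1500):
    print(s, round(1 - p_all_distinct(s, 10**6), 3))   # collision probability
```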
The Birthday Paradox
Thus, if samples of size s contain duplicate images with good probability, then the distribution is likely to have support of size about s².
Sample (and support) size needed to find duplicates (w.p. > 50%):
Dataset \ Architecture | DCGAN | MIX+DCGAN | ALI (BiGAN) | Stacked GAN |
| CelebA | 400 (16,000) | 400 (16,000) | 1000 (1,000,000) | - |
| CIFAR-10 | - | - | - | < 500 (< 250,000) |
Arora, Zhang "Do GANs actually learn the distribution? An empirical study”, 2017.
Diversity test results
Arora, Zhang "Do GANs actually learn the distribution? An empirical study”, 2017.
Diversity test results
Diversity as a function of the discriminator’s number of parameters:
Arora, Zhang "Do GANs actually learn the distribution? An empirical study”, 2017.
First Part: “GANs are awesome!”
Second Part: GANs are far from being great!
Table of content
First lecture:
Second lecture:
GANs are notoriously hard to train
Vanishing/exploding gradients
Discriminator domination
Sensitivity to learning rate
Mode collapse
Batch normalization
There are two current research directions
Understanding and improving the training dynamics
Understanding the gap between theory and practice
Improved Techniques for Training GANs, Salimans et al. 2016
Wasserstein GAN, Arjovsky et al. 2017
Table of content
First lecture:
Second lecture:
Improved Techniques for Training GANs, Salimans et al. 2016
Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
Problem #1: overtraining of the discriminator
Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
Problem #1: overtraining of the discriminator
Solution: feature matching (formula below)
Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
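Concretely, feature matching trains the generator to match the expected activations f(x) of an intermediate discriminator layer rather than to directly fool the final output (Salimans et al. 2016):

$$\Big\lVert \mathbb{E}_{x \sim p_{\text{data}}} f(x) \;-\; \mathbb{E}_{z \sim p_z} f\big(G(z)\big) \Big\rVert_2^2$$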
Problem #1: overtraining of the discriminator
Problem #2: mode collapse
Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
Problem #1: overtraining of the discriminator
Problem #2: mode collapse
Solution: minibatch discrimination - we allow the discriminator to look at a batch of examples rather than at a single example in isolation (see the sketch below).
Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
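A rough PyTorch sketch of the minibatch-discrimination layer described in Salimans et al. (2016): intermediate features are projected through a learned tensor T, pairwise L1 distances across the batch are turned into similarities and summed, and the result is appended to each sample's features so the discriminator can detect a collapsed, low-diversity batch. Shapes and variable names here are illustrative assumptions.

```python
import torch

def minibatch_features(f, T):
    """f: (N, A) discriminator features for a batch; T: (A, B, C) learned projection tensor."""
    N, A = f.shape
    B, C = T.shape[1], T.shape[2]
    M = (f @ T.reshape(A, B * C)).reshape(N, B, C)            # per-sample matrices M_i
    l1 = (M.unsqueeze(0) - M.unsqueeze(1)).abs().sum(dim=3)   # (N, N, B) pairwise L1 distances
    c = torch.exp(-l1)                                        # similarities c_b(x_i, x_j)
    o = c.sum(dim=1)                                          # o(x_i)_b = sum_j c_b(x_i, x_j)
    return torch.cat([f, o], dim=1)                           # (N, A + B), fed to the next layer

# usage with illustrative shapes
f = torch.randn(64, 128)
T = torch.randn(128, 32, 16)
print(minibatch_features(f, T).shape)   # torch.Size([64, 160])
```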
Problem #1: overtraining of the discriminator
Problem #2: mode collapse
Problem #3: local minimum of the equilibrium
Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
Problem #1: overtraining of the discriminator
Problem #2: mode collapse
Problem #3: local minimum of the equilibrium
Solution: historical averaging (see below)
Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
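Concretely, historical averaging adds to each player's cost a term that penalizes its parameters for drifting away from their own running average over past iterations (Salimans et al. 2016):

$$\Big\lVert \theta - \frac{1}{t}\sum_{i=1}^{t} \theta[i] \Big\rVert^2$$

where θ[i] denotes the parameter values at past time step i.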
Problem #1: overtraining of the discriminator
Problem #2: mode collapse
Problem #3: local minimum of the equilibrium
Problem #4: batch normalization
Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
Problem #1: overtraining of the discriminator
Problem #2: mode collapse
Problem #3: local minimum of the equilibrium
Problem #4: batch normalization
Solution: virtual batch normalization (see the sketch below)
Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
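A simplified sketch of virtual batch normalization: every example is normalized with statistics computed on a fixed reference batch chosen once at the start of training, so an output no longer depends on the other examples in its own minibatch. (The paper also mixes the example itself into the statistics; that detail is omitted in this illustrative version.)

```python
import torch

def virtual_batch_norm(x, ref_batch, gamma, beta, eps=1e-5):
    """Normalize x with the mean/variance of a fixed reference batch, not of x's own minibatch."""
    mu = ref_batch.mean(dim=0, keepdim=True)
    var = ref_batch.var(dim=0, unbiased=False, keepdim=True)
    return gamma * (x - mu) / torch.sqrt(var + eps) + beta

# usage with illustrative shapes
ref = torch.randn(128, 256)                      # reference batch, fixed at the start of training
gamma, beta = torch.ones(256), torch.zeros(256)
x = torch.randn(64, 256)
print(virtual_batch_norm(x, ref, gamma, beta).shape)   # torch.Size([64, 256])
```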
Results of the improved training
Feature matching
Minibatch discrimination
Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
Basic GAN
DCGAN
DCGAN + feature matching
Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
DCGAN
DCGAN+Minibatch discrimination
Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
DCGAN
DCGAN + Minibatch discrimination + Feature matching
There is still room for improvement
DCGAN
Improved DCGAN
Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
There are two current research directions
Understanding and improving the training dynamics
Understanding the gap between theory and practice
Improved Techniques for Training GANs, Salimans et al. 2016
Wasserstein GAN, Arjovsky et al. 2017
Table of content
First lecture:
Second lecture:
Understanding the gap between theory and practice
Some questions we want to ask:
Why do GANs fail to converge to the Nash equilibrium?
Why does mode collapse occur?
What theoretical guarantees do we have on the objective?
Divergences/distances: a tool for analyzing GANs
Arjovsky, Martin, Soumith Chintala, and Léon Bottou. "Wasserstein generative adversarial networks." International Conference on Machine Learning. 2017.
List of divergences
Arjovsky, Martin, Soumith Chintala, and Léon Bottou. "Wasserstein generative adversarial networks." International Conference on Machine Learning. 2017.
GAN as divergence minimizer
Claim (Goodfellow et al. 2014): in the regular GAN, given the optimal discriminator, the generator minimizes the Jensen-Shannon (JS) divergence.
Arjovsky, Martin, Soumith Chintala, and Léon Bottou. "Wasserstein generative adversarial networks." International Conference on Machine Learning. 2017.
Example of the weakness of JS
We need a better distance measure.
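The standard example behind this point (Example 1 in Arjovsky et al. 2017): let $Z \sim U[0,1]$, let $\mathbb{P}_0$ be the distribution of $(0, Z)$ and $\mathbb{P}_\theta$ the distribution of $(\theta, Z)$. Then

$$W(\mathbb{P}_0, \mathbb{P}_\theta) = |\theta|, \qquad JS(\mathbb{P}_0, \mathbb{P}_\theta) = \begin{cases}\log 2 & \theta \neq 0\\ 0 & \theta = 0\end{cases}$$

so JS is constant whenever the supports are disjoint and provides no useful gradient, while the EM distance varies smoothly with θ.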
Arjovsky, Martin, Soumith Chintala, and Léon Bottou. "Wasserstein generative adversarial networks." International Conference on Machine Learning. 2017.
A better choice: the EM distance
Arjovsky, Martin, Soumith Chintala, and Léon Bottou. "Wasserstein generative adversarial networks." International Conference on Machine Learning. 2017.
A better choice: the EM distance
If there are several ways to move the probability mass of P onto Q, choose the cheapest transport plan.
Arjovsky, Martin, Soumith Chintala, and Léon Bottou. "Wasserstein generative adversarial networks." International Conference on Machine Learning. 2017.
A better choice: the EM distance
In the discrete case, a transport plan is a matrix γ whose entries give the amount of earth moved from one position to another. The cost of a plan is

$$B(\gamma) = \sum_{x_d, x_g} \gamma(x_d, x_g)\,\lVert x_d - x_g \rVert$$

The earth mover (Wasserstein) distance is the cost of the cheapest plan:

$$\min_{\gamma \in \Pi} B(\gamma)$$
A better choice: the EM distance
In the continuous case, the earth mover (Wasserstein) distance is defined as an infimum over all joint distributions whose marginals are the real and generated distributions.
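The definition from the paper:

$$W(\mathbb{P}_r, \mathbb{P}_g) = \inf_{\gamma \in \Pi(\mathbb{P}_r, \mathbb{P}_g)} \; \mathbb{E}_{(x, y) \sim \gamma}\big[\lVert x - y \rVert\big]$$

where $\Pi(\mathbb{P}_r, \mathbb{P}_g)$ is the set of all joint distributions γ(x, y) whose marginals are $\mathbb{P}_r$ and $\mathbb{P}_g$.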
Arjovsky, Martin, Soumith Chintala, and Léon Bottou. "Wasserstein generative adversarial networks." International Conference on Machine Learning. 2017.
EM distance is better than JS
Theorem: let P_θ denote the distribution of the random variable g_θ(z). Under some regularity conditions on g_θ, the Wasserstein distance W(P_r, P_θ) is continuous in θ and differentiable almost everywhere.
However, the infimum in the definition above is highly intractable to compute directly.
Arjovsky, Martin, Soumith Chintala, and Léon Bottou. "Wasserstein generative adversarial networks." International Conference on Machine Learning. 2017.
Dual function principle
Arjovsky, Martin, Soumith Chintala, and Léon Bottou. "Wasserstein generative adversarial networks." International Conference on Machine Learning. 2017.
The Kantorovich-Rubinstein duality
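The duality rewrites the intractable infimum as a supremum over 1-Lipschitz functions:

$$W(\mathbb{P}_r, \mathbb{P}_\theta) = \sup_{\lVert f \rVert_L \le 1} \; \mathbb{E}_{x \sim \mathbb{P}_r}\big[f(x)\big] - \mathbb{E}_{x \sim \mathbb{P}_\theta}\big[f(x)\big]$$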
Arjovsky, Martin, Soumith Chintala, and Léon Bottou. "Wasserstein generative adversarial networks." International Conference on Machine Learning. 2017.
WGAN objective
We take f to be a parametric function f_w.
A constraint on the weights of f enforces the Lipschitz condition.
g_θ is a parametric function such that g_θ(z) ~ P_θ.
z is a Gaussian noise random variable.
f is analogous to the discriminator, but in this model it is called the critic.
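Restricting f to a parametrized family $\{f_w\}_{w \in \mathcal{W}}$ of (approximately) Lipschitz functions yields the WGAN objective:

$$\min_\theta \max_{w \in \mathcal{W}} \; \mathbb{E}_{x \sim \mathbb{P}_r}\big[f_w(x)\big] - \mathbb{E}_{z \sim p(z)}\big[f_w\big(g_\theta(z)\big)\big]$$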
Arjovsky, Martin, Soumith Chintala, and Léon Bottou. "Wasserstein generative adversarial networks." International Conference on Machine Learning. 2017.
We enforce the Lipschitz constraint by clipping each weight of the critic to a small range (e.g. [-0.01, 0.01]).
WGAN algorithm
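The algorithm figure is not reproduced here; below is a minimal PyTorch sketch of the loop it describes: n_critic critic updates with weight clipping per generator update, trained with RMSProp as in the paper. Network sizes and the data placeholder are illustrative assumptions.

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 100, 784           # assumed sizes
critic = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
                       nn.Linear(256, 1))                 # no sigmoid: outputs an unbounded score
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, data_dim), nn.Tanh())

opt_C = torch.optim.RMSprop(critic.parameters(), lr=5e-5)  # RMSProp, as in the paper
opt_G = torch.optim.RMSprop(G.parameters(), lr=5e-5)
n_critic, clip, batch_size = 5, 0.01, 64

def sample_real(n):
    # Placeholder for real training data; replace with an actual data loader.
    return torch.rand(n, data_dim) * 2 - 1

for step in range(10_000):
    # critic: maximize E[f(x_real)] - E[f(G(z))]
    for _ in range(n_critic):
        x = sample_real(batch_size)
        z = torch.randn(batch_size, latent_dim)
        loss_C = -(critic(x).mean() - critic(G(z).detach()).mean())
        opt_C.zero_grad(); loss_C.backward(); opt_C.step()
        for p in critic.parameters():      # enforce Lipschitz via weight clipping
            p.data.clamp_(-clip, clip)

    # generator: maximize E[f(G(z))]
    z = torch.randn(batch_size, latent_dim)
    loss_G = -critic(G(z)).mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```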
WGAN results- JS with regular GAN
Arjovsky, Martin, Soumith Chintala, and Léon Bottou. "Wasserstein generative adversarial networks." International Conference on Machine Learning. 2017.
WGAN results
Arjovsky, Martin, Soumith Chintala, and Léon Bottou. "Wasserstein generative adversarial networks." International Conference on Machine Learning. 2017.
WGAN results
Arjovsky, Martin, Soumith Chintala, and Léon Bottou. "Wasserstein generative adversarial networks." International Conference on Machine Learning. 2017.
Table of content
First lecture:
Second lecture:
Improved Training of Wasserstein GANs
Gulrajani, Ishaan, et al. "Improved training of wasserstein gans." Advances in Neural Information Processing Systems. 2017.
Regularization
WGAN loss
New regularization loss
Gulrajani, Ishaan, et al. "Improved training of wasserstein gans." Advances in Neural Information Processing Systems. 2017.
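The new critic loss from Gulrajani et al. (2017) replaces weight clipping with a gradient penalty evaluated at points $\hat{x}$ interpolated between real and generated samples:

$$L = \mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}\big[D(\tilde{x})\big] - \mathbb{E}_{x \sim \mathbb{P}_r}\big[D(x)\big] + \lambda\, \mathbb{E}_{\hat{x} \sim \mathbb{P}_{\hat{x}}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big]$$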
Improved Training of Wasserstein GANs results
WGAN
WGAN-GP
Gulrajani, Ishaan, et al. "Improved training of wasserstein gans." Advances in Neural Information Processing Systems. 2017.
Improving the improved training of Wasserstein GANs
WGAN-GP
WGAN-CT
Improving the improved training of Wasserstein GANs
WGAN-GP
Improving the improved training of Wasserstein GANs
GANs today
Goodfellow et al. 2014
Super resolution
3-D depth from single view
Image generation
Image inpainting
Style transfer
Text to image
Image to text
Take home message