Lecture 8:
Generative Models I
Sookyung Kim
Spring 2025
Era of Generative Models
Supervised vs. Unsupervised Learning
Supervised Learning
Unsupervised Learning
Taxonomy of Generative Models
Generative models
- Explicit density
  - Tractable density: Fully Visible Belief Nets (PixelRNN/CNN)
  - Approximate density
    - Variational: Variational Autoencoders
    - Stochastic: Boltzmann Machine
- Implicit density
  - Direct: Generative Adversarial Networks (GAN)
  - Stochastic: Generative Stochastic Networks (GSN)
Ian Goodfellow, Tutorial on Generative Adversarial Networks https://arxiv.org/abs/1701.00160
Lecture 8: PixelRNN/CNN, Variational Autoencoders
Lecture 9: Generative Adversarial Networks (GAN)
Lecture 10: Stable Diffusion (DDPM)
Generative Modeling
Goal: learn p_model(x) that approximates the true data distribution.
Step 1: density estimation — fit p_model(x) to the training data.
Step 2: sampling — draw new examples x ~ p_model(x).
Generative Modeling
Why generative models?
PixelRNN & PixelCNN
Pixel-by-pixel Image Generation
Three possible generation orderings over a 5×5 grid:

Raster scan (row by row):
 1   2   3   4   5
 6   7   8   9  10
11  12   …

Diagonal (anti-diagonal by anti-diagonal):
 1   2   4   7  11
 3   5   8  12
 6   9   …
10

Interleaved (coarse grid first, then fill in):
 1   ·   2   ·   3
 ·  10   ·  11   ·
 4   ·   5   ·   6
 ·  12   ·   …   ·
 7   ·   8   ·   9
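Whatever the ordering, pixel-by-pixel generation rests on the chain-rule factorization of the image likelihood (the standard PixelRNN formulation):

$$p(x) = \prod_{i=1}^{n^2} p\left(x_i \mid x_1, \ldots, x_{i-1}\right)$$

Each factor is a distribution over one pixel, conditioned on all pixels generated before it.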
RNN (Review)
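As a quick reminder, the vanilla RNN recurrence (matching the fW notation used on the next slide):

$$h_t = f_W(h_{t-1}, x_t) = \tanh\left(W_{hh}\, h_{t-1} + W_{xh}\, x_t\right)$$

with a readout such as $y_t = W_{hy}\, h_t$.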
Pixel-by-pixel Image Generation (An RNN setting)
Input: the masked ground truth — the pixels generated so far.
Output: a probability distribution over the next pixel's RGB values.
Compute the loss against the ground truth and backpropagate (see the sketch below).
Unrolled: h0 → fW → h1 → fW → h2 → fW → h3 → fW → h4 → …
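A minimal PyTorch sketch of this training setup, assuming a hypothetical flattened grayscale image (one 256-way categorical output per pixel rather than full RGB, for brevity):

```python
import torch
import torch.nn as nn

# Minimal sketch: an RNN that predicts each pixel from the pixels
# generated so far. Sizes are hypothetical (28x28 images, 256 levels).
class PixelRNNSketch(nn.Module):
    def __init__(self, hidden=128, levels=256):
        super().__init__()
        self.embed = nn.Embedding(levels, hidden)   # previous pixel -> vector
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, levels)       # logits over next pixel

    def forward(self, pixels):                      # pixels: (B, T) ints
        h, _ = self.rnn(self.embed(pixels))
        return self.head(h)                         # (B, T, levels)

# Training step: predict pixel t+1 from pixels <= t (teacher forcing).
model = PixelRNNSketch()
x = torch.randint(0, 256, (4, 28 * 28))             # fake batch
logits = model(x[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 256), x[:, 1:].reshape(-1))
loss.backward()
```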
PixelRNN (1): Row LSTM
Input image x_t; previous hidden state h_{t-1}.
[Figure: input-to-state convolution (W) applied to the input row; state-to-state convolution (U) applied to the previous row's hidden state]
Note: this process is actually done in parallel, within the same row!
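A minimal sketch of that computation, assuming hypothetical (B, C, H, W) tensors. The real model additionally masks the input-to-state convolution so a pixel never sees itself or the pixels to its right; that masking is omitted here for brevity:

```python
import torch
import torch.nn as nn

# Row LSTM sketch: rows are processed sequentially, but all pixels
# within a row get their gates in parallel via 1-D convolutions.
class RowLSTMSketch(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # each conv produces the 4 LSTM gates (i, f, o, g) for a whole row
        self.in_to_state = nn.Conv1d(channels, 4 * channels, 3, padding=1)
        self.state_to_state = nn.Conv1d(channels, 4 * channels, 3, padding=1)

    def forward(self, x):                          # x: (B, C, H, W)
        B, C, H, W = x.shape
        h = x.new_zeros(B, C, W)
        c = x.new_zeros(B, C, W)
        hs = []
        for row in range(H):                       # rows: sequential
            gates = self.in_to_state(x[:, :, row]) + self.state_to_state(h)
            i, f, o, g = gates.chunk(4, dim=1)     # pixels in row: parallel
            c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
            h = torch.sigmoid(o) * torch.tanh(c)
            hs.append(h)
        return torch.stack(hs, dim=2)              # (B, C, H, W)
```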
PixelRNN (1): Row LSTM
With only the state-to-state convolution (U) over the previous row's hidden state h_{t-1}, the receptive field is triangular: it does not cover all of the previously generated pixels. That is, the model does not use all available context.
PixelRNN (2): Diagonal BiLSTM
Input image x_t; previous hidden state h_{t-1}.
Input-to-state (W) is a 1×1 convolution; state-to-state (U) is a 2×1 convolution (see next).
[Figure: diagonal scan over the pixel grid]
The entire diagonal is processed in parallel, each cell relying only on its adjacent, already-computed cells (see the skewing sketch below).
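The parallelism comes from skewing the feature map so that each diagonal lines up as a column; a minimal sketch of that operation, assuming a hypothetical (B, C, H, W) tensor:

```python
import torch

def skew(x):
    """Shift row i of x right by i positions, so diagonals become columns.

    x: (B, C, H, W)  ->  (B, C, H, 2W - 1)
    """
    B, C, H, W = x.shape
    out = x.new_zeros(B, C, H, 2 * W - 1)
    for i in range(H):
        out[:, :, i, i:i + W] = x[:, :, i]
    return out

def unskew(x, W):
    """Inverse of skew: recover the original (B, C, H, W) map."""
    H = x.shape[2]
    return torch.stack([x[:, :, i, i:i + W] for i in range(H)], dim=2)
```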
PixelCNN
[Figure: a masked convolution kernel — the filter sees only pixels above and to the left of the current position, keeping the receptive field causal]
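A minimal sketch of PixelCNN's masked convolution in PyTorch; mask type 'A' (used in the first layer) also hides the center pixel, while type 'B' (later layers) keeps it:

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        _, _, kH, kW = self.weight.shape
        mask = torch.ones(kH, kW)
        mask[kH // 2, kW // 2 + (mask_type == 'B'):] = 0  # right of center
        mask[kH // 2 + 1:] = 0                             # rows below
        self.register_buffer('mask', mask)

    def forward(self, x):
        self.weight.data *= self.mask                      # enforce causality
        return super().forward(x)

# Usage: a first layer that cannot see the current pixel itself
conv = MaskedConv2d('A', 1, 64, kernel_size=7, padding=3)
out = conv(torch.randn(1, 1, 28, 28))                      # (1, 64, 28, 28)
```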
Pixel Recursive Super Resolution
Pixel Recursive Super Resolution
Conditioning network A_i(x): a CNN over the low-resolution input x, producing logits.
Prior network B_i(y<i): a PixelCNN over the high-resolution pixels generated so far, producing logits.
Add the two logit vectors, apply a softmax, sample the target pixel i, then continue to the next one (i+1).
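Per pixel, this amounts to (the formulation summarized above):

$$p\left(y_i \mid x,\; y_{<i}\right) = \operatorname{softmax}\!\left(A_i(x) + B_i(y_{<i})\right)$$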
Pixel Recursive Super Resolution
A_i(x) and B_i(y<i) are each a vector of size K — one logit per possible pixel value; the sampled pixel y_i is a scalar.
Pixel Recursive Super Resolution: Result
Kim, Sookyung, et al. "Resolution reconstruction of climate data with pixel recursive model." 2017 IEEE International Conference on Data Mining Workshops (ICDMW). IEEE, 2017.
Autoencoders
Autoencoders
x → Encoder h(x) → z → Decoder g(z) → x̂
The encoder compresses x into a low-dimensional code z; the decoder reconstructs x̂ from z. Train by minimizing the reconstruction loss, e.g. ‖x − x̂‖².
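A minimal autoencoder sketch in PyTorch, with hypothetical sizes (784-dim inputs, a 32-dim bottleneck):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, dim=784, latent=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(),
                                     nn.Linear(256, dim))

    def forward(self, x):
        z = self.encoder(x)          # h(x): compress to the bottleneck
        return self.decoder(z)       # g(z): reconstruct x̂

model = Autoencoder()
x = torch.randn(16, 784)
loss = nn.functional.mse_loss(model(x), x)   # reconstruction loss
loss.backward()
```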
Denoising Autoencoders (DAE)
x → random noise q(x′|x) → x′ → Encoder h(x′) → z → Decoder g(z) → x̂
The reconstruction x̂ is compared against the clean input x, not the corrupted x′.
Denoising Autoencoders
Input data x lies close to a low-dimensional manifold.
The corruption q maps x farther away from this manifold.
The model g(h(x′)) learns to map x′ back onto the manifold.
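A minimal denoising training step, assuming the corruption q(x′|x) is additive Gaussian noise (masking noise is another common choice); sizes are hypothetical:

```python
import torch
import torch.nn as nn

# encoder h(.) followed by decoder g(.), as one sequential model
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 32),   # h(x')
    nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 784))   # g(z)

x = torch.randn(16, 784)                     # clean input
x_noisy = x + 0.3 * torch.randn_like(x)      # x' ~ q(x'|x)
x_hat = model(x_noisy)                       # g(h(x'))
loss = nn.functional.mse_loss(x_hat, x)      # compare against the CLEAN x
loss.backward()
```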
Variational Autoencoders
Variational Autoencoders
x → Encoder h(x) → z → Decoder g(z) → x̂
Recall the two steps of generative modeling with p_model(x): Step 1, density estimation; Step 2, sampling.
Idea: treat the decoder as the sampler — draw a latent z, then decode it: z → x̂.
Variational Autoencoders
z → Generator gθ(z) → x
[Figure: latent codes z decoded into digit images, e.g. a 4 and a 9]
Variational Autoencoders: First try
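A natural first attempt (the standard starting point for this derivation) is to maximize the data likelihood directly, with an assumed prior p(z) and the generator modeling p(x|z):

$$p_\theta(x) = \int p_\theta(z)\, p_\theta(x \mid z)\, dz$$

The catch: this integral over every possible z is intractable, and so is the posterior p(z|x).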
Variational Autoencoders: First try (cont’d)
No! We’ve seen it’s not the case:
Variational Autoencoders: Main Idea
z → Generator gθ(z) → x
Main idea: introduce an encoder qφ(z|x) that approximates the intractable posterior p(z|x), and maximize a tractable lower bound on log p(x).
Variational Autoencoders: Derivation
Starting from log p(x), apply in order: Bayes' rule; multiplication by 1 (= qφ(z|x)/qφ(z|x)); regrouping of terms; and the definitions of expectation and KL divergence. The three resulting terms:
- Reconstruction (likelihood of the data x): since we model p(x|z) with the generator g, this term is the same as in the first try.
- Prior matching: enforces that the encoder qφ(z|x) embeds the data x into an assumed prior distribution p(z), e.g. a Gaussian.
- Intractable posterior: we don't know p(z|x), but since KL divergence ≥ 0, the first two terms are a lower bound on log p(x).
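Written out, this is the standard evidence lower bound (ELBO) derivation the annotations refer to:

$$
\begin{aligned}
\log p(x) &= \mathbb{E}_{z \sim q_\phi(z \mid x)}\left[\log p(x)\right] \\
&= \mathbb{E}_{z}\left[\log \frac{p(x \mid z)\, p(z)}{p(z \mid x)}\right] && \text{(Bayes' rule)} \\
&= \mathbb{E}_{z}\left[\log \frac{p(x \mid z)\, p(z)}{p(z \mid x)} \cdot \frac{q_\phi(z \mid x)}{q_\phi(z \mid x)}\right] && \text{(multiplied by 1)} \\
&= \mathbb{E}_{z}\left[\log p(x \mid z)\right] - D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right) + D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z \mid x)\right) \\
&\ge \mathbb{E}_{z}\left[\log p(x \mid z)\right] - D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right)
\end{aligned}
$$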
Variational Autoencoders: Overall Structure
x → Encoder qφ(z|x) → z → Generator gθ(z) → x̂
Variational Autoencoders: Overall Structure
x → Encoder qφ(z|x) → (μ_{z|x}, Σ_{z|x}); sample z from N(μ_{z|x}, Σ_{z|x}) → Generator gθ(x|z) → x̂
- KL divergence between the two Gaussians qφ(z|x) and the prior p(z): acts like a regularizer, and has a closed form.
- Reconstruction loss between x̂ and x.
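A minimal PyTorch sketch of this training loop, with hypothetical sizes (784-dim inputs, a 32-dim diagonal-Gaussian latent); the reparameterization z = μ + σ·ε keeps sampling differentiable:

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, dim=784, latent=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent)        # mean of q_phi(z|x)
        self.logvar = nn.Linear(256, latent)    # log-variance of q_phi(z|x)
        self.dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(),
                                 nn.Linear(256, dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

model = VAE()
x = torch.randn(16, 784)
x_hat, mu, logvar = model(x)
recon = nn.functional.mse_loss(x_hat, x, reduction='sum')
# closed-form KL between N(mu, sigma^2) and the prior N(0, I)
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
loss = recon + kl
loss.backward()
```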
Variational Autoencoders: Overall Structure
At generation time, drop the encoder: sample z from the prior p(z) → Generator gθ(x|z) → x̂
Variational Autoencoders: Examples
[Figure: traversing the learned latent space — one latent dimension varies the degree of smiling (less ↔ more), another varies gaze direction (left ↔ right); a 2-D latent grid decoded into MNIST digits 0-9]
Variational Autoencoders: Summary