1 of 40

CSE 5524: Generative models - 1


2 of 40

HW 3

  • Caution: please re-download the data
  • Due: 4/8/2025 → likely to be delayed

3 of 40

Final project (30%)

  • Project proposal: 4/2 (3%)
  • Instructions are on Carmen

4 of 40

Today

  • Generative models


5 of 40

Recap: Popular CNN “architectures”

  • Encoder, decoder:
    • Neural networks have the “capacity” to map images to vectors
    • Neural networks have the “capacity” to map vectors to images
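As a toy illustration of this "capacity" (all dimensions and weights below are hypothetical, not from the course), an encoder and decoder can each be sketched as a single random-weight layer:

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, W):
    # Map a flattened image to a low-dimensional code vector.
    return np.tanh(W @ x)

def decoder(z, V):
    # Map a code vector back to image space.
    return np.tanh(V @ z)

# Toy dimensions: an 8x8 "image" (64 values) <-> a 4-dim code.
D, K = 64, 4
W = rng.normal(scale=0.1, size=(K, D))   # encoder weights (untrained)
V = rng.normal(scale=0.1, size=(D, K))   # decoder weights (untrained)

x = rng.random(D)          # a flattened "image"
z = encoder(x, W)          # image -> vector
x_hat = decoder(z, V)      # vector -> image
print(z.shape, x_hat.shape)  # (4,) (64,)
```

Real encoders/decoders are deep and trained, but the input/output contract is exactly this: vectors in, vectors out.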


[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

6 of 40

Recognition models vs. generative models

  • In the following, without loss of generality, we will use a “trapezoid” icon to represent a neural network (regardless of whether it is a CNN, RNN, Transformer, etc.)

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

7 of 40

Critical difference

  • A recognition model should perform “many-to-one” mapping:
    • All bird images classified as “bird”

  • A generative model should perform “one-to-many” mapping:
    • A class label “bird” should result in various bird images

Generative models achieve this “ambiguity” by making g a “stochastic function”

8 of 40

Illustration


[Figure: generated samples from Big-GAN and from diffusion models, compared with real images]

[Diffusion Models Beat GANs on Image Synthesis, NeurIPS 2021]

9 of 40

How can we make a neural network stochastic?

Solution: “explicitly” introduce a stochastic input
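A toy sketch of this idea (the generator `g`, its dimensions, and its weights are all made up for illustration): concatenating a noise vector z with the conditioning input makes the same label produce different outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def g(y, z, W):
    # A toy stochastic generator: concatenate the label y with the
    # noise z, then apply one linear layer + nonlinearity.
    return np.tanh(W @ np.concatenate([y, z]))

y = np.array([0.0, 1.0])                      # toy one-hot "bird" label
W = rng.normal(scale=0.5, size=(16, 2 + 3))   # untrained weights

# Same label, different noise -> different outputs ("one-to-many").
out1 = g(y, rng.normal(size=3), W)
out2 = g(y, rng.normal(size=3), W)
print(np.allclose(out1, out2))  # False
```

The randomness lives entirely in the input z; the network itself stays a deterministic function.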

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

10 of 40

Terminology

  • z: latent variables, as not directly observed
  • y vs. z: specific class vs. specific instance (e.g., color, pose, size, etc.)
  • Typically z is drawn from a Gaussian (e.g., z ~ N(0, I)), and is thus also called “noise”

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

11 of 40

Unconditional generative models

  • Can we learn the generator g without access to class labels?

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

Gray: visible; White: latent

12 of 40

Learning a generative model

  • The training data only has x but not y

  • In generation, we sample z to generate different x

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

13 of 40

What is the objective?

  • Answers to this question lead to different branches of generative models

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

14 of 40

What is the objective?

  • High-level: the output of the generator looks like real data

  • Different realization:
    • Match certain statistics: mean color, color variance, etc.
    • Synthetic data have high probability under a density model fit to real data
    • Synthetic data and real data are indistinguishable

15 of 40

Direct and indirect approaches

  • Direct: generate samples directly, judged by whether they are indistinguishable from real data

  • Indirect: synthetic data have high probability under a density model fit to real data

16 of 40

Direct and indirect approaches

  • About how to generate an image

  • The direct category is more straightforward in generation (GAN, diffusion models)
  • The indirect category is more straightforward in learning (e.g., fitting a density model)

17 of 40

What will you see today?

  • Density model
  • Autoregressive density model
  • Diffusion model
  • GAN

18 of 40

Density model

  • A density model p(x) assigns a probability (density) to every possible data point x; learning means fitting p to the training data

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

19 of 40

Example: Gaussian
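For a 1-D Gaussian, the maximum-likelihood fit has a closed form: the sample mean and the (biased) sample variance. A minimal check, assuming synthetic data drawn from N(2, 0.5²):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=10_000)

# MLE for a 1-D Gaussian: sample mean and (biased) sample variance.
mu_hat = data.mean()
var_hat = ((data - mu_hat) ** 2).mean()
print(mu_hat, var_hat)  # close to 2.0 and 0.25
```

With enough samples, the estimates recover the true parameters mu = 2.0 and sigma² = 0.25.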

20 of 40

Learning density function

  • Minimize the Kullback–Leibler (KL) divergence between the model and data

24 of 40

Learning density function


Maximum likelihood estimation (MLE) = minimum KL divergence
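The equivalence can be spelled out in one line. Expanding the KL divergence between the data distribution and the model:

```latex
D_{\mathrm{KL}}\!\left(p_{\text{data}} \,\|\, p_\theta\right)
  = \underbrace{\mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log p_{\text{data}}(x)\right]}_{\text{constant in } \theta}
  \; - \; \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log p_\theta(x)\right]
```

The first term does not depend on θ, so minimizing the KL over θ is the same as maximizing E_{x~p_data}[log p_θ(x)], whose empirical estimate (1/N) Σ_n log p_θ(x⁽ⁿ⁾) is exactly the average log-likelihood of the training data.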

25 of 40

Illustration

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

26 of 40

Density model for images

  • How can we define a density model p(x) when x is a high-dimensional image?

27 of 40

Autoregressive density model

  • Chain rule decomposition:

  • How many possible outcomes per step?

  • What should be the probability distribution model?
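The chain-rule decomposition referred to above can be written as:

```latex
p(x) \;=\; \prod_{i=1}^{N} p\!\left(x_i \mid x_1, \dots, x_{i-1}\right)
```

where the x_i are pixels in some fixed (e.g., raster) order; for 8-bit images each conditional is a categorical distribution over 256 possible values, which answers the "how many possible outcomes per step" question.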

28 of 40

Illustration

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

Predict the next pixel based on local context!

29 of 40

Softmax for categorical distribution modeling

  • Each 8-bit pixel value is one of 256 categories; a softmax over 256 logits turns the network output into a categorical distribution over the next pixel value
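A minimal sketch of the softmax that turns per-pixel logits into a categorical distribution (the 256-way output matches 8-bit pixels; the random logits here are placeholders for real network outputs):

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy example: a network emits 256 logits, one per 8-bit pixel value.
logits = np.random.default_rng(0).normal(size=256)
probs = softmax(logits)
print(probs.sum())  # 1.0 (up to floating point); probs[v] = P(next pixel = v)
```

Sampling the next pixel then amounts to drawing from this categorical distribution.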

30 of 40

Training

  • What is the objective?

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

The training process is entirely supervised!

31 of 40

Training

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

32 of 40

Dive into autoregressive model training

  • Break a hard problem (image generation) into simple pieces (predicting pixels)
  • The breaking process is like adding corruption (masking)

  • How about corruption with Gaussian noise?
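Gaussian corruption can be applied in closed form. The sketch below follows the standard DDPM-style forward process; the toy "image" and the choice alpha_bar = 0.5 are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(x0, alpha_bar, rng):
    # Forward diffusion, closed form:
    #   x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    eps = rng.normal(size=x0.shape)
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return xt, eps

x0 = rng.random(64)                           # a toy "image"
xt, eps = corrupt(x0, alpha_bar=0.5, rng=rng)  # a partially noised version
print(xt.shape)  # (64,)
```

As alpha_bar decreases toward 0, x_t approaches pure Gaussian noise, mirroring how masking more pixels makes the autoregressive prediction problem harder.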

33 of 40

Diffusion models

34 of 40

Training
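One standard objective for this training step (the simplified noise-prediction loss used by DDPM-style models; this is standard background, not text recovered from the slide) is:

```latex
\mathcal{L}(\theta)
  = \mathbb{E}_{t,\, x_0,\, \epsilon \sim \mathcal{N}(0, I)}
    \left[ \left\| \epsilon - \epsilon_\theta\!\left(
      \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\; t
    \right) \right\|^2 \right]
```

The network ε_θ is trained to predict the Gaussian noise that was added to x_0, which again makes training fully supervised.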


36 of 40

Neural network model

37 of 40

What is the objective?

  • High-level: the output of the generator looks like real data

  • Different realization:
    • Match certain statistics: mean color, color variance, etc.
    • Synthetic data have high probability under a density model fit to real data
    • Synthetic data and real data are indistinguishable

38 of 40

Generative adversarial networks (GAN)

39 of 40

Generative adversarial net (GAN)


[Figure: the Generator produces fake images; the Discriminator classifies images as REAL or FAKE]

[Credits: Mengdi Fan and Xinyu Zhou, CSE 5539 course presentation]
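The game between the two players is usually written as the minimax objective (the standard GAN formulation of Goodfellow et al.):

```latex
\min_G \max_D \; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p(z)}\!\left[\log\!\left(1 - D(G(z))\right)\right]
```

The discriminator D maximizes its classification accuracy on real vs. fake; the generator G minimizes it by producing samples D cannot tell apart from real data.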


40 of 40

Example results (by Style-GAN)


[A Style-Based Generator Architecture for Generative Adversarial Networks, CVPR 2019]