1 of 40

CSE 5524: Generative models - 1


2 of 40

HW 3

  • Caution: please re-download the data
  • Due: 4/8/2025 → likely to be delayed

3 of 40

Final project (30%)

  • Project proposal: 4/2 (3%)
  • Instructions are on Carmen

4 of 40

Today

  • Generative models


5 of 40

Recap: Popular CNN “architectures”

  • Encoder, decoder:
    • Neural networks have the “capacity” to map images to vectors
    • Neural networks have the “capacity” to map vectors to images
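As a toy illustration of this "capacity" (all dimensions and weights below are hypothetical, not from the course), an encoder and decoder can each be sketched as a single random-weight layer:

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, W):
    # Map a flattened image to a low-dimensional code vector.
    return np.tanh(W @ x)

def decoder(z, V):
    # Map a code vector back to image space.
    return np.tanh(V @ z)

# Toy dimensions: an 8x8 "image" (64 values) <-> a 4-dim code.
D, K = 64, 4
W = rng.normal(scale=0.1, size=(K, D))   # encoder weights (untrained)
V = rng.normal(scale=0.1, size=(D, K))   # decoder weights (untrained)

x = rng.random(D)          # a flattened "image"
z = encoder(x, W)          # image -> vector
x_hat = decoder(z, V)      # vector -> image
print(z.shape, x_hat.shape)  # (4,) (64,)
```

Real encoders/decoders are deep and trained, but the input/output contract is exactly this: vectors in, vectors out.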


[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

6 of 40

Recognition models vs. generative models

  • In the following, without loss of generality, we will use a “trapezoid” icon to represent a neural network (regardless of whether it is a CNN, RNN, Transformer, etc.)

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

7 of 40

Critical difference

  • A recognition model should perform “many-to-one” mapping:
    • All bird images classified as “bird”

  • A generative model should perform “one-to-many” mapping:
    • A class label “bird” should result in various bird images

Generative models achieve this “ambiguity” by making g a “stochastic function”

8 of 40

Illustration


[Figure: generated samples from Big-GAN and from diffusion models, compared with real images]

[Diffusion Models Beat GANs on Image Synthesis, NeurIPS 2021]

9 of 40

How can we make a neural network stochastic?

Solution: “explicitly” introduce a stochastic input
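A toy sketch of this idea (the generator `g`, its dimensions, and its weights are all made up for illustration): concatenating a noise vector z with the conditioning input makes the same label produce different outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def g(y, z, W):
    # A toy stochastic generator: concatenate the label y with the
    # noise z, then apply one linear layer + nonlinearity.
    return np.tanh(W @ np.concatenate([y, z]))

y = np.array([0.0, 1.0])                      # toy one-hot "bird" label
W = rng.normal(scale=0.5, size=(16, 2 + 3))   # untrained weights

# Same label, different noise -> different outputs ("one-to-many").
out1 = g(y, rng.normal(size=3), W)
out2 = g(y, rng.normal(size=3), W)
print(np.allclose(out1, out2))  # False
```

The randomness lives entirely in the input z; the network itself stays a deterministic function.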

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

10 of 40

Terminology

  • z: latent variables, as not directly observed
  • y vs. z: specific class vs. specific instance (e.g., color, pose, size, etc.)
  • Typically z is drawn from a Gaussian (e.g., z ~ N(0, I)), and is thus also called “noise”

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

11 of 40

Unconditional generative models

  • Can we learn the generator g without access to class labels?

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

Gray: visible; White: latent

12 of 40

Learning a generative model

  • The training data only has x but not y

  • In generation, we sample z to generate different x

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

13 of 40

What is the objective?

  • Answers to this question lead to different branches of generative models

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

14 of 40

What is the objective?

  • High-level: the output of the generator looks like real data

  • Different realization:
    • Match certain statistics: mean color, color variance, etc.
    • Synthetic data have high probability under a density model fit to real data
    • Synthetic data and real data are indistinguishable

15 of 40

Direct and indirect approaches

  • Direct: generate samples directly, judged by whether they are indistinguishable from real data

  • Indirect: synthetic data have high probability under a density model fit to real data

16 of 40

Direct and indirect approaches

  • About how to generate an image

  • The direct category is more straightforward in generation (GAN, diffusion models)
  • The indirect category is more straightforward in learning (e.g., fitting a density model)

17 of 40

What will you see today?

  • Density model
  • Autoregressive density model
  • Diffusion model
  • GAN

18 of 40

Density model

  • A density model p(x) assigns a probability (density) to every possible data point x; learning means fitting p to the training data

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

19 of 40

Example: Gaussian
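For a 1-D Gaussian, the maximum-likelihood fit has a closed form: the sample mean and the (biased) sample variance. A minimal check, assuming synthetic data drawn from N(2, 0.5²):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=10_000)

# MLE for a 1-D Gaussian: sample mean and (biased) sample variance.
mu_hat = data.mean()
var_hat = ((data - mu_hat) ** 2).mean()
print(mu_hat, var_hat)  # close to 2.0 and 0.25
```

With enough samples, the estimates recover the true parameters mu = 2.0 and sigma² = 0.25.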

20 of 40

Learning density function

  • Minimize the Kullback–Leibler (KL) divergence between the model and data

24 of 40

Learning density function


Maximum likelihood estimation (MLE) = minimum KL divergence
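The equivalence can be spelled out in one line. Expanding the KL divergence between the data distribution and the model:

```latex
D_{\mathrm{KL}}\!\left(p_{\text{data}} \,\|\, p_\theta\right)
  = \underbrace{\mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log p_{\text{data}}(x)\right]}_{\text{constant in } \theta}
  \; - \; \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log p_\theta(x)\right]
```

The first term does not depend on θ, so minimizing the KL over θ is the same as maximizing E_{x~p_data}[log p_θ(x)], whose empirical estimate (1/N) Σ_n log p_θ(x⁽ⁿ⁾) is exactly the average log-likelihood of the training data.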

25 of 40

Illustration

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

26 of 40

Density model for images

  • How can we define a density model p(x) when x is a high-dimensional image?

27 of 40

Autoregressive density model

  • Chain rule decomposition:

  • How many possible outcomes per step?

  • What should be the probability distribution model?
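The chain-rule decomposition referred to above can be written as:

```latex
p(x) \;=\; \prod_{i=1}^{N} p\!\left(x_i \mid x_1, \dots, x_{i-1}\right)
```

where the x_i are pixels in some fixed (e.g., raster) order; for 8-bit images each conditional is a categorical distribution over 256 possible values, which answers the "how many possible outcomes per step" question.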

28 of 40

Illustration

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

Predict the next pixel based on local context!

29 of 40

Softmax for categorical distribution modeling

  • Each 8-bit pixel value is one of 256 categories; a softmax over 256 logits turns the network output into a categorical distribution over the next pixel value
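A minimal sketch of the softmax that turns per-pixel logits into a categorical distribution (the 256-way output matches 8-bit pixels; the random logits here are placeholders for real network outputs):

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy example: a network emits 256 logits, one per 8-bit pixel value.
logits = np.random.default_rng(0).normal(size=256)
probs = softmax(logits)
print(probs.sum())  # 1.0 (up to floating point); probs[v] = P(next pixel = v)
```

Sampling the next pixel then amounts to drawing from this categorical distribution.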

30 of 40

Training

  • What is the objective?

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

The training process is entirely supervised!

31 of 40

Training

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

32 of 40

Dive into autoregressive model training

  • Break a hard problem (image generation) into simple pieces (predicting pixels)
  • The breaking process is like adding corruption (masking)

  • How about corruption with Gaussian noise?
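Gaussian corruption can be applied in closed form. The sketch below follows the standard DDPM-style forward process; the toy "image" and the choice alpha_bar = 0.5 are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(x0, alpha_bar, rng):
    # Forward diffusion, closed form:
    #   x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    eps = rng.normal(size=x0.shape)
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return xt, eps

x0 = rng.random(64)                           # a toy "image"
xt, eps = corrupt(x0, alpha_bar=0.5, rng=rng)  # a partially noised version
print(xt.shape)  # (64,)
```

As alpha_bar decreases toward 0, x_t approaches pure Gaussian noise, mirroring how masking more pixels makes the autoregressive prediction problem harder.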

33 of 40

Diffusion models

34 of 40

Training
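One standard objective for this training step (the simplified noise-prediction loss used by DDPM-style models; this is standard background, not text recovered from the slide) is:

```latex
\mathcal{L}(\theta)
  = \mathbb{E}_{t,\, x_0,\, \epsilon \sim \mathcal{N}(0, I)}
    \left[ \left\| \epsilon - \epsilon_\theta\!\left(
      \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\; t
    \right) \right\|^2 \right]
```

The network ε_θ is trained to predict the Gaussian noise that was added to x_0, which again makes training fully supervised.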


36 of 40

Neural network model

37 of 40

What is the objective?

  • High-level: the output of the generator looks like real data

  • Different realization:
    • Match certain statistics: mean color, color variance, etc.
    • Synthetic data have high probability under a density model fit to real data
    • Synthetic data and real data are indistinguishable

38 of 40

Generative adversarial networks (GAN)

39 of 40

Generative adversarial net (GAN)


[Figure: the Generator produces fake images; the Discriminator classifies images as REAL or FAKE]

[Credits: Mengdi Fan and Xinyu Zhou, CSE 5539 course presentation]
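The game between the two players is usually written as the minimax objective (the standard GAN formulation of Goodfellow et al.):

```latex
\min_G \max_D \; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p(z)}\!\left[\log\!\left(1 - D(G(z))\right)\right]
```

The discriminator D maximizes its classification accuracy on real vs. fake; the generator G minimizes it by producing samples D cannot tell apart from real data.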


40 of 40

Example results (by Style-GAN)


[A Style-Based Generator Architecture for Generative Adversarial Networks, CVPR 2019]