1 of 75

CSE 5539: Generative Models

2 of 75

Generative models


3 of 75

Generative models

[Figure: easily samplable distribution → image]

[Credits: Tutorial on Diffusion Models]

4 of 75

Generative models

  • How to generate data (e.g., images)?
  • By design
  • By learning from “real” data

  • Scenarios:
  • From unlabeled data: p(x)
  • From labeled data: p(y)p(x | y)
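Concretely, the labeled-data case composes the two pieces: draw a class first, then an image of that class. A small worked identity (written out here, not taken from the slide):

  p(x) = \sum_{y} p(y)\, p(x \mid y), \qquad \text{i.e. sample } y \sim p(y), \text{ then } x \sim p(x \mid y)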

Cat

Dog

Bird

Lion

Tiger

Penguin

5 of 75

What and how to learn?


 

 

 

 

6 of 75

Objective 1: maximum likelihood estimation (MLE)

similar

Maximum likelihood (ML)

Maximum a posteriori (MAP)

Variational, EM, …

Similar models OR data

How likely is the model to generate the true data?
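In symbols, MLE picks the parameters that make the training data most probable (standard notation; x_1, ..., x_N are the observed examples):

  \hat{\theta}_{\text{MLE}} = \arg\max_{\theta} \sum_{i=1}^{N} \log p_{\theta}(x_i)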

7 of 75

Objective 1: maximum likelihood estimation (MLE)

  • How to build:
  • Kullback–Leibler (KL) divergence

Density estimation

MLE =

min KL with empirical distribution
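The equivalence stated above follows directly. With \hat{p}_{\text{data}} the empirical distribution of the training set,

  \mathrm{KL}\big(\hat{p}_{\text{data}} \,\|\, p_\theta\big)
  = \mathbb{E}_{x \sim \hat{p}_{\text{data}}}\big[\log \hat{p}_{\text{data}}(x)\big]
  - \mathbb{E}_{x \sim \hat{p}_{\text{data}}}\big[\log p_\theta(x)\big],

and the first term does not depend on \theta, so minimizing the KL divergence over \theta is the same as maximizing the expected log-likelihood, i.e. MLE.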

8 of 75

Generative models

  • What to build:
  • Explicit density estimation: explicitly model the density p(x), then sample from it

Learn with maximum likelihood or its variants

  • Implicit density estimation: learn a model to sample from, without writing down p(x) explicitly

Learn with other objectives

evaluation

generation

9 of 75

[Ian Goodfellow, Tutorial on generative adversarial models, 2017]

[Stanford CS231n]

10 of 75

Generative adversarial net (GAN)


Generator

Discriminator

REAL

FAKE

[Credits: Mengdi Fan and Xinyu Zhou, CSE 5539 course presentation]
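A minimal sketch of this generator-vs-discriminator game in PyTorch, on toy 2-D data. The architectures, the 2-D Gaussian "real" data, and all hyper-parameters are illustrative assumptions, not the setup shown on the slide.

import torch
import torch.nn as nn

# Toy "real" data: a 2-D Gaussian (stand-in for real images).
def sample_real(n):
    return torch.randn(n, 2) * 0.5 + torch.tensor([2.0, 2.0])

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))   # generator: noise -> sample
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))   # discriminator: sample -> REAL/FAKE logit
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = sample_real(64)
    fake = G(torch.randn(64, 8))

    # Discriminator: push real samples toward label 1 (REAL), generated toward 0 (FAKE).
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to make the discriminator output REAL for generated samples.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()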

 

 

 

11 of 75

Example results (by Style-GAN)


[A Style-Based Generator Architecture for Generative Adversarial Networks, CVPR 2019]

12 of 75

Other generative models

  • Denoising Diffusion Probabilistic Models (DDPM)
    • Learn to reverse the diffusion process
    • Can generate very high-quality images


Diffusion by simple Gaussian

Denoising by neural networks (each step by a U-net!)

[Denoising Diffusion Probabilistic Models, NeurIPS 2020]

13 of 75

Other generative models

  • Denoising Diffusion Probabilistic Models (DDPM)


[Denoising Diffusion Probabilistic Models, NeurIPS 2020]

14 of 75

Other generative models


Big-GAN

Diffusion models

Real

[Diffusion Models Beat GANs on Image Synthesis, NeurIPS 2021]

15 of 75

Conditional image generation

  •  


[Zhu et al., 2017]

[Wang et al., 2018]

16 of 75

Conditional image generation

  •  


[Hierarchical Text-Conditional Image Generation with CLIP Latents, arXiv 2022]

17 of 75

Diffusion models

18 of 75

Reference

  • What are Diffusion Models? https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
  • Denoising Diffusion Probabilistic Models, NeurIPS 2020
  • Denoising Diffusion Implicit Models, ICLR 2021
  • Improved Denoising Diffusion Probabilistic Models, ICML 2021
  • Diffusion Models Beat GANs on Image Synthesis, NeurIPS 2021
  • Classifier-Free Diffusion Guidance, NeurIPS-W 2021
  • Cascaded Diffusion Models for High Fidelity Image Generation, arXiv 2021

19 of 75

Reference

  • GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models, arXiv 2021
  • Hierarchical Text-Conditional Image Generation with CLIP Latents, arXiv 2022 (DALLE-v2)
  • Junan Chen, Xiangyu Chen, Christian Belardi, Khiem Pham, Tutorial on Diffusion Models, 2022

20 of 75

(Probabilistic) generative models

[Figure: easily samplable distribution → image; one arrow labeled "(optional)"]

[Credits: Tutorial on Diffusion Models]

21 of 75

Existing (probabilistic) generative models

[Credits: What are Diffusion Models?]

Problems:

  • GAN: training stability and sample diversity
  • VAE: surrogate loss and encoder model
  • Flow models: specialized architecture for reversible transform

22 of 75

Diffusion models

Diffusion/forward process

Denoising/reverse/generation process

  • Forward: a Markov chain of diffusion steps to slowly add random noise to data
  • Reverse: learn to reverse the diffusion process to construct desired data samples from the noise.
  • Latent variables: same size as the input
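In symbols (standard diffusion-model notation; x_0 is a data point and x_1, ..., x_T are the latent variables of the same size):

  q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1})
  \qquad \text{(forward / diffusion, fixed)}

  p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t)
  \qquad \text{(reverse / denoising, learned)}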

 

 

[Credits: What are Diffusion Models?]

23 of 75

Diffusion in (non-equilibrium) thermodynamics

Diffusion Process

Denoising Process

(Reverse of Diffusion)

[Credits: Tutorial on Diffusion Models]

24 of 75

Steps to understand diffusion models

  • Modeling, forward

  • Modeling, reverse

  • Training

  • Extension

25 of 75

Let’s look again

diffuse

denoise

“approximate” denoise

[Credits: Denoising Diffusion Probabilistic Models]

26 of 75

Diffusion/forward process

  • Markovian steps (add noise gradually)

  • Conditional joint probability

Scale down + add noise

Hyper-parameters
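The "scale down + add noise" step and its hyper-parameters take the standard DDPM form (β_1, ..., β_T is the fixed variance schedule; ᾱ_t is introduced here for the closed-form marginal):

  q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t \mathbf{I}\big)

and, with \alpha_t = 1-\beta_t and \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s, any x_t can be sampled directly from x_0:

  q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1-\bar{\alpha}_t)\mathbf{I}\big)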

27 of 75

Diffusion/forward process

  •  

 

28 of 75

Let’s look again

diffuse

denoise

“approximate” denoise

 

29 of 75

Steps to understand diffusion models

  • Modeling, forward

  • Modeling, reverse

  • Training

  • Extension

30 of 75

Denoising/reverse process

  •  
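The reverse step is modeled as a Gaussian whose parameters are produced by a neural network (standard DDPM parameterization, written here since the slide's equation was lost; in the paper the covariance is fixed to σ_t² I):

  p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\big)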

31 of 75

Denoising/reverse process

  •  

32 of 75

Denoising/reverse process

 

[Credits: Sohl-Dickstein et al., 2015]

33 of 75

Steps to understand diffusion models

  • Modeling, forward

  • Modeling, reverse

  • Training

  • Extension

34 of 75

 

  •  

35 of 75

How to minimize (1)

  •  
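The quantity being minimized is the usual variational (ELBO-style) bound on the negative log-likelihood, sketched here in standard notation rather than copied from the slide:

  \mathbb{E}\big[-\log p_\theta(x_0)\big]
  \le \mathbb{E}_q\!\left[\log \frac{q(x_{1:T} \mid x_0)}{p_\theta(x_{0:T})}\right] =: L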

36 of 75

How to minimize (2)

  •  

 

 

 

Ignore!

Auto-encoder:

Separately learned

37 of 75

How to minimize (3)

  •  
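After the algebra sketched on these slides, DDPM ends up training a noise-prediction network with the simplified objective (standard form; ε is the Gaussian noise used to produce x_t from x_0):

  L_{\text{simple}} = \mathbb{E}_{t,\, x_0,\, \epsilon}\Big[\big\|\epsilon - \epsilon_\theta\big(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\ t\big)\big\|^2\Big]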

 

 

38 of 75

Short recap: diffusion models

  • To learn:

  • To generate by:
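A minimal, self-contained sketch of both halves of this recap in PyTorch. The tiny MLP denoiser, the 2-D toy data, and the linear β schedule are illustrative assumptions (the paper uses a U-Net on images).

import torch
import torch.nn as nn

T = 200
betas = torch.linspace(1e-4, 0.02, T)            # fixed noise schedule (hyper-parameters)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)

# Toy denoiser: predicts the noise eps from (x_t, t). The paper uses a U-Net instead.
class Denoiser(nn.Module):
    def __init__(self, dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.ReLU(), nn.Linear(64, dim))
    def forward(self, x, t):
        t_feat = (t.float() / T).unsqueeze(1)    # crude time conditioning
        return self.net(torch.cat([x, t_feat], dim=1))

model = Denoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# --- To learn: predict the noise added by the forward process (the simplified loss). ---
for step in range(1000):
    x0 = torch.randn(128, 2) * 0.3 + 1.0         # toy "data"
    t = torch.randint(0, T, (128,))
    eps = torch.randn_like(x0)
    ab = alpha_bar[t].unsqueeze(1)
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps  # closed-form forward sample q(x_t | x_0)
    loss = ((eps - model(x_t, t)) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# --- To generate: start from pure noise and denoise step by step. ---
with torch.no_grad():
    x = torch.randn(16, 2)
    for t in reversed(range(T)):
        tt = torch.full((16,), t)
        eps_hat = model(x, tt)
        mean = (x - betas[t] / (1 - alpha_bar[t]).sqrt() * eps_hat) / alphas[t].sqrt()
        x = mean + (betas[t].sqrt() * torch.randn_like(x) if t > 0 else 0.0)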

 

39 of 75

DDPM

 

 

40 of 75

DDPM

diffuse

denoise

“approximate” denoise

 

41 of 75

Generation

42 of 75

Summary

  • Pros:
    • DDPM is tractable
    • DDPM is flexible: can fit arbitrary structures in data (multiple steps)

  • Cons:
    • Relies on long Markov Chain diffusion and denoising: very slow to generate
    • Usually, we need to learn T denoising steps and apply all of them, one network pass per step, to generate one image

43 of 75

Why is it the right way to go? (by Harry)

  • The whole diffusion process essentially goes through T times L layers of neural networks (U-Net).
  • Every step only does a little, so we do not need to specifically design a generator architecture that directly goes from noise to images.
  • We have deep supervision signals in the middle, and those signals can be derived easily without additional learning.

44 of 75

Fun facts

  •  

45 of 75

Fun facts

  •  

46 of 75

Steps to understand diffusion models

  • Modeling, forward

  • Modeling, reverse

  • Training

  • Extension

47 of 75

(Class) conditioned generation

  •  

48 of 75

(Class) conditioned generation

  •  

 

 

 

 

49 of 75

(Class) conditioned generation

  •  

[Diffusion Models Beat GANs on Image Synthesis]
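In the classifier-guidance approach of that paper, a separately trained classifier p_φ(y | x_t) steers each denoising step; sketched in standard notation (s is the guidance scale):

  x_{t-1} \sim \mathcal{N}\big(\mu_\theta(x_t, t) + s\,\Sigma_\theta(x_t, t)\,\nabla_{x_t} \log p_\phi(y \mid x_t),\ \Sigma_\theta(x_t, t)\big)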

50 of 75

Class-conditioned DDPM (sampling)

 

51 of 75

(Class) conditioned generation

Left to right

  • Big GAN
  • Diffusion Model
  • Test set

52 of 75

(Class) conditioned generation

  •  

[GLIDE]

53 of 75

(Class) conditioned generation

  •  

[Classifier-Free Diffusion Guidance]
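Classifier-free guidance removes the separate classifier: one network is trained both with and without the condition c (by randomly dropping c), and at sampling time the two predictions are combined (w is the guidance weight; ∅ denotes the unconditional input):

  \tilde{\epsilon}_\theta(x_t, c) = (1 + w)\,\epsilon_\theta(x_t, c) - w\,\epsilon_\theta(x_t, \varnothing)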

54 of 75

More traditional methods

55 of 75

[Ian Goodfellow, Tutorial on generative adversarial models, 2017]

[Stanford CS231n]

56 of 75

Type of generative models

Tractable density

Approximate density: MC

Approximate density

Implicit density:

Direct

[credits: Shakir Mohamed DL Summer School 2016]

57 of 75

Fully-observed models

Directed decomposition:

x[1]

x[2]

x[3]

x[d]

Chain rule
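The chain rule referred to here factorizes the joint density into a product of one-dimensional conditionals, one per pixel/dimension:

  p(x) = p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_1, x_2) \cdots
       = \prod_{i=1}^{d} p\big(x_i \mid x_1, \ldots, x_{i-1}\big)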

58 of 75

Fully-observed models

Directed decomposition:

  • Pixel-RNN [van den Oord et al.]

  • Generate pixel by pixel
  • Model p(.|.) by RNN (LSTM)

[credits: Stanford CS231n]

59 of 75

Fully-observed models

Directed decomposition:

  • Pixel-CNN [van den Oord et al.]

  • Generate pixel by pixel
  • Model p(.|.) by CNN
  • Prediction given the context
  • Can learn in parallel

[credits: Stanford CS231n]

60 of 75

Fully-observed models

Undirected decomposition:

  • Markov random field:

  • clique: a complete subgraph
  • : potential function
  • Example:

x[1]

x[2]

x[3]

x[4]
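The undirected decomposition takes the standard Markov-random-field form (ψ_c here stands in for the potential-function symbol lost from the slide):

  p(x) = \frac{1}{Z} \prod_{c \in \mathcal{C}} \psi_c(x_c),
  \qquad Z = \sum_{x} \prod_{c \in \mathcal{C}} \psi_c(x_c),

where \mathcal{C} is the set of cliques of the graph and Z is the normalization constant.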

61 of 75

Fully-observed models

  • Directed decomposition:

  • Easy to learn: tractable density (likelihood is directly computable)
  • Order sensitive, need sequential generation

  • Undirected decomposition:

  • Hard to learn: the normalization constant can be intractable
  • Need (iterative) sequential generation

62 of 75

Examples

……

[START]

[START]

[START]

p(x[1])

p(x[2] | history)

p(x[3] | history)

p(x[d] | history)

x[1]

x[2]

x[d-1]
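A tiny sketch of the sequential sampling loop this figure depicts. The toy cond_dist function (returning per-value probabilities given the history) is a placeholder assumption for whatever network (RNN/CNN) actually models p(x_i | history).

import torch

def cond_dist(history, num_values=256):
    # Placeholder for a learned model p(x_i | history); here: uniform probabilities.
    return torch.full((num_values,), 1.0 / num_values)

d = 16                                  # number of pixels/dimensions to generate
history = []                            # starts from the [START] token only
for i in range(d):
    probs = cond_dist(history)          # p(x_i | x_1, ..., x_{i-1})
    x_i = torch.multinomial(probs, 1).item()
    history.append(x_i)                 # generated value is fed back as context
print(history)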

63 of 75

Latent variable models

z

x

z2

x

z1

  • Easy to sample and generate data
  • Easy to include hierarchy, encode structure, avoid order dependency
  • Hard to learn with maximum likelihood: intractable density
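The intractability comes from the marginal likelihood, which integrates the latent variable out:

  p_\theta(x) = \int p_\theta(x \mid z)\, p(z)\, dz,

which generally has no closed form when p_\theta(x | z) is a deep network, hence the need for variational approximations or other objectives.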

[credits: Shakir Mohamed DL Summer School 2016]

64 of 75

Transformation models

  • Easy to sample and generate data
  • Hard to learn for general models using “maximum likelihood”

[credits: Shakir Mohamed DL Summer School 2016]

Normalizing Flow!
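Normalizing flows make this transformation approach tractable by restricting the transform f to be invertible with a computable Jacobian, so the density follows from the change-of-variables formula (z is the base sample and x = f(z)):

  p_x(x) = p_z\big(f^{-1}(x)\big)\, \left|\det \frac{\partial f^{-1}(x)}{\partial x}\right|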

65 of 75

Two-sample tests

similar

If two populations, real and generated data, are similar

66 of 75

Two-sample tests

real

generated

learning objective

67 of 75

Backup

68 of 75

 

  •  

69 of 75

How to minimize (1)

  •  

70 of 75

How to minimize (2)

  •  

71 of 75

How to minimize (2)

  •  

 

72 of 75

How to minimize (2)

  •  

 

 

 

Ignore!

Auto-encoder:

Separately learned

73 of 75

How to minimize (3)

  •  

 

74 of 75

How to minimize (3)

  •  

 

 

 

75 of 75

How to minimize (3)

  •