1 of 75

CSE 5539: Generative Models

2 of 75

Generative models


3 of 75

Generative models

[Figure: easily samplable distribution → image]

[Credits: Tutorial on Diffusion Models]

4 of 75

Generative models

  • How to generate data (e.g., images)?
  • By design
  • By learning from “real” data

  • Scenarios:
  • From unlabeled data: p(x)
  • From labeled data: p(y)p(x | y)
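Concretely, the labeled-data case composes the two pieces: draw a class first, then an image of that class. A small worked identity (written out here, not taken from the slide):

  p(x) = \sum_{y} p(y)\, p(x \mid y), \qquad \text{i.e. sample } y \sim p(y), \text{ then } x \sim p(x \mid y)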

Cat

Dog

Bird

Lion

Tiger

Penguin

5 of 75

What and how to learn?


 

 

 

 

6 of 75

Objective 1: maximum likelihood estimation (MLE)

similar

Maximum likelihood (ML)

Maximum a posteriori (MAP)

Variational, EM, …

Similar models OR data

How likely is the model to generate the true data?
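In symbols, MLE picks the parameters that make the training data most probable (standard notation; x_1, ..., x_N are the observed examples):

  \hat{\theta}_{\text{MLE}} = \arg\max_{\theta} \sum_{i=1}^{N} \log p_{\theta}(x_i)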

7 of 75

Objective 1: maximum likelihood estimation (MLE)

  • How to build:
  • Kullback–Leibler (KL) divergence

Density estimation

MLE =

min KL with empirical distribution
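The equivalence stated above follows directly. With \hat{p}_{\text{data}} the empirical distribution of the training set,

  \mathrm{KL}\big(\hat{p}_{\text{data}} \,\|\, p_\theta\big)
  = \mathbb{E}_{x \sim \hat{p}_{\text{data}}}\big[\log \hat{p}_{\text{data}}(x)\big]
  - \mathbb{E}_{x \sim \hat{p}_{\text{data}}}\big[\log p_\theta(x)\big],

and the first term does not depend on \theta, so minimizing the KL divergence over \theta is the same as maximizing the expected log-likelihood, i.e. MLE.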

8 of 75

Generative models

  • What to build:
  • Explicit density estimation: explicitly model the density p(x), then sample from it

Learn with maximum likelihood or its variants

  • Implicit density estimation: learn a model to sample from, without writing down p(x) explicitly

Learn with other objectives

evaluation

generation

9 of 75

[Ian Goodfellow, Tutorial on generative adversarial models, 2017]

[Stanford CS231n]

10 of 75

Generative adversarial net (GAN)


Generator

Discriminator

REAL

FAKE

[Credits: Mengdi Fan and Xinyu Zhou, CSE 5539 course presentation]
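A minimal sketch of this generator-vs-discriminator game in PyTorch, on toy 2-D data. The architectures, the 2-D Gaussian "real" data, and all hyper-parameters are illustrative assumptions, not the setup shown on the slide.

import torch
import torch.nn as nn

# Toy "real" data: a 2-D Gaussian (stand-in for real images).
def sample_real(n):
    return torch.randn(n, 2) * 0.5 + torch.tensor([2.0, 2.0])

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))   # generator: noise -> sample
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))   # discriminator: sample -> REAL/FAKE logit
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = sample_real(64)
    fake = G(torch.randn(64, 8))

    # Discriminator: push real samples toward label 1 (REAL), generated toward 0 (FAKE).
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to make the discriminator output REAL for generated samples.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()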

 

 

 

11 of 75

Example results (by Style-GAN)


[A Style-Based Generator Architecture for Generative Adversarial Networks, CVPR 2019]

12 of 75

Other generative models

  • Denoising Diffusion Probabilistic Models (DDPM)
    • Learn to reverse the diffusion process
    • Can generate very high-quality images


Diffusion by simple Gaussian

Denoising by neural networks (each step by a U-net!)

[Denoising Diffusion Probabilistic Models, NeurIPS 2020]

13 of 75

Other generative models

  • Denoising Diffusion Probabilistic Models (DDPM)


[Denoising Diffusion Probabilistic Models, NeurIPS 2020]

14 of 75

Other generative models


Big-GAN

Diffusion models

Real

[Diffusion Models Beat GANs on Image Synthesis, NeurIPS 2021]

15 of 75

Conditional image generation

  •  


[Zhu et al., 2017]

[Wang et al., 2018]

16 of 75

Conditional image generation

  •  


[Hierarchical Text-Conditional Image Generation with CLIP Latents, arXiv 2022]

17 of 75

Diffusion models

18 of 75

Reference

  • What are Diffusion Models? https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
  • Denoising Diffusion Probabilistic Models, NeurIPS 2020
  • Denoising Diffusion Implicit Models, ICLR 2021
  • Improved Denoising Diffusion Probabilistic Models, ICML 2021
  • Diffusion Models Beat GANs on Image Synthesis, NeurIPS 2021
  • Classifier-Free Diffusion Guidance, NeurIPS-W 2021
  • Cascaded Diffusion Models for High Fidelity Image Generation, arXiv 2021

19 of 75

Reference

  • GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models, arXiv 2021
  • Hierarchical Text-Conditional Image Generation with CLIP Latents, arXiv 2022 (DALLE-v2)
  • Junan Chen, Xiangyu Chen, Christian Belardi, Khiem Pham, Tutorial on Diffusion Models, 2022

20 of 75

(Probabilistic) generative models

[Figure: easily samplable distribution → image; one arrow labeled "(optional)"]

[Credits: Tutorial on Diffusion Models]

21 of 75

Existing (probabilistic) generative models

[Credits: What are Diffusion Models?]

Problems:

  • GAN: training stability and sample diversity
  • VAE: surrogate loss and encoder model
  • Flow models: specialized architecture for reversible transform

22 of 75

Diffusion models

Diffusion/forward process

Denoising/reverse/generation process

  • Forward: a Markov chain of diffusion steps to slowly add random noise to data
  • Reverse: learn to reverse the diffusion process to construct desired data samples from the noise.
  • Latent variables: same size as the input
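In symbols (standard diffusion-model notation; x_0 is a data point and x_1, ..., x_T are the latent variables of the same size):

  q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1})
  \qquad \text{(forward / diffusion, fixed)}

  p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t)
  \qquad \text{(reverse / denoising, learned)}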

 

 

[Credits: What are Diffusion Models?]

23 of 75

Diffusion in (non-equilibrium) thermodynamics

Diffusion Process

Denoising Process

(Reverse of Diffusion)

[Credits: Tutorial on Diffusion Models]

24 of 75

Steps to understand diffusion models

  • Modeling, forward

  • Modeling, reverse

  • Training

  • Extension

25 of 75

Let’s look again

diffuse

denoise

“approximate” denoise

[Credits: Denoising Diffusion Probabilistic Models]

26 of 75

Diffusion/forward process

  • Markovian steps (add noise gradually)

  • Conditional joint probability

Scale down + add noise

Hyper-parameters
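The "scale down + add noise" step and its hyper-parameters take the standard DDPM form (β_1, ..., β_T is the fixed variance schedule; ᾱ_t is introduced here for the closed-form marginal):

  q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t \mathbf{I}\big)

and, with \alpha_t = 1-\beta_t and \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s, any x_t can be sampled directly from x_0:

  q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1-\bar{\alpha}_t)\mathbf{I}\big)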

27 of 75

Diffusion/forward process

  •  

 

28 of 75

Let’s look again

diffuse

denoise

“approximate” denoise

 

29 of 75

Steps to understand diffusion models

  • Modeling, forward

  • Modeling, reverse

  • Training

  • Extension

30 of 75

Denoising/reverse process

  •  
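The reverse step is modeled as a Gaussian whose parameters are produced by a neural network (standard DDPM parameterization, written here since the slide's equation was lost; in the paper the covariance is fixed to σ_t² I):

  p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\big)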

31 of 75

Denoising/reverse process

  •  

32 of 75

Denoising/reverse process

 

[Credits: Sohl-Dickstein et al., 2015]

33 of 75

Steps to understand diffusion models

  • Modeling, forward

  • Modeling, reverse

  • Training

  • Extension

34 of 75

 

  •  

35 of 75

How to minimize (1)

  •  
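The quantity being minimized is the usual variational (ELBO-style) bound on the negative log-likelihood, sketched here in standard notation rather than copied from the slide:

  \mathbb{E}\big[-\log p_\theta(x_0)\big]
  \le \mathbb{E}_q\!\left[\log \frac{q(x_{1:T} \mid x_0)}{p_\theta(x_{0:T})}\right] =: L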

36 of 75

How to minimize (2)

  •  

 

 

 

Ignore!

Auto-encoder:

Separately learned

37 of 75

How to minimize (3)

  •  
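After the algebra sketched on these slides, DDPM ends up training a noise-prediction network with the simplified objective (standard form; ε is the Gaussian noise used to produce x_t from x_0):

  L_{\text{simple}} = \mathbb{E}_{t,\, x_0,\, \epsilon}\Big[\big\|\epsilon - \epsilon_\theta\big(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\ t\big)\big\|^2\Big]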

 

 

38 of 75

Short recap: diffusion models

  • To learn:

  • To generate by:
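A minimal, self-contained sketch of both halves of this recap in PyTorch. The tiny MLP denoiser, the 2-D toy data, and the linear β schedule are illustrative assumptions (the paper uses a U-Net on images).

import torch
import torch.nn as nn

T = 200
betas = torch.linspace(1e-4, 0.02, T)            # fixed noise schedule (hyper-parameters)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)

# Toy denoiser: predicts the noise eps from (x_t, t). The paper uses a U-Net instead.
class Denoiser(nn.Module):
    def __init__(self, dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.ReLU(), nn.Linear(64, dim))
    def forward(self, x, t):
        t_feat = (t.float() / T).unsqueeze(1)    # crude time conditioning
        return self.net(torch.cat([x, t_feat], dim=1))

model = Denoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# --- To learn: predict the noise added by the forward process (the simplified loss). ---
for step in range(1000):
    x0 = torch.randn(128, 2) * 0.3 + 1.0         # toy "data"
    t = torch.randint(0, T, (128,))
    eps = torch.randn_like(x0)
    ab = alpha_bar[t].unsqueeze(1)
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps  # closed-form forward sample q(x_t | x_0)
    loss = ((eps - model(x_t, t)) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# --- To generate: start from pure noise and denoise step by step. ---
with torch.no_grad():
    x = torch.randn(16, 2)
    for t in reversed(range(T)):
        tt = torch.full((16,), t)
        eps_hat = model(x, tt)
        mean = (x - betas[t] / (1 - alpha_bar[t]).sqrt() * eps_hat) / alphas[t].sqrt()
        x = mean + (betas[t].sqrt() * torch.randn_like(x) if t > 0 else 0.0)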

 

39 of 75

DDPM

 

 

40 of 75

DDPM

diffuse

denoise

“approximate” denoise

 

41 of 75

Generation

42 of 75

Summary

  • Pros:
    • DDPM is tractable
    • DDPM is flexible: can fit arbitrary structures in data (multiple steps)

  • Cons:
    • Relies on long Markov Chain diffusion and denoising: very slow to generate
    • Usually, we need to learn T denoising steps and apply all of them, one network pass per step, to generate one image

43 of 75

Why is it the right way to go? (by Harry)

  • The whole diffusion process essentially goes through T times L layers of neural networks (U-Net).
  • Every step only does a little, so we do not need to specifically design a generator architecture that directly goes from noise to images.
  • We have deep supervision signals in the middle, and those signals can be derived easily without additional learning.

44 of 75

Fun facts

  •  

45 of 75

Fun facts

  •  

46 of 75

Steps to understand diffusion models

  • Modeling, forward

  • Modeling, reverse

  • Training

  • Extension

47 of 75

(Class) conditioned generation

  •  

48 of 75

(Class) conditioned generation

  •  

 

 

 

 

49 of 75

(Class) conditioned generation

  •  

[Diffusion Models Beat GANs on Image Synthesis]
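In the classifier-guidance approach of that paper, a separately trained classifier p_φ(y | x_t) steers each denoising step; sketched in standard notation (s is the guidance scale):

  x_{t-1} \sim \mathcal{N}\big(\mu_\theta(x_t, t) + s\,\Sigma_\theta(x_t, t)\,\nabla_{x_t} \log p_\phi(y \mid x_t),\ \Sigma_\theta(x_t, t)\big)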

50 of 75

Class-conditioned DDPM (sampling)

 

51 of 75

(Class) conditioned generation

Left to right

  • Big GAN
  • Diffusion Model
  • Test set

52 of 75

(Class) conditioned generation

  •  

[GLIDE]

53 of 75

(Class) conditioned generation

  •  

[Classifier-Free Diffusion Guidance]
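Classifier-free guidance removes the separate classifier: one network is trained both with and without the condition c (by randomly dropping c), and at sampling time the two predictions are combined (w is the guidance weight; ∅ denotes the unconditional input):

  \tilde{\epsilon}_\theta(x_t, c) = (1 + w)\,\epsilon_\theta(x_t, c) - w\,\epsilon_\theta(x_t, \varnothing)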

54 of 75

More traditional methods

55 of 75

[Ian Goodfellow, Tutorial on generative adversarial models, 2017]

[Stanford CS231n]

56 of 75

Type of generative models

Tractable density

Approximate density: MC

Approximate density

Implicit density:

Direct

[credits: Shakir Mohamed DL Summer School 2016]

57 of 75

Fully-observed models

Directed decomposition:

x[1]

x[2]

x[3]

x[d]

Chain rule
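The chain rule referred to here factorizes the joint density into a product of one-dimensional conditionals, one per pixel/dimension:

  p(x) = p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_1, x_2) \cdots
       = \prod_{i=1}^{d} p\big(x_i \mid x_1, \ldots, x_{i-1}\big)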

58 of 75

Fully-observed models

Directed decomposition:

  • Pixel-RNN [van den Oord et al.]

  • Generate pixel by pixel
  • Model p(.|.) by RNN (LSTM)

[credits: Stanford CS231n]

59 of 75

Fully-observed models

Directed decomposition:

  • Pixel-CNN [van den Oord et al.]

  • Generate pixel by pixel
  • Model p(.|.) by CNN
  • Prediction given the context
  • Can learn in parallel

[credits: Stanford CS231n]

60 of 75

Fully-observed models

Undirected decomposition:

  • Markov random field:

  • clique: a complete subgraph
  • : potential function
  • Example:

x[1]

x[2]

x[3]

x[4]
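The undirected decomposition takes the standard Markov-random-field form (ψ_c here stands in for the potential-function symbol lost from the slide):

  p(x) = \frac{1}{Z} \prod_{c \in \mathcal{C}} \psi_c(x_c),
  \qquad Z = \sum_{x} \prod_{c \in \mathcal{C}} \psi_c(x_c),

where \mathcal{C} is the set of cliques of the graph and Z is the normalization constant.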

61 of 75

Fully-observed models

  • Directed decomposition:

  • Easy to learn: tractable density (likelihood is directly computable)
  • Order sensitive, need sequential generation

  • Undirected decomposition:

  • Hard to learn: the normalization constant can be intractable
  • Need (iterative) sequential generation

62 of 75

Examples

……

[START]

[START]

[START]

p(x[1])

p(x[2] | history)

p(x[3] | history)

p(x[d] | history)

x[1]

x[2]

x[d-1]
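A tiny sketch of the sequential sampling loop this figure depicts. The toy cond_dist function (returning per-value probabilities given the history) is a placeholder assumption for whatever network (RNN/CNN) actually models p(x_i | history).

import torch

def cond_dist(history, num_values=256):
    # Placeholder for a learned model p(x_i | history); here: uniform probabilities.
    return torch.full((num_values,), 1.0 / num_values)

d = 16                                  # number of pixels/dimensions to generate
history = []                            # starts from the [START] token only
for i in range(d):
    probs = cond_dist(history)          # p(x_i | x_1, ..., x_{i-1})
    x_i = torch.multinomial(probs, 1).item()
    history.append(x_i)                 # generated value is fed back as context
print(history)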

63 of 75

Latent variable models

z

x

z2

x

z1

  • Easy to sample and generate data
  • Easy to include hierarchy, encode structure, avoid order dependency
  • Hard to learn with maximum likelihood: intractable density
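The intractability comes from the marginal likelihood, which integrates the latent variable out:

  p_\theta(x) = \int p_\theta(x \mid z)\, p(z)\, dz,

which generally has no closed form when p_\theta(x | z) is a deep network, hence the need for variational approximations or other objectives.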

[credits: Shakir Mohamed DL Summer School 2016]

64 of 75

Transformation models

  • Easy to sample and generate data
  • Hard to learn for general models using “maximum likelihood”

[credits: Shakir Mohamed DL Summer School 2016]

Normalizing Flow!
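Normalizing flows make this transformation approach tractable by restricting the transform f to be invertible with a computable Jacobian, so the density follows from the change-of-variables formula (z is the base sample and x = f(z)):

  p_x(x) = p_z\big(f^{-1}(x)\big)\, \left|\det \frac{\partial f^{-1}(x)}{\partial x}\right|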

65 of 75

Two-sample tests

similar

If two populations, real and generated data, are similar

66 of 75

Two-sample tests

real

generated

learning objective

67 of 75

Backup

68 of 75

 

  •  

69 of 75

How to minimize (1)

  •  

70 of 75

How to minimize (2)

  •  

71 of 75

How to minimize (2)

  •  

 

72 of 75

How to minimize (2)

  •  

 

 

 

Ignore!

Auto-encoder:

Separately learned

73 of 75

How to minimize (3)

  •  

 

74 of 75

How to minimize (3)

  •  

 

 

 

75 of 75

How to minimize (3)

  •