1 of 142

Lecture 14

Image Synthesis

6.8300/1 Advances in Computer Vision

Spring 2024

Sara Beery, Kaiming He, Vincent Sitzmann, Mina Konaković Luković

2 of 142

14. Image Synthesis

  • Image synthesis
    • Variational Autoencoders
    • Generative Adversarial Networks
  • Structured prediction
    • Image-to-image GANs
  • Domain mapping

3 of 142

Announcements

  • Pset5 due Tuesday, 04/02
  • Pset6 out Thursday, 04/04
  • Project proposal due Thursday, 04/04

4 of 142

Analysis

“Duck”

image x

label y


5 of 142

Analysis

“A large duck standing by the river”

image x

caption y


Image Captioner

6 of 142

Analysis

“positive”

sentence x

sentiment y

“A statuesque duck gazing gracefully over the water”


Sentiment Classifier

7 of 142

Analysis

“Duck”

label y

image x


8 of 142

Synthesis

“Duck”

label y

image x


9 of 142

Synthesis

Generator

image x

“Fish”

label y

10 of 142

Synthesis

Photo

User sketch


Translator

11 of 142

Synthesis

Photo

User sketch


12 of 142

Image synthesis via generative modeling

In vision, this is usually what we are interested in!

Model of high-dimensional structured data

X is high-dimensional!

13 of 142


Deep nets are data transformers

  • Deep nets transform datapoints, layer by layer
  • Each layer is a different representation of the data

Embedding

Data

14 of 142


Embedding

Data

Deep nets are data transformers

  • Deep nets transform datapoints, layer by layer
  • Each layer is a different representation of the data

15 of 142

Generative modeling vs Representation learning

Representation learning:

mapping data to abstract representations

(analysis)


Embedding

Data

Generative modeling

Representation learning

Generative modeling:

mapping abstract representations to data (synthesis)

16 of 142

  1. Image synthesis
  2. Representation learning
  3. Data translation

What can you do with generative models?

17 of 142

Image synthesis

  1. Image synthesis
  2. Representation learning
  3. Data translation

18 of 142

Procedural graphics

[Anders Scheil]

19 of 142

20 of 142

Image synthesis from “noise”

Generator

21 of 142

Learning a generative model

22 of 142

Learning a density model

23 of 142

Case study #1: Fitting a Gaussian to data

fig from [Goodfellow, 2016]

Max likelihood objective

Considering only Gaussian fits

24 of 142

Case study #1: Fitting a Gaussian to data

“max likelihood”
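For the Gaussian family, the max likelihood objective has a closed-form solution: the MLE mean and covariance are just the sample statistics. A minimal numpy sketch (toy 1-D data and parameters are illustrative assumptions):

```python
import numpy as np

def fit_gaussian_mle(x):
    # MLE for a Gaussian: sample mean and (biased, 1/N) sample covariance
    mu = x.mean(axis=0)
    centered = x - mu
    sigma = centered.T @ centered / len(x)   # note 1/N, not 1/(N-1)
    return mu, sigma

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=2.0, size=(10000, 1))  # true mean 3, variance 4
mu, sigma = fit_gaussian_mle(data)
```

With enough samples the fit recovers the true mean and variance; the same objective, with a deep net in place of the Gaussian, is what the next case study optimizes with SGD.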

25 of 142

Case study #2: learning a deep generative model

SGD

Deep net

Usually max likelihood

26 of 142

SGD

Deep net

Usually max likelihood

Models that provide a sampler but no density are called implicit generative models

Case study #2: learning a deep generative model

27 of 142

Deep generative models are distribution transformers

Prior distribution

Target distribution

28 of 142

Gaussian noise

Deep generative models are distribution transformers

Synthesized image

29 of 142

Deep generative models are distribution transformers

Gaussian noise

Synthesized image
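The "distribution transformer" picture can be sketched directly: draw z from a Gaussian prior and push it through a network. The network here is random and untrained (a stand-in assumption; a trained generator would produce images):

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z, weights):
    # A tiny two-layer net mapping prior samples to data space
    h = np.tanh(z @ weights[0])   # hidden layer
    return h @ weights[1]         # output "image" vector

weights = [rng.normal(size=(8, 32)), rng.normal(size=(32, 64))]
z = rng.normal(size=(5, 8))       # 5 draws from the Gaussian prior
x = generator(z, weights)         # 5 synthesized 64-dim samples
```

Because the map is deterministic, the full distribution of x is induced entirely by the prior over z; training only changes what that induced distribution looks like.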

30 of 142

Generative Adversarial Networks (GANs)

Gaussian noise

Synthesized image

31 of 142

Generator

[Goodfellow et al., 2014]

G tries to synthesize fake images that fool D

D tries to identify the fakes

Discriminator

real or fake?

32 of 142

[Goodfellow et al., 2014]

fake (0.9)

real (0.1)

33 of 142

G tries to synthesize fake images that fool D:

real or fake?

[Goodfellow et al., 2014]

34 of 142

G tries to synthesize fake images that fool the best D:

real or fake?

[Goodfellow et al., 2014]

35 of 142

  • Training: iterate between training D and G with backprop.
  • Global optimum when G reproduces data distribution.

Training

G tries to synthesize fake images that fool D

D tries to identify the fakes

real or fake?

[Goodfellow et al., 2014]
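The two objectives being alternated can be written in a few lines. This sketch assumes the non-saturating generator loss common in practice; `d_real` and `d_fake` are D's probability-of-real outputs:

```python
import numpy as np

def d_loss(d_real, d_fake):
    # D is trained to output 1 on real images and 0 on fakes
    return -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))

def g_loss(d_fake):
    # Non-saturating G loss: G is trained to make D output 1 on its fakes
    return -np.mean(np.log(d_fake))

# Training alternates: one SGD step on d_loss (updating D only), then one on
# g_loss (updating G only). At the global optimum G reproduces the data
# distribution and the best D can do is output 0.5 everywhere:
balanced = d_loss(np.array([0.5]), np.array([0.5]))   # = log 4
```

The `balanced` value corresponds to the -log 4 optimum of the value function derived in the proof that follows.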

36 of 142

GANs are implicit generative models

“generative model” of the data x

Noise distribution

Samples from a perfectly optimized, sufficiently expressive GAN are samples from the data distribution

GAN

Data distribution

37 of 142

Proof

38 of 142

Proof

p_g = p_data is the unique global minimizer of the GAN objective.
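The argument is the standard one from [Goodfellow et al., 2014], reconstructed here in LaTeX:

```latex
% GAN objective (value function):
\min_G \max_D V(G,D) = \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\!\left[\log\big(1 - D(G(z))\big)\right]

% Step 1: for fixed G (with sample distribution p_g), the pointwise-optimal
% discriminator is
D_G^\ast(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}

% Step 2: substituting D_G^\ast back into V gives
C(G) = \max_D V(G,D) = -\log 4 + 2\,\mathrm{JSD}\!\left(p_{\text{data}} \,\big\|\, p_g\right)

% Since JSD >= 0 with equality iff the two distributions are equal,
% p_g = p_data is the unique global minimizer, with C(G^\ast) = -\log 4.
```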

39 of 142

Samples from BigGAN

[Brock et al. 2018]

40 of 142

Generative Adversarial Network

Deep nets G and D

Alternating SGD on G and D

41 of 142

Latent space

(Gaussian)

Data space

(Natural image manifold)

[BigGAN, Brock et al. 2018]

42 of 142

Generative adversarial networks are representation learners

Images generated by walking along two latent dimensions of BigGAN

[BigGAN, Brock et al. 2018]

43 of 142

Generative models organize the manifold of natural images

image space

latent space

44 of 142

Representation learning

  1. Image synthesis
  2. Representation learning
  3. Data translation


Embedding

Data

Generative modeling

Representation learning

45 of 142

Autoencoder → Generative model

46 of 142

Variational Autoencoders (VAEs)

Prior distribution

Target distribution

[Kingma & Welling, 2014; Rezende, Mohamed, Wierstra 2014]

47 of 142

Mixture of Gaussians

Target distribution

48 of 142

Variational Autoencoders (VAEs)

Target distribution

Density model:

Sampling:

[Kingma & Welling, 2014; Rezende, Mohamed, Wierstra 2014]
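In standard VAE notation, with decoder network G_θ, the density model and sampler the slide refers to are:

```latex
% Density model: an infinite mixture of Gaussians, one component per latent z,
% with the decoder G_\theta placing each component's mean:
p_\theta(x) = \int p_\theta(x \mid z)\, p(z)\, dz, \qquad
p(z) = \mathcal{N}(z;\, 0,\, I), \qquad
p_\theta(x \mid z) = \mathcal{N}\big(x;\, G_\theta(z),\, \sigma^2 I\big)

% Sampling (ancestral): draw a latent, decode, add observation noise:
z \sim \mathcal{N}(0, I), \qquad
x = G_\theta(z) + \sigma \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)
```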

49 of 142

Variational Autoencoder (VAE)

50 of 142

Current model of target distribution

In order to optimize our model, we need to measure the likelihood it assigns to each datapoint x

51 of 142

52 of 142

Current model of target distribution

In order to optimize our model, we need to measure the likelihood it assigns to each datapoint x

53 of 142

Current model of target distribution

If only we knew z*, we wouldn’t need the integral…

54 of 142

Current model of target distribution

If only we knew z*, we wouldn’t need the integral…

So, we simply try to predict z* for the given x!

Technical note: for the continuous math to actually work out, z* ~ E(x) needs to be a distribution (typically set to Gaussian), but here we (incorrectly) treat it as deterministic for simplicity.

55 of 142

Current model of target distribution

If only we knew z*, we wouldn’t need the integral…

So, we simply try to predict z* for the given x!

(assuming unit Gaussian prior, isotropic Gaussian likelihood model)
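Under those assumptions, maximizing the joint likelihood of x and the predicted code z* works out to an autoencoder objective with a penalty on the code (E is the encoder, D here is the decoder):

```latex
% Treating the code as deterministic, z^\ast = E(x), and expanding
% \log p(x, z^\ast):
\log p_\theta(x, z^\ast) = \log p_\theta(x \mid z^\ast) + \log p(z^\ast)
  = -\tfrac{1}{2\sigma^2} \lVert x - D(z^\ast) \rVert^2
    - \tfrac{1}{2} \lVert z^\ast \rVert^2 + \text{const}

% i.e., minimize a reconstruction error plus a term keeping codes
% near the unit Gaussian prior:
\mathcal{L}(x) = \tfrac{1}{2\sigma^2} \lVert x - D(E(x)) \rVert^2
  + \tfrac{1}{2} \lVert E(x) \rVert^2
```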

56 of 142

Autoencoder!

57 of 142

58 of 142

Autoencoder!

59 of 142

Classical Autoencoder

60 of 142

Variational Autoencoder

61 of 142

Variational Autoencoder
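A minimal numpy forward pass of the VAE objective (toy linear encoder/decoder and dimensions are illustrative assumptions): the encoder predicts a Gaussian q(z|x), a z is drawn with the reparameterization trick, and the loss is reconstruction error plus the closed-form KL to the unit-Gaussian prior:

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(scale=0.1, size=(16, 8))   # shared encoder layer
W_mu  = rng.normal(scale=0.1, size=(8, 4))    # head predicting mu
W_lv  = rng.normal(scale=0.1, size=(8, 4))    # head predicting log-variance
W_dec = rng.normal(scale=0.1, size=(4, 16))   # decoder (linear, for brevity)

def vae_loss(x):
    # Encoder predicts q(z|x) = N(mu, diag(exp(logvar)))
    e = np.tanh(x @ W_enc)
    mu, logvar = e @ W_mu, e @ W_lv
    # Reparameterization trick: sample z as a differentiable function of (mu, logvar)
    z = mu + np.exp(0.5 * logvar) * rng.normal(size=mu.shape)
    x_hat = z @ W_dec
    recon = np.sum((x - x_hat) ** 2, axis=1).mean()
    # KL( q(z|x) || N(0, I) ) in closed form
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1).mean()
    return recon + kl

x = rng.normal(size=(32, 16))
loss = vae_loss(x)
```

Training would run SGD on this loss over the encoder and decoder weights jointly; the KL term is what keeps the latent space matched to the prior so that decoding fresh z ~ N(0, I) yields samples.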

62 of 142

VAEs

Pros: Cheap to sample, good coverage

Cons: Blurry samples (in practice)

GANs

Pros: Cheap to sample, fast to train, require little data

Cons: No likelihoods, bad coverage (mode collapse), finicky to train (minimax)

[adapted from slide by David Duvenaud]

Other deep generative models:

Autoregressive models, Normalizing flows, Energy-based models

63 of 142

Data Translation

  1. Image synthesis
  2. Representation learning
  3. Data translation

64 of 142

Data translation problems (“structured prediction”)

“this small bird has a pink breast and crown…”

65 of 142

Structured prediction

In vision, this is usually what we are interested in!

Model joint distribution of high-dimensional data

66 of 142

Deep learning in 2012

Use a hypothesis space that can model complex structure

(e.g., a CNN, nearest-neighbor)

67 of 142

[Slide credit: Andrew Ng]

68 of 142

(Colors represent one-hot codes)

[Photo credit: Fredo Durand]

69 of 142

Convolutional neural net

Stochastic gradient descent

Semantic Segmentation
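The CNN + SGD recipe for segmentation bottoms out in a per-pixel cross-entropy against the one-hot label map. A minimal numpy sketch of that loss (shapes are illustrative assumptions):

```python
import numpy as np

def pixel_cross_entropy(logits, labels):
    # logits: (H, W, C) per-pixel class scores; labels: (H, W) integer class ids
    logits = logits - logits.max(axis=-1, keepdims=True)          # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    h, w = labels.shape
    # Pick out the log-probability of the correct class at every pixel
    return -log_probs[np.arange(h)[:, None], np.arange(w)[None, :], labels].mean()

logits = np.zeros((4, 4, 3))            # uniform prediction over 3 classes
labels = np.zeros((4, 4), dtype=int)
loss = pixel_cross_entropy(logits, labels)   # uniform prediction gives log(3)
```

This is an unstructured objective: each pixel is penalized independently, which is exactly the limitation the structured (GAN) objectives below address.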

70 of 142

Convolutional neural net

Stochastic gradient descent

Sat2Map

71 of 142

Input

Deep net output

72 of 142

Structured prediction

Use an objective that can model structure! (e.g., a graphical model, a GAN, etc)

73 of 142

Generator

74 of 142

G tries to synthesize fake images that fool D

D tries to identify the fakes

Generator

Discriminator

real or fake?

75 of 142

fake (0.9)

real (0.1)

76 of 142

G tries to synthesize fake images that fool D:

real or fake?

77 of 142

G tries to synthesize fake images that fool the best D:

real or fake?

78 of 142

Loss Function

G’s perspective: D is a loss function.

Rather than being hand-designed, it is learned and highly structured.

79 of 142

real or fake?

80 of 142

real!

81 of 142

real or fake pair ?

82 of 142

real or fake pair ?

83 of 142

fake pair

84 of 142

real pair

85 of 142

real or fake pair ?

86 of 142

Training Details: Loss function

Conditional GAN

87 of 142

Training Details: Loss function

Conditional GAN

Stable training + fast convergence

[c.f. Pathak et al. CVPR 2016]
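The conditional-GAN objective pairs the adversarial term with a weighted L1 regression term (λ = 100 in the pix2pix paper). A sketch, where `d_fake` is the discriminator's probability-of-real on the (input, output) pair:

```python
import numpy as np

def g_total_loss(d_fake, y_fake, y_real, lam=100.0):
    gan = -np.mean(np.log(d_fake))            # fool D on (input, output) pairs
    l1 = np.mean(np.abs(y_fake - y_real))     # stay close to the ground truth
    return gan + lam * l1

# A generator that fools D completely and matches the target exactly:
perfect = g_total_loss(np.array([1.0]), np.ones(8), np.ones(8))   # = 0
```

The L1 term anchors low-frequency structure to the target, while the GAN term supplies the learned, structured part of the loss; this combination is what the slide credits for stable training and fast convergence.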


88 of 142

Input

Output

Groundtruth

89 of 142

Input

Output

Groundtruth

Data from [maps.google.com]

90 of 142

[Slide credit: Andrew Ng]

91 of 142

Performance

Amount of data

Why structured objectives

(cartoon)

Deep learning

Older learning algorithms

92 of 142

Performance

Amount of data

DL w/ unstructured objective

(e.g., least-squares regression)

Why structured objectives

(cartoon)

Older learning algorithms

93 of 142

Input

Unstructured prediction (L1)

94 of 142

Structured Prediction (cGAN)

Input

95 of 142

Training data

[HED, Xie & Tu, 2015]

96 of 142

#edges2cats [Chris Hesse]

97 of 142

98 of 142

Ivy Tasi @ivymyt

Vitaly Vidmirov @vvid

99 of 142

Leveraging pretrained models for efficient data translation

100 of 142

With enough data, deep learning can solve pretty much anything

Deep Learning

101 of 142

This is a “dax”.

Which of the below symbols are also daxes?

Few-shot Learning

[Lake, Salakhutdinov, Tenenbaum, 2015]

102 of 142

[Lake, Salakhutdinov, Tenenbaum, 2015]

Which of these is an example of the same concept as the item in the box?

Few-shot Learning

103 of 142

Representations

(encoders)

Models (decoders)

Deep learning

The point of deep learning is to enable learning with little data

104 of 142

Foundation models

[Blind Orion Searching for the Rising Sun by Nicolas Poussin, 1658]

“If I have seen further it is by standing on the shoulders of Giants”

— Newton

https://arxiv.org/pdf/2108.07258.pdf

[Bommasani et al. 2021]

105 of 142

1. Learn foundation model encoders and decoders

for each domain

CLIP

StyleGAN

GPT-3

SimCLR

BERT

2. Plug them together to translate between modalities (may require finetuning)

Image

caption

106 of 142

Tons of data

Learn foundation models

Learner

CLIP

GPT

AlexNet

SimCLR

BERT

DALL-E

VQGAN

StyleGAN

BigGAN

SimCLR

WaveNet

Use/adapt foundations to solve new problems

Little or no data

Adaptor

App

107 of 142

CLIP

[Radford et al., 2021]

https://arxiv.org/pdf/2103.00020.pdf

108 of 142

CLIP

[Radford et al., 2021]

https://arxiv.org/pdf/2103.00020.pdf

2. Adaptor: Linear classifier on top of image encodings
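The linear-probe adaptor freezes the image encoder and fits only a linear classifier on its embeddings. In this sketch the "embeddings" are synthetic clusters (an assumption so it runs standalone; real CLIP would supply them), and the classifier is a one-vs-all least-squares fit:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, c = 200, 32, 3
labels = rng.integers(0, c, size=n)
centers = rng.normal(size=(c, d))
emb = centers[labels] + 0.1 * rng.normal(size=(n, d))   # frozen "encoder" output

Y = np.eye(c)[labels]                               # one-hot targets
W, *_ = np.linalg.lstsq(emb, Y, rcond=None)         # linear classifier only
pred = (emb @ W).argmax(axis=1)
accuracy = (pred == labels).mean()
```

Only W is learned; the point is that good frozen representations make the downstream problem nearly linear, so very little labeled data is needed.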

109 of 142

CLIP

[Radford et al., 2021]

https://arxiv.org/pdf/2103.00020.pdf

2. Adaptor: Just ask
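"Just asking" means zero-shot classification: embed one prompt per class ("A photo of a banana", ...), embed the image, and predict the class with the most similar text embedding. The embeddings below are random unit vectors standing in for CLIP's encoders (an assumption so the sketch runs standalone):

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

text_emb = normalize(rng.normal(size=(3, 512)))                    # 3 class prompts
image_emb = normalize(text_emb[1] + 0.01 * rng.normal(size=512))   # resembles class 1

sims = text_emb @ image_emb       # cosine similarities (all vectors unit norm)
pred = int(np.argmax(sims))       # zero-shot prediction
```

No parameters are trained at all; changing the task is just changing the prompts, which is what makes "a sketch of a banana" vs "a photo of a banana" possible.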

110 of 142

CLIP

[Radford et al., 2021]

https://arxiv.org/pdf/2103.00020.pdf

2. Adaptor: Just ask

111 of 142

New capabilities by just asking

“A sketch of a banana”

“A photo of a banana”

112 of 142

New capabilities by plugging pretrained models together: CLIP+GAN

“A Monet painting of the MIT Dome”

To maximize this

INPUT:

OUTPUT:

GAN Generator

Optimize this
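The CLIP+GAN loop above can be sketched as: freeze the generator and the CLIP encoder, and run gradient ascent on the latent z to maximize similarity between the generated image's embedding and the target text embedding. Both "models" here are frozen random linear maps (pure stand-in assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 16))   # frozen toy "generator + CLIP image encoder"
t = rng.normal(size=16)
t /= np.linalg.norm(t)         # frozen target text embedding (unit norm)

def score_and_grad(z):
    # Cosine similarity between the embedded "image" A z and the text target t,
    # plus its exact gradient with respect to z
    e = z @ A
    ne = np.linalg.norm(e)
    f = e @ t / ne
    de = t / ne - (e @ t) * e / ne**3
    return f, A @ de

z = rng.normal(size=8)
before, _ = score_and_grad(z)
for _ in range(300):           # optimize z only; the models stay frozen
    _, g = score_and_grad(z)
    z += 0.05 * g
after, _ = score_and_grad(z)
```

The real system works the same way, except the gradient flows through a GAN generator and CLIP via backprop rather than this closed form.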

113 of 142

New capabilities by plugging pretrained models together: CLIP+GAN

"What is the answer to the ultimate question of life, the universe, and everything?”

To maximize this

INPUT:

OUTPUT:

Image

Generator

Optimize this

Code: https://colab.research.google.com/drive/1_4PQqzM_0KKytCzWtn-ZPi4cCa5bwK2F?usp=sharing

114 of 142

Tons of data

Learn foundation models

Learner

CLIP

GPT

AlexNet

SimCLR

BERT

DALL-E

VQGAN

StyleGAN

BigGAN

SimCLR

WaveNet

Use/adapt foundations to solve new problems

Little or no data

Adaptor

VQGAN

CLIP

Text-to-image

115 of 142

DALL-E

Image Decoder

“A wide-eyed cat on the lookout for food”

INPUT:

Latent space Transformer

[Ramesh et al. 2021]

https://arxiv.org/pdf/2102.12092.pdf

https://openai.com/blog/dall-e/

116 of 142

Text-to-image translation

“An illustration of a baby daikon radish in a tutu walking a dog”

Image Decoder

OUTPUT:

Latent space Transformer

INPUT:

117 of 142

New capabilities by just asking: product design

118 of 142

New capabilities by just asking: image translation

119 of 142

120 of 142

121 of 142

DALL-E

[Ramesh et al. 2021]

https://arxiv.org/pdf/2102.12092.pdf

https://openai.com/blog/dall-e/

122 of 142

DALL-E 2

123 of 142

DALL-E 3

Prompt: a painting of water lilies in a new art style no human has ever seen before

ChatGPT: Here’s a painting of water lilies in a new art style, envisioned as a blend of organic beauty and futuristic technology. This unique interpretation features metallic water lilies floating on a liquid glass surface under a neon sky, creating a scene that harmoniously combines elements of nature with advanced technology.

https://chat.openai.com

124 of 142

Domain mapping

[Cartoon: The Computer as a Communication Device, Licklider & Taylor 1968]

  1. Image synthesis
  2. Structured prediction
  3. Domain mapping

[Includes slides from Jun-Yan Zhu, Taesung Park]

125 of 142

Unpaired data

 

 

Paired data

 

126 of 142

real or fake pair ?

127 of 142

real or fake pair ?

No input-output pairs!

128 of 142

real or fake?

Usually loss functions check if output matches a target instance

GAN loss checks if output is part of an admissible set

129 of 142

Real!

130 of 142

Real too!

Nothing to force output to correspond to input

131 of 142

 

 

[Zhu*, Park* et al. 2017], [Yi et al. 2017], [Kim et al. 2017]

CycleGAN, or there and back aGAN

132 of 142

 

 

CycleGAN, or there and back aGAN

133 of 142

Cycle Consistency Loss

 

 

 

 

134 of 142

 

 

 

 

 

 

Cycle Consistency Loss
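The cycle consistency loss can be written in one line: with G mapping X→Y and F mapping Y→X, penalize failing to come back to where you started. A minimal sketch:

```python
import numpy as np

def cycle_loss(G, F, x, y):
    # F(G(x)) should reconstruct x, and G(F(y)) should reconstruct y
    return np.mean(np.abs(F(G(x)) - x)) + np.mean(np.abs(G(F(y)) - y))

x, y = np.ones((4, 3)), np.zeros((4, 3))
identity = lambda a: a
loss = cycle_loss(identity, identity, x, y)   # perfectly cycle-consistent: 0
```

Added to the two GAN losses, this term is what forces the output to correspond to the input even though no paired examples exist.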

 

 

135 of 142

Paired translation
Training data: paired. Objective: regression error. (Input → Result)

Unpaired translation
Training data: unpaired. Objective: cycle-consistency error. (Input → Result)

[“pix2pix”, Isola, Zhu, Zhou, Efros, 2017]

136 of 142

137 of 142

138 of 142

Cezanne

Ukiyo-e

Monet

Input

Van Gogh

139 of 142

Gaussian

Target distribution

GANs

140 of 142

Horses

Zebras

CycleGAN

141 of 142

What would it look like if…?

142 of 142

What would it look like if…?