1 of 29

SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions

Yuseung Lee, Kunho Kim, Hyunjin Kim, Minhyuk Sung

NeurIPS 2023 ML4CD Workshop

2 of 29

Text-to-Image Diffusion Models

Pretrained text-to-image diffusion models can only generate images of certain fixed sizes.

Stable Diffusion (Stability AI)

3 of 29

ControlNet

ControlNet [Zhang et al., ICCV 2023]

4 of 29

The Need for Arbitrary-Size Generation

There is growing demand for generating arbitrary-size images in downstream applications such as VR environments.

Virtual Reality (VR) Environment

5 of 29

Expensive Data Acquisition & Training

Training diffusion models for different image sizes would cost substantial time and computing resources.

LAION-5B

6 of 29

Goal: Generate arbitrary-size images with pretrained text-to-image diffusion models.

7 of 29

Image as Montage

Any arbitrary-size image is a composition of multiple fixed-size images.

“A photo of a mountain range at twilight”

8 of 29

Image as Montage

Fixed-size images can be generated with pretrained models.

“A photo of a mountain range at twilight”

9 of 29

Joint Diffusion [MultiDiffusion, Bar-Tal et al.]

Average noisy latent features in overlapping regions.

Bar-Tal et al., MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation, ICML 2023.

10 of 29

Joint Diffusion [MultiDiffusion, Bar-Tal et al.]

Average noisy latent features in the overlapping regions.
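The averaging step can be illustrated with a minimal sketch. It assumes 1-D latents stored as plain Python lists, and the function name `average_overlaps` and its arguments are hypothetical, chosen only for this example; the slides do not specify an implementation.

```python
def average_overlaps(window_latents, starts, total_len):
    """Merge per-window 1-D latents into one latent, averaging where windows overlap.

    window_latents: list of equal-purpose noisy latents, one per window (assumed 1-D).
    starts: start offset of each window inside the full latent.
    total_len: length of the full latent.
    """
    sums = [0.0] * total_len
    counts = [0] * total_len
    for latent, start in zip(window_latents, starts):
        for offset, value in enumerate(latent):
            sums[start + offset] += value
            counts[start + offset] += 1
    # Every position must be covered by at least one window.
    if 0 in counts:
        raise ValueError("windows do not cover the full latent")
    return [s / c for s, c in zip(sums, counts)]


# Two windows of length 4 that overlap on positions 2 and 3.
merged = average_overlaps(
    [[1.0, 1.0, 1.0, 1.0], [3.0, 3.0, 3.0, 3.0]], starts=[0, 2], total_len=6
)
print(merged)  # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
```

Positions covered by a single window keep that window's value, while the overlapping positions take the mean of both windows.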

Average!

“A photo of a mountain range at twilight”

11 of 29

Joint Diffusion [MultiDiffusion, Bar-Tal et al.]

Crop the full latent to obtain the latent for each window.
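As a rough sketch of the cropping step, the snippet below computes sliding-window offsets over a 1-D latent and crops one window per offset. The helper names, the 64-wide window, and the stride of 16 are assumptions made for illustration; only the 3072-pixel width comes from the examples in this deck, and the 8x downsampling factor is the one commonly used by Stable Diffusion's latent space.

```python
def window_starts(total_len, window_len, stride):
    """Start offsets of sliding windows that together cover [0, total_len)."""
    if window_len > total_len:
        raise ValueError("window is larger than the latent")
    starts = list(range(0, total_len - window_len + 1, stride))
    # Add a final window flush with the right edge if the stride leaves a gap.
    if starts[-1] + window_len < total_len:
        starts.append(total_len - window_len)
    return starts


def crop_windows(full_latent, window_len, stride):
    """Crop a 1-D latent into (start, window) pairs."""
    return [
        (s, full_latent[s:s + window_len])
        for s in window_starts(len(full_latent), window_len, stride)
    ]


# Assumed sizes: 3072 px wide image, 8x latent downsampling,
# 64-wide latent windows, stride 16 (all but the width are assumptions).
starts = window_starts(3072 // 8, 64, 16)
print(len(starts), starts[:3], starts[-1])  # 21 [0, 16, 32] 320
```

A smaller stride gives more overlap between neighbouring windows at the cost of more denoising passes per step.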

“A photo of a mountain range at twilight”

12 of 29

Joint Diffusion [MultiDiffusion, Bar-Tal et al.]

The final output is not coherent.

“A photo of a mountain range at twilight”

13 of 29

SyncDiffusion: Synchronized Joint Diffusions

Generate perceptually coherent images in arbitrary sizes.

“A photo of a mountain range at twilight”

14 of 29

Background: DDIM [Denoising Diffusion Implicit Models]

Foreseen output!

Song et al., Denoising Diffusion Implicit Models, ICLR 2021.
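The equations on this slide did not survive extraction. As a hedged illustration of what a "foreseen output" usually means in DDIM, the sketch below uses the standard predicted-clean-sample formula from Song et al.; the function name `foreseen_x0`, the list-based latents, and the toy numbers are assumptions for this example, not taken from the slides.

```python
import math


def foreseen_x0(x_t, eps_pred, alpha_bar_t):
    """Predict the clean sample from a noisy sample and the predicted noise.

    Standard DDIM relation (elementwise):
        x0_hat = (x_t - sqrt(1 - alpha_bar_t) * eps_pred) / sqrt(alpha_bar_t)
    where alpha_bar_t is the cumulative noise-schedule coefficient at step t.
    """
    if not 0.0 < alpha_bar_t <= 1.0:
        raise ValueError("alpha_bar_t must be in (0, 1]")
    scale = math.sqrt(alpha_bar_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    return [(x - noise_scale * e) / scale for x, e in zip(x_t, eps_pred)]


# Round trip on toy values: noise a known x0, then recover it
# using the exact noise in place of a network prediction.
x0 = [0.5, -1.0, 2.0]
eps = [0.1, 0.2, -0.3]
alpha_bar = 0.36
x_t = [math.sqrt(alpha_bar) * a + math.sqrt(1.0 - alpha_bar) * e
       for a, e in zip(x0, eps)]
recovered = foreseen_x0(x_t, eps, alpha_bar)
```

With a real model, `eps_pred` would come from the denoising network, so the foreseen output is only an estimate of the final image at each step.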

 

15 of 29

Key Idea

Zhang et al., The Unreasonable Effectiveness of Deep Features as a Perceptual Metric, CVPR 2018.

16 of 29

SyncDiffusion

17 of 29

SyncDiffusion

Windows

18 of 29

SyncDiffusion

Windows

Anchor window

19 of 29

SyncDiffusion

Windows

Anchor window

Compute perceptual similarity.

20 of 29

SyncDiffusion

Windows

Anchor window

Gradient descent
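The gradient descent step can be sketched on toy data as follows. This is a simplified illustration under stated assumptions, not the method as implemented by the authors: a plain mean squared error stands in for the perceptual similarity, the predicted noise is treated as a constant when differentiating, latents are 1-D lists, and the names `sync_step`, `mse`, and `weight` are hypothetical.

```python
import math


def foreseen_x0(x_t, eps_pred, alpha_bar_t):
    """DDIM-style estimate of the clean sample (elementwise)."""
    scale = math.sqrt(alpha_bar_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    return [(x - noise_scale * e) / scale for x, e in zip(x_t, eps_pred)]


def mse(a, b):
    """Stand-in similarity loss (assumption: replaces a perceptual metric)."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def sync_step(x_t, eps_pred, anchor_x0, alpha_bar_t, weight):
    """One gradient descent step pulling a window toward the anchor window.

    Minimizes mse(foreseen_x0(x_t), anchor_x0) with respect to x_t,
    treating eps_pred as constant, so d(x0_hat)/d(x_t) = 1 / sqrt(alpha_bar_t).
    """
    scale = math.sqrt(alpha_bar_t)
    x0_hat = foreseen_x0(x_t, eps_pred, alpha_bar_t)
    n = len(x_t)
    grad = [2.0 * (p - a) / (n * scale) for p, a in zip(x0_hat, anchor_x0)]
    return [x - weight * g for x, g in zip(x_t, grad)]


# Toy values: one window latent and the anchor window's foreseen output.
x_t = [0.9, -0.4, 1.3]
eps_pred = [0.2, -0.1, 0.05]
anchor_x0 = [0.0, 0.5, 1.0]
alpha_bar = 0.5

before = mse(foreseen_x0(x_t, eps_pred, alpha_bar), anchor_x0)
x_t_new = sync_step(x_t, eps_pred, anchor_x0, alpha_bar, weight=0.3)
after = mse(foreseen_x0(x_t_new, eps_pred, alpha_bar), anchor_x0)
print(after < before)  # True
```

In this toy setting each step shrinks the gap between the window's foreseen output and the anchor's; with a learned perceptual metric and a real denoiser the gradient would typically be obtained by automatic differentiation instead of the closed form used here.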

 

21 of 29

SyncDiffusion

Windows

Anchor window

Gradient descent

 

22 of 29

Stable Diffusion + SyncDiffusion

MultiDiffusion [Bar-Tal et al.]

“An illustration of a beach in La La Land style” (512 x 3072)

SyncDiffusion (Ours)

23 of 29

Stable Diffusion + SyncDiffusion

MultiDiffusion [Bar-Tal et al.]

“A waterfall” (2048 x 512)

SyncDiffusion (Ours)

24 of 29

ControlNet + SyncDiffusion

“A digital painting of a city in a faraway planet.”

Line Art (512 x 2048)

“A LEGO city on a sunny day.”

25 of 29

ControlNet + SyncDiffusion

“A beautiful city on a sunny day in oil painting.”

Line Art (512 x 2560)

“A futuristic city with neon lights.”

26 of 29

ControlNet + SyncDiffusion

“A beautiful city on a sunny day.”

Canny Edge Map (512 x 3072)

“A beautiful city under the sunset.”

27 of 29

ControlNet + SyncDiffusion

“A beautiful city on a sunny day in oil painting.”

QR Code (512 x 2048)

“Sci-fi digital painting of a city in a faraway planet.”

28 of 29

Other Applications

Bar-Tal et al., MultiDiffusion, ICML 2023.

Tang et al., MVDiffusion, NeurIPS 2023.

360-degree Panorama

Layout-to-Image

29 of 29

SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions

Yuseung Lee, Kunho Kim, Hyunjin Kim, Minhyuk Sung

NeurIPS 2023 ML4CD Workshop