1 of 29

SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions

Yuseung Lee Kunho Kim Hyunjin Kim Minhyuk Sung

NeurIPS 2023 ML4CD Workshop

2 of 29

Text-to-Image Diffusion Models

Pretrained text-to-image diffusion models are limited to generating images of certain sizes.

Stable Diffusion (Stability AI)

3 of 29

ControlNet

ControlNet [Zhang et al., ICCV 2023]

4 of 29

Need for Arbitrary-Size Generation

There is growing demand for arbitrary-size image generation in downstream applications such as VR environments.

Virtual Reality (VR) Environment¹

¹https://brdg.co/vr-room/

5 of 29

Expensive Data Acquisition & Training

Training diffusion models for different image sizes would cost substantial time and computing resources.

LAION-5B

6 of 29

Goal: Generate arbitrary-size images with pretrained text-to-image diffusion models.

7 of 29

Image as Montage

Any arbitrary-size image is a composition of multiple fixed-size images.
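The montage view can be sketched as a tiling problem: choose where each fixed-size window sits inside the larger canvas. The following is a minimal sketch, not the authors' code. The function name `window_offsets`, the one-dimensional layout, and the stride value are all illustrative assumptions. The 384-wide strip with 64-wide windows corresponds to a 512 x 3072 image only under the assumption of an 8x latent downsampling.

```python
def window_offsets(full_width, window_width, stride):
    """Return the left offsets of fixed-size windows that cover a wider strip."""
    if full_width < window_width:
        raise ValueError("full_width must be at least window_width")
    if stride <= 0:
        raise ValueError("stride must be positive")
    offsets = list(range(0, full_width - window_width + 1, stride))
    # If the stride leaves a gap at the right edge, add one window flush with it.
    if offsets[-1] != full_width - window_width:
        offsets.append(full_width - window_width)
    return offsets


# Illustrative values (assumptions): 64-wide windows over a 384-wide strip.
offsets = window_offsets(full_width=384, window_width=64, stride=16)
print(len(offsets), offsets[0], offsets[-1])
```

A stride smaller than the window width makes neighboring windows overlap, which is what later allows them to be blended.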

“A photo of a mountain range at twilight”

8 of 29

Image as Montage

Fixed-size images can be generated with pretrained models.

“A photo of a mountain range at twilight”

9 of 29

Joint Diffusion [MultiDiffusion, Bar-Tal et al.]

Average noisy latent features in the overlapping regions.

Bar-Tal et al., MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation, ICML 2023.

10 of 29

Joint Diffusion [MultiDiffusion, Bar-Tal et al.]

Average noisy latent features in the overlapping regions.
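The averaging step can be illustrated on a one-dimensional toy strip. This is a sketch under simplifying assumptions, not the MultiDiffusion implementation: latents are plain Python lists rather than tensors, every window gets equal weight, and the name `average_overlaps` is hypothetical.

```python
def average_overlaps(windows, offsets, full_width):
    """Merge 1-D windows into one strip, averaging wherever windows overlap."""
    total = [0.0] * full_width
    count = [0] * full_width
    for window, start in zip(windows, offsets):
        for i, value in enumerate(window):
            total[start + i] += value
            count[start + i] += 1
    if 0 in count:
        raise ValueError("windows do not cover the full strip")
    # Each position is the mean of all window values that landed on it.
    return [t / c for t, c in zip(total, count)]


# Two 4-wide windows over a 6-wide strip; positions 2 and 3 are shared.
merged = average_overlaps([[1.0] * 4, [3.0] * 4], offsets=[0, 2], full_width=6)
print(merged)  # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
```

Positions covered by a single window keep that window's value, while shared positions take the mean.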

Average!

“A photo of a mountain range at twilight”

11 of 29

Joint Diffusion [MultiDiffusion, Bar-Tal et al.]

Crop the full latent to obtain the latent for each window.
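Cropping is the inverse direction of the merge: each window reads its slice back out of the shared full latent. A minimal sketch on a one-dimensional list follows. The name `crop_windows` and the toy values are assumptions for illustration only.

```python
def crop_windows(full_latent, window_width, offsets):
    """Slice a 1-D latent strip into one fixed-width window per offset."""
    for start in offsets:
        if start < 0 or start + window_width > len(full_latent):
            raise ValueError("window falls outside the full latent")
    return [full_latent[start:start + window_width] for start in offsets]


full_latent = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
windows = crop_windows(full_latent, window_width=4, offsets=[0, 2])
print(windows)  # [[0.0, 1.0, 2.0, 3.0], [2.0, 3.0, 4.0, 5.0]]
```

Because overlapping windows are cut from the same merged latent, they start each step with identical values in the shared region.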

“A photo of a mountain range at twilight”

12 of 29

Joint Diffusion [MultiDiffusion, Bar-Tal et al.]

The final output is not coherent.

“A photo of a mountain range at twilight”

13 of 29

SyncDiffusion: Synchronized Joint Diffusions

Generate perceptually coherent images in arbitrary sizes.

“A photo of a mountain range at twilight”

14 of 29

Background: DDIM [Denoising Diffusion Implicit Models]

Foreseen output!

Song et al., Denoising Diffusion Implicit Models, ICLR 2021.
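The "foreseen output" label presumably refers to the clean-sample estimate that DDIM forms at every step from the current noisy sample and the predicted noise. The sketch below writes that estimate out on plain lists. The names `foreseen_output`, `eps_pred`, and `alpha_bar_t` (the cumulative noise-schedule product) are chosen here for illustration, and in practice the noise would come from the denoising network rather than being known.

```python
import math


def foreseen_output(x_t, eps_pred, alpha_bar_t):
    """Estimate the clean sample x_0 from noisy x_t and the predicted noise."""
    if not 0.0 < alpha_bar_t <= 1.0:
        raise ValueError("alpha_bar_t must lie in (0, 1]")
    scale = math.sqrt(alpha_bar_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    # x_0_hat = (x_t - sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_bar_t)
    return [(x - noise_scale * e) / scale for x, e in zip(x_t, eps_pred)]


# Round trip on toy values: noise a known x_0, then recover it with the true noise.
x_0 = [0.5, -1.0, 2.0]
eps = [0.1, 0.3, -0.2]
alpha_bar = 0.36
x_t = [math.sqrt(alpha_bar) * a + math.sqrt(1.0 - alpha_bar) * e
       for a, e in zip(x_0, eps)]
recovered = foreseen_output(x_t, eps, alpha_bar)
```

With a perfect noise prediction the estimate matches the clean sample exactly. With a real network it is only a preview of the final output, available at any intermediate step.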

15 of 29

Key Idea

¹Zhang et al., The Unreasonable Effectiveness of Deep Features as a Perceptual Metric, CVPR 2018.

16 of 29

SyncDiffusion

17 of 29

SyncDiffusion

Windows

18 of 29

SyncDiffusion

Windows

Anchor window

19 of 29

SyncDiffusion

Windows

Anchor window

Compute perceptual similarity.
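The comparison against the anchor window can be sketched as follows. This is an illustration under stated assumptions. The slides cite a learned perceptual metric (Zhang et al.), while the code uses a plain mean squared difference as a stand-in, so it is not that metric. The inputs are assumed to be each window's foreseen output, flattened to a list, and the function names are hypothetical.

```python
def mean_squared_distance(a, b):
    """Stand-in distance (assumption); NOT the learned perceptual metric."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def distances_to_anchor(foreseen_windows, anchor_index=0):
    """Distance from every window's foreseen output to the anchor's."""
    anchor = foreseen_windows[anchor_index]
    return [mean_squared_distance(w, anchor) for w in foreseen_windows]


# Toy foreseen outputs for three windows; window 0 acts as the anchor.
scores = distances_to_anchor([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
print(scores)  # [0.0, 1.0, 2.0]
```

A larger score means that window's predicted appearance is further from the anchor's.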

20 of 29

SyncDiffusion

Windows

Anchor window

Gradient descent
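The gradient descent step can be sketched on toy values: nudge a window's noisy latent so that its foreseen output moves closer to the anchor's. Everything here is an assumption made for illustration. The gradient is taken by finite differences instead of backpropagation, `foresee` is a toy linear map instead of a diffusion model, the distance is a squared error instead of a perceptual metric, and `weight` is an arbitrary step size.

```python
def numeric_grad(f, x, h=1e-5):
    """Central-difference gradient of a scalar function f at the list x."""
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + h] + x[i + 1:]
        down = x[:i] + [x[i] - h] + x[i + 1:]
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def sync_step(x_window, x_anchor, foresee, distance, weight):
    """One descent step pulling a window's foreseen output toward the anchor's."""
    target = foresee(x_anchor)
    grad = numeric_grad(lambda x: distance(foresee(x), target), x_window)
    return [x - weight * g for x, g in zip(x_window, grad)]


def foresee(x):
    # Toy stand-in for the foreseen output (assumption).
    return [2.0 * v for v in x]


def distance(a, b):
    # Toy stand-in for the perceptual distance (assumption).
    return sum((p - q) ** 2 for p, q in zip(a, b))


anchor = [1.0, 1.0]
window = [0.0, 3.0]
before = distance(foresee(window), foresee(anchor))
updated = sync_step(window, anchor, foresee, distance, weight=0.05)
after = distance(foresee(updated), foresee(anchor))
print(before > after)  # True
```

The update is applied to the noisy latent itself, so the usual denoising step can follow unchanged.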

21 of 29

SyncDiffusion

Windows

Anchor window

Gradient descent

22 of 29

Stable Diffusion + SyncDiffusion

MultiDiffusion [Bar-Tal et al.]

“An illustration of a beach in La La Land style” (512 x 3072)

SyncDiffusion (Ours)

23 of 29

Stable Diffusion + SyncDiffusion

MultiDiffusion [Bar-Tal et al.]

SyncDiffusion (Ours)

“A waterfall” (2048 x 512)

24 of 29

ControlNet + SyncDiffusion

“A digital painting of a city in a faraway planet.”

Line Art (512 x 2048)

“A LEGO city on a sunny day.”

25 of 29

ControlNet + SyncDiffusion

“A beautiful city on a sunny day in oil painting.”

Line Art (512 x 2560)

“A futuristic city with neon lights.”

26 of 29

ControlNet + SyncDiffusion

“A beautiful city on a sunny day.”

Canny Edge Map (512 x 3072)

“A beautiful city under the sunset.”

27 of 29

ControlNet + SyncDiffusion

“A beautiful city on a sunny day in oil painting.”

QR Code (512 x 2048)

“Sci-fi digital painting of a city in a faraway planet.”

https://huggingface.co/monster-labs/control_v1p_sd15_qrcode_monster

28 of 29

Other Applications

Bar-Tal et al., MultiDiffusion, ICML 2023.

Tang et al., MVDiffusion, NeurIPS 2023.

360-degree Panorama

Layout-to-Image

29 of 29

SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions

Yuseung Lee Kunho Kim Hyunjin Kim Minhyuk Sung

https://syncdiffusion.github.io/

NeurIPS 2023 ML4CD Workshop