SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions
Yuseung Lee Kunho Kim Hyunjin Kim Minhyuk Sung
NeurIPS 2023 ML4CD Workshop
Text-to-Image Diffusion Models
Pretrained text-to-image diffusion models are limited to
generating images of certain sizes.
Stable Diffusion (Stability AI)
ControlNet [Zhang et al., ICCV 2023]
Need for Arbitrary-Size Generation
Demand is growing for arbitrary-size image generation
in downstream applications such as VR environments.
Virtual Reality (VR) Environment
Expensive Data Acquisition & Training
Training diffusion models for different image sizes would cost substantial time and computing resources.
LAION-5B
Goal:
Generate arbitrary-size images with
pretrained text-to-image diffusion models.
Image as Montage
Any arbitrary-size image is a composition of multiple fixed-size images.
Fixed-size images can be generated with pretrained models.
“A photo of a mountain range at twilight”
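The montage view can be made concrete by listing the fixed-size windows that cover a larger canvas. The sketch below is a minimal illustration, not the authors' code: the helper names `window_starts` and `tile`, the stride of 16, and the 64 x 384 grid (a 512 x 3072 image under an assumed 8x latent downsampling) are all assumptions.

```python
def window_starts(full: int, window: int, stride: int) -> list[int]:
    """Left edges of sliding windows that together cover [0, full)."""
    if stride <= 0:
        raise ValueError("stride must be positive")
    if window > full:
        raise ValueError("window must not exceed the full length")
    starts = list(range(0, full - window + 1, stride))
    # Add a final window flush with the far edge if the stride left a gap.
    if starts[-1] + window < full:
        starts.append(full - window)
    return starts


def tile(height: int, width: int, window: int, stride: int) -> list[tuple[int, int, int, int]]:
    """Return (top, bottom, left, right) boxes of square fixed-size windows."""
    return [
        (top, top + window, left, left + window)
        for top in window_starts(height, window, stride)
        for left in window_starts(width, window, stride)
    ]


# Assumed values: a 64 x 384 grid, 64 x 64 windows, stride 16.
boxes = tile(64, 384, window=64, stride=16)
print(len(boxes))            # 21
print(boxes[0], boxes[-1])   # (0, 64, 0, 64) (0, 64, 320, 384)
```

A smaller stride gives more overlap between neighboring windows at the cost of more windows to denoise.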
Joint Diffusion [MultiDiffusion, Bar-Tal et al.]
Average noisy latent features in overlapping regions.
Bar-Tal et al., MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation, ICML 2023.
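The averaging step can be illustrated on a one-dimensional strip. This is a toy sketch under stated assumptions, not the MultiDiffusion implementation: the function name `fuse` and the plain-list representation are hypothetical, and real latents would be multi-channel 2D tensors.

```python
def fuse(window_values, starts, full):
    """Average per-window values into one strip of length `full`.

    window_values[i][k] is the value window i assigns to position starts[i] + k.
    """
    total = [0.0] * full
    count = [0] * full
    for values, start in zip(window_values, starts):
        for k, v in enumerate(values):
            total[start + k] += v
            count[start + k] += 1
    if 0 in count:
        raise ValueError("windows do not cover the full strip")
    # Positions covered by several windows receive the mean of their values.
    return [t / c for t, c in zip(total, count)]


# Two windows of width 4 over a strip of width 6; positions 2 and 3 overlap.
fused = fuse([[1.0, 1.0, 1.0, 1.0], [3.0, 3.0, 3.0, 3.0]], starts=[0, 2], full=6)
print(fused)  # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
```

Averaging makes the overlap agree numerically, but nothing in it ties together the content of windows that do not overlap.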
Joint Diffusion [MultiDiffusion, Bar-Tal et al.]
Crop the full latent to obtain the latent for each window.
“A photo of a mountain range at twilight”
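Cropping the full latent into per-window latents amounts to slicing. A minimal sketch, assuming a latent stored as a list of rows and boxes given as (top, bottom, left, right); the name `crop` and the toy 2 x 6 grid are illustrative only.

```python
def crop(latent, box):
    """Return the sub-grid of `latent` (a list of rows) inside box = (top, bottom, left, right)."""
    top, bottom, left, right = box
    return [row[left:right] for row in latent[top:bottom]]


# A 2 x 6 toy "latent" whose entries record their own column index.
latent = [[c for c in range(6)] for _ in range(2)]

# Two windows of width 4 with stride 2 (assumed values); columns 2 and 3 are shared.
windows = [crop(latent, (0, 2, left, left + 4)) for left in (0, 2)]
print(windows[0][0])  # [0, 1, 2, 3]
print(windows[1][0])  # [2, 3, 4, 5]
```

Because every window is cut from the same full latent, overlapping windows start each denoising step with identical values in the shared columns.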
Joint Diffusion [MultiDiffusion, Bar-Tal et al.]
The final output is not coherent.
“A photo of a mountain range at twilight”
SyncDiffusion: Synchronized Joint Diffusions
Generate perceptually coherent images in arbitrary sizes.
“A photo of a mountain range at twilight”
Background: DDIM [Denoising Diffusion Implicit Models]
Foreseen output: at any timestep, the noise prediction gives a one-step estimate of the final denoised image.
Song et al., Denoising Diffusion Implicit Models, ICLR 2021.
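For reference, the foreseen output can be written in the notation of the DDIM paper, where \alpha_t denotes the cumulative noise-schedule product and \epsilon_\theta the noise predictor. The symbol \hat{x}_0 is a label chosen here; the second line is the deterministic update (no added noise).

```latex
\hat{x}_0(x_t, t) = \frac{x_t - \sqrt{1 - \alpha_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\alpha_t}}
\qquad
x_{t-1} = \sqrt{\alpha_{t-1}}\,\hat{x}_0(x_t, t) + \sqrt{1 - \alpha_{t-1}}\,\epsilon_\theta(x_t, t)
```

The estimate \hat{x}_0 is available at every step, so the appearance of the final image can be inspected long before denoising finishes.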
Key Idea
Zhang et al., The Unreasonable Effectiveness of Deep Features as a Perceptual Metric, CVPR 2018.
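One way to write the idea down, as a hedged reading of the slides rather than the authors' exact formulation: measure the perceptual distance (LPIPS, Zhang et al.) between the foreseen outputs of window i and of an anchor window, then nudge the noisy latent of window i down the gradient. The symbols are assumptions made here: x_t^{(i)} is the noisy latent of window i, index 0 is the anchor, \hat{x}_0 is the one-step estimate of the denoised image, and w is a step size. For a latent diffusion model the estimates would presumably be decoded to image space before LPIPS is applied.

```latex
\mathcal{L}^{(i)} = \mathrm{LPIPS}\!\left(\hat{x}_0\big(x_t^{(i)}, t\big),\; \hat{x}_0\big(x_t^{(0)}, t\big)\right),
\qquad
x_t^{(i)} \leftarrow x_t^{(i)} - w\,\nabla_{x_t^{(i)}} \mathcal{L}^{(i)}
```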
SyncDiffusion
Split the image into windows and choose one as the anchor window.
Compute perceptual similarity between each window and the anchor window.
Update each window by gradient descent on that perceptual loss.
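The synchronization step can be sketched on toy data. Everything below is an assumption made for illustration, not the SyncDiffusion implementation: the noise predictor is a hypothetical linear stand-in (`eps(x) = C * x`), mean squared error replaces LPIPS, the gradient is derived by hand instead of by autograd, and windows are short 1D lists.

```python
import math

ALPHA_T = 0.5   # assumed cumulative noise-schedule value at the current step
C = 0.3         # toy noise predictor eps(x) = C * x, standing in for a real network
# With this predictor the foreseen output (x - sqrt(1 - a) * eps) / sqrt(a) is K * x.
K = (1.0 - C * math.sqrt(1.0 - ALPHA_T)) / math.sqrt(ALPHA_T)


def foresee(x):
    """One-step estimate of the clean signal under the toy noise predictor."""
    return [K * v for v in x]


def loss(x, anchor):
    """Mean squared distance between foreseen outputs (stand-in for LPIPS)."""
    fx, fa = foresee(x), foresee(anchor)
    return sum((a - b) ** 2 for a, b in zip(fx, fa)) / len(x)


def sync_step(x, anchor, weight):
    """One gradient-descent step pulling the foreseen output of `x` toward the anchor's."""
    fx, fa = foresee(x), foresee(anchor)
    # d loss / d x[j] = 2 * K * (fx[j] - fa[j]) / n for this toy model.
    grad = [2.0 * K * (a - b) / len(x) for a, b in zip(fx, fa)]
    return [v - weight * g for v, g in zip(x, grad)]


anchor = [0.0, 1.0, 2.0, 3.0]
windows = [anchor, [4.0, 4.0, 4.0, 4.0], [-1.0, 0.0, 5.0, 2.0]]
before = [loss(w, anchor) for w in windows]
synced = [sync_step(w, anchor, weight=1.0) for w in windows]
after = [loss(w, anchor) for w in synced]
print(before)
print(after)
```

In this toy setup the loss of every non-anchor window shrinks after one step, while the anchor is left unchanged. The gradient is taken only with respect to the window being updated, so the anchor acts as a fixed reference.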
Stable Diffusion + SyncDiffusion
MultiDiffusion [Bar-Tal et al.]
“An illustration of a beach in La La Land style” (512 x 3072)
SyncDiffusion (Ours)
Stable Diffusion + SyncDiffusion
MultiDiffusion [Bar-Tal et al.]
SyncDiffusion (Ours)
“A waterfall” (2048 x 512)
ControlNet + SyncDiffusion
“A digital painting of a city in a faraway planet.”
Line Art (512 x 2048)
“A LEGO city on a sunny day.”
ControlNet + SyncDiffusion
“A beautiful city on a sunny day in oil painting.”
Line Art (512 x 2560)
“A futuristic city with neon lights.”
ControlNet + SyncDiffusion
“A beautiful city on a sunny day.”
Canny Edge Map (512 x 3072)
“A beautiful city under the sunset.”
ControlNet + SyncDiffusion
“A beautiful city on a sunny day in oil painting.”
QR Code (512 x 2048)
“Sci-fi digital painting of a city in a faraway planet.”
Other Applications
Bar-Tal et al., MultiDiffusion, ICML 2023.
Tang et al., MVDiffusion, NeurIPS 2023.
360-degree Panorama
Layout-to-Image