1 of 12

DreamSparse: Escaping from Plato's Cave with 2D Frozen Diffusion Model given Sparse Views

Paul Yoo Jiaxian Guo Xin Zhang Yutaka Matsuo Shixiang Shane Gu

The University of Tokyo, Japan

https://sites.google.com/view/dreamsparse/home

2 of 12

Problem Setting

Given sparse observations, how can we imagine and synthesize the unobserved?

Normally requires a few hours of per-object optimization

3 of 12

Limitations of Prior Approaches

Re-projection-based approaches, such as PixelNeRF, render a blurry image when the query view is far from the context views
A similar line of work, SparseFusion, hallucinates unobserved regions via a 2D diffusion model, but this module is trained from scratch and thus exhibits limited generative capability

Yu, Alex, et al. "pixelNeRF: Neural Radiance Fields from One or Few Images." CVPR 2021.

Zhou, Zhizhuo, and Shubham Tulsiani. "SparseFusion: Distilling View-conditioned Diffusion for 3D Reconstruction." CVPR 2023.

4 of 12

Motivation

LAION-5B

Can we leverage a frozen, internet-scale, pre-trained image diffusion model as a strong 2D prior?

Schuhmann, Christoph, et al. "LAION-5B: An Open Large-scale Dataset for Training Next Generation Image-Text Models." NeurIPS 2022 Datasets and Benchmarks Track.

5 of 12

Challenges

2D diffusion models are inherently not 3D-aware

“backview of a chair”

“sideview of a teddybear”

DreamBooth

How can we preserve 3D-consistent object identity without per-object fine-tuning during inference?

Ruiz, Nataniel, et al. "DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation." CVPR 2023.

6 of 12

How did we approach it?

Color Reconstruction Loss

Noise Estimation Loss
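
The two loss names above can be illustrated with a minimal sketch. The slide gives only the names, so everything else here is an assumption: both terms are taken to be mean-squared errors (the usual form of a diffusion noise-estimation objective and of a pixel-wise color term), images are flat lists of floats, and `lambda_color` is a hypothetical weighting factor. The actual formulation in DreamSparse may differ.

```python
def mse(a, b):
    """Mean squared error between two equal-length flat lists of floats."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def total_loss(pred_noise, true_noise, rendered_rgb, target_rgb, lambda_color=1.0):
    """Combine the two terms named on the slide (assumed MSE form).

    pred_noise / true_noise: noise predicted by the diffusion model vs. the
        noise that was actually added to the target image.
    rendered_rgb / target_rgb: colors produced by the geometry branch vs.
        the ground-truth target view.
    lambda_color: hypothetical weight, not taken from the slides.
    """
    noise_estimation_loss = mse(pred_noise, true_noise)
    color_reconstruction_loss = mse(rendered_rgb, target_rgb)
    return noise_estimation_loss + lambda_color * color_reconstruction_loss
```

In this sketch the noise term supervises the diffusion branch and the color term supervises the rendered colors; changing `lambda_color` trades one off against the other.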

7 of 12

Novel View Synthesis (NVS) of CO3D Hydrant Scenes

NVS

Context View

8 of 12

Novel View Synthesis Results on CO3D

9 of 12

NVS Comparisons for Open-set Categories from CO3D

Method	FID ↓	LPIPS ↓	PSNR ↑
SF	212.9	0.33	18.79
Ours	122.2	0.24	20.19

Average over 10 open-set categories

Method	FID ↓	LPIPS ↓	PSNR ↑
SF	172.6	0.29	19.85
Ours	81.8	0.21	22.03

Average over 10 training-set categories

SF denotes SparseFusion (Zhou and Tulsiani, CVPR 2023)
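
For readers unfamiliar with the PSNR column, the following is a minimal sketch of how peak signal-to-noise ratio is commonly computed. It assumes pixel values in [0, 1] stored as flat lists; the slides do not state the evaluation code or value range used, so treat the function name and defaults as illustrative only.

```python
import math


def psnr(pred, target, peak=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE).

    pred, target: equal-length flat lists of pixel values.
    peak: maximum possible pixel value (1.0 assumed here).
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

Higher PSNR means the synthesized view is closer to the ground truth pixel by pixel, whereas FID and LPIPS are distances, so lower values are better.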

10 of 12

More NVS Results for Open-set Categories from CO3D

11 of 12

Style Editing Capability via Text Guidance

12 of 12

Thanks