1 of 12

DreamSparse: Escaping from Plato's Cave with 2D Frozen Diffusion Model given Sparse Views

Paul Yoo, Jiaxian Guo, Xin Zhang, Yutaka Matsuo, Shixiang Shane Gu

The University of Tokyo, Japan

https://sites.google.com/view/dreamsparse/home

2 of 12

Problem Setting

Given sparse observations, how can we imagine and synthesize the unobserved?

Normally requires a few hours of per-object optimization

3 of 12

Limitations of Prior Approaches

  • Re-projection-based approaches, such as PixelNeRF, render blurry images when the query view is far from the context views
  • SparseFusion, a similar line of work, hallucinates via a 2D diffusion model, but this module is trained from scratch and thus exhibits limited generative capability

Yu, Alex, et al. "pixelNeRF: Neural Radiance Fields from One or Few Images." CVPR 2021.

Zhou, Zhizhuo, and Shubham Tulsiani. "SparseFusion: Distilling View-conditioned Diffusion for 3D Reconstruction." CVPR 2023.

4 of 12

Motivation

LAION-5B

Can we leverage a frozen, internet-scale, pre-trained image diffusion model as a strong 2D prior?

Schuhmann, Christoph, et al. "LAION-5B: An Open Large-scale Dataset for Training Next Generation Image-text Models." NeurIPS 2022 Datasets and Benchmarks Track.

5 of 12

Challenges

  • 2D diffusion models are inherently not 3D-aware

“backview of a chair”

“sideview of a teddybear”

DreamBooth

  • How can we preserve 3D-consistent object identity without per-object fine-tuning during inference?

Ruiz, Nataniel, et al. "DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation." CVPR 2023.

6 of 12

How did we approach it?

Color Reconstruction Loss

Noise Estimation Loss
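
The slide names the two training losses but does not give their formulas. The following is a minimal sketch only, assuming the noise estimation loss is the standard DDPM noise-prediction mean squared error and the color reconstruction loss is a mean squared error between rendered and ground-truth colors. All function names, the `predict_noise` callable, and the flat-list image representation are hypothetical and chosen for illustration; they are not taken from the DreamSparse code.

```python
import math
import random


def mse(a, b):
    """Mean squared error between two equally sized flat sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def add_noise(x0, eps, alpha_bar_t):
    """Standard DDPM forward process: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    s, n = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [s * x + n * e for x, e in zip(x0, eps)]


def noise_estimation_loss(predict_noise, x0, alpha_bar_t, rng):
    """Assumed form: MSE between the sampled noise and the model's estimate.

    predict_noise is a hypothetical stand-in for the (frozen) diffusion model.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = add_noise(x0, eps, alpha_bar_t)
    return mse(predict_noise(x_t), eps)


def color_reconstruction_loss(rendered_rgb, target_rgb):
    """Assumed form: MSE between rendered and ground-truth pixel colors."""
    return mse(rendered_rgb, target_rgb)
```

A predictor that always returns zeros, for example `noise_estimation_loss(lambda x_t: [0.0] * len(x_t), [0.2, 0.4, 0.6], 0.5, random.Random(0))`, yields a positive loss, while a perfect predictor drives it to zero.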

7 of 12

NVS of CO3D Hydrant Scenes

NVS

Context View

8 of 12

Novel View Synthesis Results on CO3D

9 of 12

NVS Comparisons for Open-set Categories from CO3D

Method    FID      LPIPS    PSNR
SF        212.9    0.33     18.79
Ours      122.2    0.24     20.19

Average over 10 open-set categories

Method    FID      LPIPS    PSNR
SF        172.6    0.29     19.85
Ours      81.8     0.21     22.03

Average over 10 training-set categories

SF denotes SparseFusion (Zhou and Tulsiani. CVPR 2023)
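
For readers unfamiliar with the PSNR column, here is a minimal sketch of how PSNR is conventionally computed from the mean squared error between a synthesized view and the ground-truth view. The function name, the flat-list pixel representation, and the default pixel range of [0, 1] are assumptions for illustration; the slides do not describe the evaluation code.

```python
import math


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two flat pixel sequences.

    max_value is the largest possible pixel value (assumed 1.0 here).
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher PSNR means the synthesized view is closer to the ground truth in pixel space. FID and LPIPS, by contrast, are distances in learned feature spaces, so lower values are better for those two columns.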

10 of 12

More NVS Results for Open-set Categories from CO3D

11 of 12

Style Editing Capability via Text Guidance

12 of 12

Thanks