DreamSparse: Escaping from Plato's Cave with 2D Frozen Diffusion Model given Sparse Views
Paul Yoo, Jiaxian Guo, Xin Zhang, Yutaka Matsuo, Shixiang Shane Gu
The University of Tokyo, Japan
Problem Setting
Given sparse observations, how can we imagine and synthesize the unobserved?
Typically requires a few hours of per-object optimization
Limitations of Prior Approaches
Yu, Alex, et al. "pixelNeRF: Neural Radiance Fields from One or Few Images." CVPR 2021.
Zhou, Zhizhuo, and Shubham Tulsiani. "SparseFusion: Distilling View-conditioned Diffusion for 3D Reconstruction." CVPR 2023.
Motivation
LAION-5B
Can we leverage a frozen, internet-scale, pre-trained image diffusion model as a strong 2D prior?
Schuhmann, Christoph, et al. "LAION-5B: An Open Large-scale Dataset for Training Next Generation Image-Text Models." NeurIPS 2022 Datasets and Benchmarks Track.
Challenges
Figure (DreamBooth): results for the text prompts “backview of a chair” and “sideview of a teddybear”
Ruiz, Nataniel, et al. "DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation." CVPR 2023.
Our Approach
Color Reconstruction Loss
Noise Estimation Loss
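A minimal sketch of the two training signals named above, assuming the color reconstruction loss is a per-pixel mean squared error and the noise estimation loss is the standard DDPM epsilon-prediction objective, ‖ε − ε_θ(x_t, t, c)‖². The helper names, the weights `lambda_color` and `lambda_noise`, and the zero-output `dummy_denoiser` are hypothetical stand-ins; DreamSparse's actual network, conditioning, and loss weighting are not specified here.

```python
import numpy as np


def color_reconstruction_loss(rendered: np.ndarray, target: np.ndarray) -> float:
    """Per-pixel mean squared error between predicted and ground-truth colors (assumed MSE form)."""
    return float(np.mean((rendered - target) ** 2))


def add_noise(x0: np.ndarray, eps: np.ndarray, alpha_bar_t: float) -> np.ndarray:
    """DDPM forward process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps


def noise_estimation_loss(eps_true: np.ndarray, eps_pred: np.ndarray) -> float:
    """Standard epsilon-prediction objective: mean of (eps - eps_theta)^2."""
    return float(np.mean((eps_true - eps_pred) ** 2))


def total_loss(rendered, target, eps_true, eps_pred,
               lambda_color: float = 1.0, lambda_noise: float = 1.0) -> float:
    """Hypothetical weighted sum of the two losses; the weights are placeholders."""
    return (lambda_color * color_reconstruction_loss(rendered, target)
            + lambda_noise * noise_estimation_loss(eps_true, eps_pred))


def dummy_denoiser(x_t: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the noise predictor; always predicts zero noise."""
    return np.zeros_like(x_t)


rng = np.random.default_rng(0)
target = rng.random((8, 8, 3))          # ground-truth novel-view image in [0, 1]
rendered = target + 0.1                 # pretend prediction with a constant color offset
eps = rng.standard_normal(target.shape)  # noise sampled for the forward process
x_t = add_noise(target, eps, alpha_bar_t=0.5)
loss = total_loss(rendered, target, eps, dummy_denoiser(x_t))
print(f"color={color_reconstruction_loss(rendered, target):.4f}, total={loss:.4f}")
```

With a zero-output predictor, the noise term simply equals the mean squared magnitude of the sampled noise; in a real pipeline `eps_pred` would come from a trained denoising network evaluated on `x_t`.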
Novel View Synthesis (NVS) of CO3D Hydrant Scenes (figure panels: Context View, NVS)
Novel View Synthesis Results on CO3D
NVS Comparisons for Open-set Categories from CO3D
Average over 10 open-set categories:
| Method | FID ↓ | LPIPS ↓ | PSNR (dB) ↑ |
|---|---|---|---|
| SF | 212.9 | 0.33 | 18.79 |
| Ours | 122.2 | 0.24 | 20.19 |

Average over 10 training-set categories:
| Method | FID ↓ | LPIPS ↓ | PSNR (dB) ↑ |
|---|---|---|---|
| SF | 172.6 | 0.29 | 19.85 |
| Ours | 81.8 | 0.21 | 22.03 |
SF denotes SparseFusion (Zhou and Tulsiani. CVPR 2023)
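For reference, PSNR is conventionally computed as 10·log10(MAX² / MSE) in decibels. Below is a minimal sketch, assuming images scaled to [0, 1] (so `max_val = 1.0`); the exact evaluation protocol behind the numbers above (masking, crops, resolution) is not specified here.

```python
import numpy as np


def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; higher is better."""
    mse = float(np.mean((pred - target) ** 2))
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * float(np.log10(max_val ** 2 / mse))


rng = np.random.default_rng(0)
target = rng.random((64, 64, 3))
pred = target + 0.1                     # constant error of 0.1 gives an MSE of 0.01
print(f"{psnr(pred, target):.2f} dB")    # about 20 dB, since 10*log10(1/0.01) = 20
```

Because PSNR is logarithmic, for a single image pair a gain of 1.40 dB (18.79 to 20.19) corresponds to roughly 28% lower MSE, since 10^(-0.14) ≈ 0.72. This relation does not carry over exactly to PSNR values averaged across categories.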
More NVS Results for Open-set Categories from CO3D
Style Editing Capability via Text Guidance
Thanks