DreamFusion: Text-to-3D using 2D diffusion
POOLE, B., JAIN, A., BARRON, J. T., & MILDENHALL, B. (2022)
Key Contributions
Example generations
Image Diffusion Refresher
Forward process
Reverse process
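As a rough illustration of the forward process, the sketch below noises a stand-in image in closed form. The cosine schedule in `alpha_bar` and the 8x8 random "image" are assumptions made for the example, not values taken from the slides. The reverse process is the learned direction and is not shown here.

```python
import numpy as np

def alpha_bar(t):
    """Cumulative signal level for t in [0, 1] (cosine schedule, an assumed choice)."""
    return np.cos(0.5 * np.pi * t) ** 2

def forward_diffuse(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a) * x0, (1 - a) * I), with a = alpha_bar(t)."""
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar(t)
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return x_t, eps

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))     # stand-in for an image
x_early, _ = forward_diffuse(x0, 0.05, rng)  # mostly signal
x_late, _ = forward_diffuse(x0, 0.95, rng)   # mostly noise
```

Early timesteps stay close to the original image, while late timesteps are almost pure Gaussian noise.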
At each step
Instead of predicting denoised images, diffusion models predict the noise content
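A minimal sketch of what predicting the noise means in practice, assuming the standard mean-squared-error objective and the closed-form relation between a noisy image and its clean version. The function names and the value `a_bar = 0.6` are illustrative, and the "perfect" prediction simply reuses the true noise in place of a trained network.

```python
import numpy as np

def noise_prediction_loss(eps_pred, eps_true):
    """Mean squared error between predicted and true noise."""
    return float(np.mean((eps_pred - eps_true) ** 2))

def x0_from_noise(x_t, eps_pred, a_bar):
    """Invert x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps to estimate x0."""
    return (x_t - np.sqrt(1.0 - a_bar) * eps_pred) / np.sqrt(a_bar)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(4, 4))   # stand-in for a clean image
eps = rng.standard_normal(x0.shape)        # noise added by the forward process
a_bar = 0.6                                # assumed cumulative signal level
x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps

perfect_loss = noise_prediction_loss(eps, eps)                 # ideal predictor
naive_loss = noise_prediction_loss(np.zeros_like(eps), eps)    # predicts no noise
x0_hat = x0_from_noise(x_t, eps, a_bar)                        # denoised estimate
```

Predicting the noise and predicting the denoised image carry the same information: given one, the other follows from the relation above.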
Conditioning
They can be conditioned on things like text, other images, or other embeddings
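One common way to apply a text condition at sampling time is classifier-free guidance, which Imagen uses. The sketch below shows only the combination rule. The two noise predictions are placeholder arrays rather than outputs of a real model, and `guided_noise` is a hypothetical helper name.

```python
import numpy as np

def guided_noise(eps_uncond, eps_cond, guidance_weight):
    """Classifier-free guidance: push the prediction toward the conditional one."""
    return eps_uncond + guidance_weight * (eps_cond - eps_uncond)

eps_uncond = np.zeros(3)   # placeholder: prediction without the text prompt
eps_cond = np.ones(3)      # placeholder: prediction with the text prompt

no_guidance = guided_noise(eps_uncond, eps_cond, 0.0)      # ignores the prompt
plain_cond = guided_noise(eps_uncond, eps_cond, 1.0)       # conditional prediction
strong_guidance = guided_noise(eps_uncond, eps_cond, 3.0)  # extrapolates past it
```

Weights above 1 extrapolate beyond the conditional prediction, trading sample diversity for closer agreement with the prompt.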
Imagen
Text Embeddings
Saharia, C. et al. (2022) ‘Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding’, arXiv [cs.CV]. Available at: http://arxiv.org/abs/2205.11487.
The DreamFusion Model
The rendering
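DreamFusion renders its NeRF with volume rendering. The sketch below composites one ray from sampled densities and colors using the standard NeRF quadrature. The sample values are made up for illustration, and the shading and background handling used in the paper are left out.

```python
import numpy as np

def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along one ray (standard NeRF quadrature)."""
    alphas = 1.0 - np.exp(-densities * deltas)   # opacity of each segment
    # Transmittance: fraction of light that reaches sample i unblocked.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights

# Two empty samples, then a dense red sample that hides the green one behind it.
densities = np.array([0.0, 0.0, 50.0, 50.0])
colors = np.array([[0.0, 0.0, 1.0],
                   [0.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
deltas = np.full(4, 0.25)   # distance between consecutive samples
rgb, weights = composite_ray(densities, colors, deltas)
```

Every operation is differentiable, which is what lets a loss on the rendered image update the NeRF parameters.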
Score Distillation Sampling Loss (SDS)
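The SDS update can be sketched on a toy problem. Everything specific below is an assumption made to keep the example self-contained: the "renderer" is the identity (so the parameters are the image), the "diffusion model" is the exact noise predictor for a Gaussian with mean `MU` and variance `S2`, the weighting w(t) is set to 1, and the schedule is a cosine one. The real method uses a NeRF renderer and a pretrained text-to-image model.

```python
import numpy as np

def alpha_bar(t):
    """Cumulative signal level (cosine schedule, an assumed choice)."""
    return np.cos(0.5 * np.pi * t) ** 2

MU = np.array([2.0, -1.0])   # mode of the toy "image" distribution
S2 = 0.25                    # its variance

def eps_hat(x_t, t):
    """Exact noise prediction when the data distribution is N(MU, S2 * I)."""
    a = alpha_bar(t)
    return np.sqrt(1.0 - a) * (x_t - np.sqrt(a) * MU) / (a * S2 + 1.0 - a)

def sds_grad(theta, rng):
    """One stochastic SDS gradient: w(t) * (eps_hat - eps) * dx/dtheta."""
    x = theta                           # identity renderer, so dx/dtheta = I
    t = rng.uniform(0.02, 0.98)         # random noise level
    eps = rng.standard_normal(x.shape)
    a = alpha_bar(t)
    x_t = np.sqrt(a) * x + np.sqrt(1.0 - a) * eps
    return eps_hat(x_t, t) - eps        # w(t) = 1 in this toy

rng = np.random.default_rng(0)
theta = np.zeros(2)
for _ in range(3000):
    theta -= 0.02 * sds_grad(theta, rng)   # plain gradient descent
```

The gradient never backpropagates through the noise predictor. Only the residual between predicted and injected noise is pushed back through the renderer, and in this toy it drives `theta` toward the mode `MU`.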
View-dependent text is appended to the prompt: “overhead view”, “front view”, or “side view”
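A small sketch of how a view label could be chosen from the sampled camera pose and appended to the prompt. The angle thresholds, the ", " separator, and the extra "back view" case are assumptions for illustration and are not taken from the slide.

```python
def view_dependent_prompt(prompt, elevation_deg, azimuth_deg):
    """Append a view label chosen from the camera pose (thresholds are assumed)."""
    if elevation_deg > 60.0:
        view = "overhead view"
    else:
        az = azimuth_deg % 360.0          # wrap into [0, 360)
        if az < 45.0 or az >= 315.0:
            view = "front view"
        elif 135.0 <= az < 225.0:
            view = "back view"            # not listed on the slide; added here
        else:
            view = "side view"
    return f"{prompt}, {view}"

example = view_dependent_prompt("a DSLR photo of a toy cow", 20.0, 90.0)
```

Telling the diffusion model which side of the object it is looking at gives it a chance to produce view-consistent guidance as the camera moves.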
Results
The two baseline models were optimized using CLIP, so a CLIP-based metric favors them and is not the best evaluation here
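To make the evaluation concern concrete, here is a sketch of a retrieval-style score, assuming the CLIP metric in question is CLIP R-Precision: the fraction of renders whose most similar caption is the prompt that generated them. The embeddings are made-up placeholders, not CLIP outputs.

```python
import numpy as np

def r_precision(image_emb, text_emb):
    """Fraction of renders whose nearest caption (cosine similarity) is their own prompt.

    image_emb[i] is the embedding of the render for prompt i;
    text_emb[j] is the embedding of prompt j.
    """
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sims = img @ txt.T                  # pairwise cosine similarities
    retrieved = sims.argmax(axis=1)     # best-matching prompt per render
    return float(np.mean(retrieved == np.arange(len(img))))

text_emb = np.eye(3)                          # placeholder prompt embeddings
image_emb = np.array([[0.9, 0.1, 0.0],
                      [0.2, 0.7, 0.1],
                      [0.6, 0.1, 0.3]])       # third render is closest to prompt 0
score = r_precision(image_emb, text_emb)      # two of three retrieved correctly
```

A model optimized directly against CLIP similarity is being scored on the quantity it was trained to maximize, which is why such a score can overstate its quality.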
Common Failure
Prompt: a DSLR photo of a toy cow
ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation
WANG, Z., LU, C., WANG, Y., BAO, F., LI, C., SU, H., & ZHU, J. (2023)
Key Contribution
Variational Score Distillation
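For reference, the two gradients can be written side by side. The notation is assumed for this summary: g(θ, c) is the image rendered from parameters θ at camera c, y is the text prompt, x_t = α_t g(θ, c) + σ_t ε is the noised render, ε_pretrain is the frozen text-to-image model, and ε_φ is a second noise predictor trained on renders of the current 3D samples.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta)
  = \mathbb{E}_{t,\epsilon,c}\!\left[
      w(t)\,\bigl(\epsilon_{\mathrm{pretrain}}(x_t, t, y) - \epsilon\bigr)\,
      \frac{\partial g(\theta, c)}{\partial \theta}
    \right]

\nabla_\theta \mathcal{L}_{\mathrm{VSD}}(\theta)
  = \mathbb{E}_{t,\epsilon,c}\!\left[
      w(t)\,\bigl(\epsilon_{\mathrm{pretrain}}(x_t, t, y) - \epsilon_\phi(x_t, t, c, y)\bigr)\,
      \frac{\partial g(\theta, c)}{\partial \theta}
    \right]
```

The only change is the subtracted term: SDS subtracts the injected noise ε, while VSD subtracts the prediction of a model that tracks the distribution of the current renders.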
In practice
Use several NeRFs (particles) instead of one (up to 4 in practice)
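The role of multiple particles can be sketched on a 1D toy. All specifics are assumptions for illustration: each "NeRF" is a single number, the pretrained model is the exact noise predictor for N(`MU`, `S2`), and the second noise predictor (fine-tuned with gradient steps in the real method) is replaced by a Gaussian refit to the current particles in closed form.

```python
import numpy as np

def alpha_bar(t):
    """Cumulative signal level (cosine schedule, an assumed choice)."""
    return np.cos(0.5 * np.pi * t) ** 2

def gaussian_eps(x_t, t, mean, var):
    """Exact noise prediction for data distributed as N(mean, var)."""
    a = alpha_bar(t)
    return np.sqrt(1.0 - a) * (x_t - np.sqrt(a) * mean) / (a * var + 1.0 - a)

MU, S2 = 2.0, 0.25                          # toy "pretrained" distribution
rng = np.random.default_rng(0)
particles = rng.normal(0.0, 0.01, size=4)   # 4 particles, nearly identical at start

for _ in range(8000):
    i = rng.integers(len(particles))        # update one random particle per step
    t = rng.uniform(0.02, 0.98)
    eps = rng.standard_normal()
    a = alpha_bar(t)
    x_t = np.sqrt(a) * particles[i] + np.sqrt(1.0 - a) * eps
    eps_pre = gaussian_eps(x_t, t, MU, S2)  # frozen "pretrained" prediction
    # Stand-in for the trained second model: refit a Gaussian to the particles.
    eps_phi = gaussian_eps(x_t, t, particles.mean(), particles.var() + 1e-4)
    particles[i] -= 0.02 * (eps_pre - eps_phi)   # VSD-style update, w(t) = 1

final_mean, final_std = particles.mean(), particles.std()
```

In this toy the particles move toward `MU` but keep a spread comparable to the target distribution. Replacing `eps_phi` with `eps` recovers the SDS update, whose expected gradient pulls every particle to the same mode.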
VSD
SDS
How Does it Compare
Question time