DreamFusion: Text-to-3D using 2D Diffusion
2024 Nov 06
🏆: Mingtian Tan
🧑🏻⚖️: Haolin Liu
👩🏼🚀: Tianle Zhong
🧑🏽💻: Xuhui Kang
🏆: Overview: Research Question – Text-to-3D
🏆: Overview: Motivation
1. Success in text-to-image generation: This work was published in Sept 2022, by which time models like Imagen and Stable Diffusion had already made significant progress.
2. Compared to 2D data, 3D data is relatively scarce, making it challenging to directly train a text-to-3D model.
3. The inherent complexity of 3D data makes explicit synthesis challenging; thus, implicit synthesis may be a feasible alternative.
🏆: The DreamFusion Method
DreamFusion = NeRF + Imagen
🏆: NeRF – Neural Radiance Fields
Reconstruct a complete 3D representation of a scene from multiple images taken at different viewpoints.
Inference: synthesize a novel 2D view of the scene from a specified camera viewpoint.
source : NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
🏆: NeRF – Neural Radiance Fields
Training:
Input:
1. Multiple Images from Different Angles
2. Corresponding Camera Parameters
3. Spatial Sampling Points
Output:
1. Color (RGB)
2. Volume Density (σ)
Inference:
Input: New Camera Parameters
Output: Synthesized Image
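The training/inference interface above can be sketched with the NeRF volume-rendering equation: per-sample colors and densities along a camera ray are alpha-composited into one pixel. This is a toy numpy sketch with made-up sample values; a real NeRF would query an MLP at each sample point.

```python
# Minimal sketch of NeRF-style volume rendering along one camera ray.
# Toy hand-picked values; a real NeRF obtains rgb/sigma from an MLP.
import numpy as np

def render_ray(rgb, sigma, deltas):
    """Alpha-composite per-sample color (RGB) and volume density (sigma)
    into a single pixel color, following the NeRF rendering equation."""
    alpha = 1.0 - np.exp(-sigma * deltas)                          # per-segment opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha]))[:-1]  # transmittance to each sample
    weights = trans * alpha                                        # contribution of each sample
    return (weights[:, None] * rgb).sum(axis=0)                    # final pixel color

# Toy example: 3 samples along one ray.
rgb = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
sigma = np.array([0.5, 1.0, 2.0])   # predicted volume densities
deltas = np.array([0.1, 0.1, 0.1])  # distances between adjacent samples
pixel = render_ray(rgb, sigma, deltas)
```

Because rendering is just this differentiable weighted sum, gradients from any loss on the pixel can flow back to the densities and colors, and hence to the NeRF MLP.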
🏆: The DreamFusion Method
After obtaining renderings from NeRF, the pretrained diffusion model scores them by predicting the noise added to them, linking the distribution of renderings to the corresponding text prompt.
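The noise-prediction step described above follows the standard diffusion objective: noise a rendering x0 with the forward process, then ask the model to recover that noise. A toy numpy sketch (the rendering, timestep schedule value `alpha_bar`, and the zero-loss "perfect predictor" are all illustrative stand-ins):

```python
# Sketch of the denoising objective: add noise to a NeRF rendering x0,
# then measure how well a model's noise prediction matches it.
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.uniform(size=(64, 64, 3))      # a NeRF rendering (toy stand-in)
alpha_bar = 0.7                         # noise-schedule value at timestep t (illustrative)
eps = rng.standard_normal(x0.shape)     # sampled Gaussian noise

# Forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps
x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

# The diffusion model eps_hat(x_t, t, text) is trained with an MSE on the noise:
def noise_mse(eps_hat, eps):
    return float(np.mean((eps_hat - eps) ** 2))

perfect_loss = noise_mse(eps, eps)  # a perfect noise predictor gives zero loss
```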
🏆: The DreamFusion Method
How to optimize NeRF with the loss from the Diffusion Model?
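The answer DreamFusion gives is Score Distillation Sampling (SDS): instead of backpropagating through the diffusion U-Net, the weighted noise residual w(t)·(ε̂ − ε) is injected directly as the gradient on the rendered image and pushed through the differentiable renderer into the NeRF parameters. A toy numpy sketch, where `fake_eps_hat` is a hypothetical stand-in for the frozen Imagen noise predictor:

```python
# Sketch of the SDS pixel gradient: w(t) * (eps_hat(x_t) - eps).
# All tensors are toy numpy stand-ins; no real diffusion model is called.
import numpy as np

rng = np.random.default_rng(1)

def sds_pixel_gradient(x0, eps_hat_fn, alpha_bar, w):
    """Return w * (eps_hat(x_t) - eps): the gradient SDS pushes back
    through the differentiable renderer to the NeRF parameters."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return w * (eps_hat_fn(x_t) - eps)

x0 = rng.uniform(size=(64, 64, 3))             # NeRF rendering (toy)
fake_eps_hat = lambda x_t: np.zeros_like(x_t)  # hypothetical frozen model
grad = sds_pixel_gradient(x0, fake_eps_hat, alpha_bar=0.7, w=1.0)
```

Skipping the U-Net Jacobian is what makes the update cheap: only the renderer needs to be differentiable.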
🏆: The Performance
🧑🏻⚖️ Critique: Pipeline Design
The model’s supervised signal only comes from generated 2D images, which may have potential limitations.
🧑🏻⚖️Critique: Base Models
The text-to-image diffusion model is Imagen, which generates very low-resolution images (64 × 64); as a result, DreamFusion cannot synthesize high-frequency 3D geometric and texture details.
DreamFusion adopts a variant of Mip-NeRF 360 with an explicit shading model as the scene representation. Its large global MLP for volume rendering is both computationally expensive and memory intensive, so the approach scales poorly with increasing image resolution.
A follow-up work [1] mitigates these two issues by using Stable Diffusion for image generation and Instant-NGP for the 3D scene representation.
[1] Lin, Chen-Hsuan, et al. "Magic3d: High-resolution text-to-3d content creation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
🧑🏻⚖️Critique: Image Generation
For different perspectives, the paper appends view-dependent prompts to control the generated views: “overhead view”, “front view”, “side view”, “back view”.
These rough descriptions may result in inconsistencies between the images rendered by NeRF and those generated by Imagen.
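The coarseness of this prompting scheme is easy to see in a sketch: the camera pose is mapped to one of only four text phrases, so very different poses share the same prompt. The thresholds below are illustrative, not the paper's exact values.

```python
# Sketch of view-dependent prompt augmentation: map a sampled camera
# pose to one of four coarse direction phrases. Thresholds are
# illustrative assumptions, not the paper's exact configuration.
def view_dependent_prompt(prompt, azimuth_deg, elevation_deg):
    if elevation_deg > 60:
        view = "overhead view"
    elif azimuth_deg < 45 or azimuth_deg >= 315:
        view = "front view"
    elif 135 <= azimuth_deg < 225:
        view = "back view"
    else:
        view = "side view"
    return f"{prompt}, {view}"

p = view_dependent_prompt("a DSLR photo of a corgi",
                          azimuth_deg=180, elevation_deg=10)
```

Since a 140° and a 220° azimuth both yield “back view”, Imagen's guidance cannot distinguish them, which is one source of the rendering/generation inconsistency noted above.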
🧑🏻⚖️Critique: SDS loss
SDS is not a perfect loss function when applied to image sampling; compared with standard ancestral sampling, it has the following drawbacks.
👩🏼🚀: Paper Summary from Pioneer
You are a pioneer. Your goal is to think how the paper being discussed could be used to accelerate other findings, help in other disciplines (e.g. robotics, science), and be combined with other techniques you have seen to create a novel result worthy of a solid publication.
👩🏼🚀: Generate Domain-specific 3D Models with Expert Knowledge
👩🏼🚀: Medical Training Simulation Generator
👩🏼🚀: Archaeological Reconstruction Assistant
🧑🏽💻: Paper Summary from Entrepreneur
You are an entrepreneur. This means you are constantly on the lookout for cool new ideas and to build new products (which will hopefully make profit!). Your goal is to think how the paper being discussed could be used to build a new product – remember the product does not have to be “novel” but it should have high chances of working well and robustly.
🧑🏽💻 Entrepreneur: Real-world application
Product Idea: 3D Asset Generator for E-commerce and Marketing
Description: A web-based tool that allows businesses to generate custom 3D models of products based on simple text descriptions. For example, an e-commerce store could generate photorealistic 3D models of a new sneaker or a custom sofa without needing physical prototypes.
🧑🏽💻 Entrepreneur: Demo
Feature Highlight: Create a demonstration with a live interface where users input a product description (e.g., "a modern sofa in grey fabric with wooden legs") and receive a high-quality, rotatable 3D model of the product.
Usage Scenarios: Show how the generated 3D models can be used in virtual showrooms or on websites where customers can view products from all angles.
Showcase Realism: Include lighting and shading adjustments, emphasizing the photorealistic aspect and flexibility in viewing angles.
🧑🏽💻 Entrepreneur: Extension of 3D Asset Generator
Game Development:
🧑🏽💻 Entrepreneur: 3D Printing Customization
Customizable Models: Generate unique, personalized designs for practical or decorative items (e.g., “a vase shaped like a seashell”).
Fit & Function Modifications: Specify dimensions or functional elements, ensuring models are print-ready (e.g., “a phone stand for iPhone”).
Rapid Iterations: Quickly test and iterate on design variations before final print.
Thank you for listening!
Q & A