1 of 25

DreamFusion: Text-to-3D using 2D Diffusion

2024 Nov 06

🏆: Mingtian Tan

🧑🏻‍⚖️: Haolin Liu

👩🏼‍🚀: Tianle Zhong

🧑🏽‍💻: Xuhui Kang

2 of 25

🏆: Overview: Research Question – Text-to-3D

  • “We instead want to create 3D models that look like good images when rendered from random angles” (quoting the authors)

  • In summary: without supervision from 3D data, the goal of this paper is to use existing 2D generative models to create desired 3D models based on given text.

3 of 25

🏆: Overview: Motivation

1. Success in text-to-image generation: This work was published in Sept 2022, by which time models like Imagen and Stable Diffusion had already made significant progress.

2. Compared to 2D data, 3D data is relatively scarce, making it challenging to directly train a text-to-3D model.

3. The inherent complexity of 3D data makes explicit synthesis challenging; thus, implicit synthesis may be a feasible alternative.

4 of 25

🏆: The DreamFusion Method

DreamFusion = NeRF + Imagen

5 of 25

🏆: NeRF – Neural Radiance Fields

Reconstruct a complete 3D representation of a scene based on multiple images from different viewpoints within the same scene.

Inference: synthesize a 2D image of the scene from a specified novel viewpoint.

Source: NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

6 of 25

🏆: NeRF – Neural Radiance Fields

Training :

Input:

1. Multiple Images from Different Angles

2. Corresponding Camera Parameters

3. Spatial Sampling Points

Output:

1. Color (RGB)

2. Volume Density (σ)

Inference:

Input: New Camera Parameters

Output: Synthesized Image
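The per-sample outputs above (RGB color and volume density σ) are turned into a pixel by volume rendering along each camera ray. A minimal sketch of that compositing step in plain Python, with hypothetical per-sample inputs:

```python
import math

def render_ray(colors, sigmas, deltas):
    """Composite per-sample colors and densities along one camera ray.

    colors: list of (r, g, b) tuples at the sampled points
    sigmas: volume densities at those points
    deltas: distances between consecutive samples
    Returns the accumulated RGB color via NeRF's quadrature rule.
    """
    rgb = [0.0, 0.0, 0.0]
    transmittance = 1.0  # probability the ray reaches this sample unoccluded
    for c, sigma, delta in zip(colors, sigmas, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        for i in range(3):
            rgb[i] += weight * c[i]
        transmittance *= 1.0 - alpha  # later samples are occluded by this one
    return rgb

# A single, nearly opaque red sample dominates the ray:
color = render_ray([(1.0, 0.0, 0.0)], [100.0], [1.0])
```

Because this rendering rule is differentiable in the colors and densities, gradients can flow from a loss on rendered pixels back into the NeRF parameters.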

7 of 25

🏆: The DreamFusion Method

After obtaining renderings from NeRF, the pretrained Diffusion Model evaluates them by predicting the noise added to a noised version of each rendering, conditioned on the given text; this ties the distribution of renderings to the corresponding text.
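The noise-prediction setup above follows the standard diffusion forward process: a clean image x₀ is noised to x_t, and the model is trained to recover the injected noise ε. A minimal sketch of that forward step in plain Python (the variable names and toy inputs are illustrative, not the paper's):

```python
import math
import random

def forward_diffuse(x0, alpha_bar, rng=random):
    """Sample x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps.

    x0: list of floats (a flattened image); alpha_bar: cumulative
    noise-schedule product in (0, 1]. Returns (x_t, eps), so a denoiser
    can be trained to predict eps from x_t (the epsilon-prediction loss).
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]  # eps ~ N(0, I)
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    x_t = [a * x + b * e for x, e in zip(x0, eps)]
    return x_t, eps

random.seed(0)
x_t, eps = forward_diffuse([0.5, -0.2, 0.1], alpha_bar=0.9)
```

In DreamFusion the diffusion model is frozen; this noising step is applied to NeRF renderings, and the model's noise-prediction residual is what drives the 3D optimization.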

8 of 25

🏆: The DreamFusion Method

How to optimize NeRF with the loss from the Diffusion Model

9 of 25

🏆: The DreamFusion Method

How to optimize NeRF with the loss from the Diffusion Model

10 of 25

🏆: The DreamFusion Method

How to optimize NeRF with the loss from the Diffusion Model
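The optimization on these slides is Score Distillation Sampling (SDS): the diffusion model's noise-prediction residual is pushed through the differentiable renderer into the NeRF parameters θ, without backpropagating through the diffusion U-Net itself. The paper's SDS gradient is:

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}\big(\phi, \mathbf{x} = g(\theta)\big)
= \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_\phi(\mathbf{z}_t; y, t) - \epsilon\big)\,\frac{\partial \mathbf{x}}{\partial \theta} \right]
```

Here g(θ) is the differentiable NeRF renderer, z_t the noised rendering at timestep t, y the text embedding, ε the injected noise, and w(t) a timestep-dependent weighting. Omitting the U-Net Jacobian both saves compute and, per the paper, yields a better-behaved gradient.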

11 of 25

🏆: The Performance

12 of 25

🧑🏻‍⚖️ Critique: Pipeline Design

The model’s supervision signal comes only from generated 2D images, which has potential limitations.

  1. Realistic lighting and shadow consistency in 3D space can be problematic, leading to less convincing models.
  2. The generated 3D models might not adhere to physical plausibility, causing unrealistic structures that could be problematic in applications requiring precise physical properties.

13 of 25

🧑🏻‍⚖️Critique: Base Models

The text-to-image diffusion model is Imagen, which generates very low-resolution (64 × 64) images; as a result, DreamFusion cannot synthesize high-frequency 3D geometric and texture details.

DreamFusion adopts a variant of Mip-NeRF 360 with an explicit shading model as its scene representation. Its large global MLP for volume rendering is both computationally expensive and memory-intensive, so the approach scales poorly with increasing image resolution.

A follow-up work [1] mitigates these two issues by using Stable Diffusion for image generation and Instant-NGP for the 3D scene representation.

[1] Lin, Chen-Hsuan, et al. "Magic3d: High-resolution text-to-3d content creation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.

14 of 25

🧑🏻‍⚖️Critique: Image Generation

To handle different perspectives, this paper uses view-dependent prompts, appending phrases such as “overhead view”, “front view”, “side view”, and “back view” to control the rendered view.

These rough descriptions may result in inconsistencies between the images rendered by NeRF and those generated by Imagen.
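The view-dependent prompting criticized above can be sketched as a simple lookup from camera pose to a text suffix. The thresholds below are illustrative assumptions, not the paper's exact values:

```python
def view_prompt(text, elevation_deg, azimuth_deg):
    """Append a coarse view phrase to the text prompt based on camera pose.

    elevation_deg: camera elevation above the horizon; azimuth_deg: rotation
    around the object in degrees. Threshold values here are illustrative.
    """
    if elevation_deg > 60:
        return f"{text}, overhead view"
    azimuth_deg %= 360
    if azimuth_deg < 45 or azimuth_deg >= 315:
        return f"{text}, front view"
    if azimuth_deg < 135 or azimuth_deg >= 225:
        return f"{text}, side view"
    return f"{text}, back view"

prompt = view_prompt("a DSLR photo of a corgi", elevation_deg=10, azimuth_deg=180)
```

The coarseness is plain to see: every azimuth in a 90° wedge maps to the same phrase, so the text condition cannot distinguish poses that NeRF renders quite differently, which is exactly the inconsistency noted above.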

15 of 25

🧑🏻‍⚖️Critique: SDS loss

SDS is not a perfect loss function when applied to image sampling; compared with standard ancestral sampling, it has the following drawbacks.

  1. SDS often produces oversaturated and oversmoothed results compared with ancestral sampling.
  2. 2D image samples produced using SDS tend to lack diversity compared to ancestral sampling, and the 3D results exhibit few differences across random seeds.

16 of 25

👩🏼‍🚀: Paper Summary from Pioneer

You are a pioneer. Your goal is to think how the paper being discussed could be used to accelerate other findings, help in other disciplines (e.g. robotics, science), and be combined with other techniques you have seen to create a novel result worthy of a solid publication.

  • Think of two or three novel applications of the work and present them
  • Tell us how you would go about pursuing these ideas to showcase their efficacy

17 of 25

👩🏼‍🚀: Generate Domain-specific 3D Models with Expert Knowledge

  • Suitable for areas where descriptions are available but missing 3D models
    • DreamFusion can directly generate 3D models with descriptions.
    • The challenge lies in whether they can comply with some domain specifications.
  • Key innovations:
    • Adding domain-specific constraints and physics to the NeRF representation
    • Developing evaluation metrics for scientific accuracy
    • Creating interfaces for expert feedback and refinement
    • Demonstrating practical value in real-world applications

18 of 25

👩🏼‍🚀: Medical Training Simulation Generator

  • Generate accurate 3D anatomical models from medical textbook descriptions and doctor's notes.
  • Approach
    • Fine-tune the 2D diffusion model on medical imaging datasets.
    • Add physics-based tissue properties to the NeRF representation
    • Validate generated models with medical professionals

19 of 25

👩🏼‍🚀: Archaeological Reconstruction Assistant

  • Generate 3D models of historical artifacts and sites from written descriptions and partial evidence
  • Approach:
    • Train on scans of existing artifacts and their descriptions.
    • Incorporate uncertainty visualization into the rendering
    • Allow interactive refinement based on expert feedback

20 of 25

🧑🏽‍💻: Paper Summary from Entrepreneur

You are an entrepreneur. This means you are constantly on the lookout for cool new ideas and to build new products (which will hopefully make profit!). Your goal is to think how the paper being discussed could be used to build a new product – remember the product does not have to be “novel” but it should have high chances of working well and robustly.

  • Think of one or two products derived from the work
  • Tell us how you would go about building a demo to showcase each idea – this is the demo you would show for your seed or round A funding.

21 of 25

🧑🏽‍💻Entrepreneur: Real-world Application

Product Idea: 3D Asset Generator for E-commerce and Marketing

Description: A web-based tool that allows businesses to generate custom 3D models of products based on simple text descriptions. For example, an e-commerce store could generate photorealistic 3D models of a new sneaker or a custom sofa without needing physical prototypes.

22 of 25

🧑🏽‍💻Entrepreneur: Demo

Feature Highlight: Create a demonstration with a live interface where users input a product description (e.g., "a modern sofa in grey fabric with wooden legs") and receive a high-quality, rotatable 3D model of the product.

Usage Scenarios: Show how the generated 3D models can be used in virtual showrooms or on websites where customers can view products from all angles.

Showcase Realism: Include lighting and shading adjustments, emphasizing the photorealistic aspect and flexibility in viewing angles.

23 of 25

🧑🏽‍💻Entrepreneur: Extension of 3D Asset Generator

Game Development:

  • Inspiration & Prototyping: Generate quick concept visuals based on descriptive prompts (e.g., “a fantasy warrior in golden armor”).
  • Direct Integration: Ready-made, optimized files compatible with Unity, Unreal, and other game engines.

24 of 25

🧑🏽‍💻Entrepreneur: 3D Printing Customization

Customizable Models: Generate unique, personalized designs for practical or decorative items (e.g., “a vase shaped like a seashell”).

Fit & Function Modifications: Specify dimensions or functional elements, ensuring models are print-ready (e.g., “a phone stand for iPhone”).

Rapid Iterations: Quickly test and iterate on design variations before final print.

25 of 25

Thank you for listening!

Q & A