1 of 19

3D Generation from Multi-View Images

Chen Wang

2024/10/07

2 of 19

3D Generation Methods

Method                        | Speed        | Quality             | Data
------------------------------|--------------|---------------------|----------------
Optimization                  | > 2 minutes  | High                | No Data
Feedforward                   | < 1 second   | Low                 | Multi-View Data
MV diffusion + Reconstruction | ~ 10 seconds | High                | Multi-View Data
3D Native Diffusion           | 1 minute     | Depends on the data | 3D Data

3 of 19

A series of papers…

  • MVDream
  • Instant3D
  • InstantMesh
  • LGM

4 of 19

Multi-View Image Generation

  • Optimization-based methods can suffer from the multi-face Janus problem due to the 2D bias of diffusion models
  • Content drifts across different views, e.g. a chicken gradually becomes a waffle even with the prompt “chicken with waffle”

No information sharing across views!

MVDream: Multi-view Diffusion for 3D Generation, Shi et al.

5 of 19

Multi-View Image Generation

  • Directly generating multi-view images of the same object

MVDream: Multi-view Diffusion for 3D Generation, Shi et al.

6 of 19

Multi-View Image Generation

  • Use 3D attention: self-attention extended across views, so tokens from all views attend to each other
  • Render synthetic multi-view data from Objaverse
  • Joint training with text-to-image data to preserve diversity
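A minimal numpy sketch of the cross-view ("3D") attention idea: tokens from all views are flattened into one sequence, so every view attends to every other view and content stays consistent. The single head, identity projections, and shapes are simplifications for intuition, not MVDream's actual implementation.

```python
import numpy as np

def multi_view_attention(x):
    """Single-head attention across all views of one object.
    x: (V, L, C) -- V views, L tokens per view, C channels.
    Flattening V into the token axis lets every view attend to every view."""
    V, L, C = x.shape
    q = k = v = x.reshape(V * L, C)                 # merge views into one sequence
    scores = q @ k.T / np.sqrt(C)                   # (V*L, V*L) cross-view scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over all views' tokens
    return (weights @ v).reshape(V, L, C)

# 4 views, 16 tokens each, 64-dim features
out = multi_view_attention(np.random.randn(4, 16, 64))
print(out.shape)  # (4, 16, 64)
```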

MVDream: Multi-view Diffusion for 3D Generation, Shi et al.

7 of 19

Applications

  • Multi-view SDS: after reparametrization, the SDS loss is equivalent to a reconstruction loss against the model's denoised prediction

  • Multi-view DreamBooth: personalized multi-view diffusion models
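The SDS reparametrization referenced above can be sketched in standard diffusion notation (a generic derivation, not copied from the paper; weightings are absorbed into $w(t)$ and $\tilde{w}(t)$):

$$\nabla_\theta \mathcal{L}_{\mathrm{SDS}} = \mathbb{E}_{t,\epsilon}\!\left[\, w(t)\,\big(\epsilon_\phi(x_t;\, y, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \,\right]$$

With $x_t = \alpha_t x + \sigma_t \epsilon$ and the denoised prediction $\hat{x}_0 = (x_t - \sigma_t\,\epsilon_\phi)/\alpha_t$, one gets $\epsilon_\phi - \epsilon = \tfrac{\alpha_t}{\sigma_t}\,(x - \hat{x}_0)$, so the same gradient arises from a reconstruction loss against the denoised multi-view images:

$$\mathcal{L} = \mathbb{E}_{t,\epsilon}\!\left[\, \tilde{w}(t)\, \big\| x - \hat{x}_0(x_t;\, y, t) \big\|^2 \,\right]$$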

MVDream: Multi-view Diffusion for 3D Generation, Shi et al.

8 of 19

Results

MVDream: Multi-view Diffusion for 3D Generation, Shi et al.

Multi-view image generation

3D Generation

9 of 19

3D Generation with Reconstruction Models

Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model, Li et al.

10 of 19

3D Generation with Reconstruction Models

  • Use DINO to extract features from each input view
  • Several attention blocks to generate triplanes from features
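A hedged sketch of the second bullet: learnable triplane tokens act as queries in cross-attention over the DINO features of all input views. The single head, identity projections, and all shapes are illustrative assumptions, not Instant3D's real architecture.

```python
import numpy as np

def triplane_cross_attention(triplane_tokens, image_tokens):
    """Triplane tokens (queries) pull information from multi-view DINO
    features (keys/values) via single-head cross-attention."""
    C = triplane_tokens.shape[-1]
    scores = triplane_tokens @ image_tokens.T / np.sqrt(C)   # (T, N) attention scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)           # softmax over image tokens
    return weights @ image_tokens                            # (T, C) updated triplane tokens

# 3 planes of 32x32 tokens attending over 4 views x 196 DINO patch tokens
triplane = np.random.randn(3 * 32 * 32, 64)
images = np.random.randn(4 * 196, 64)
print(triplane_cross_attention(triplane, images).shape)  # (3072, 64)
```

In the real model this block would be stacked with feed-forward layers and repeated; here it only shows the information flow from input views to the triplane.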

Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model, Li et al.

11 of 19

3D Generation with Reconstruction Models

Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model, Li et al.

12 of 19

3D Generation with Reconstruction Models

  • Still follows the image -> multi-view -> 3D design
  • The 3D output is a Flexicubes representation
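The two-stage design in the first bullet can be written as a trivial pipeline; both stage functions below are placeholders, not real APIs:

```python
def image_to_3d(image, mv_diffusion, reconstructor):
    """Two-stage image-to-3D: a multi-view diffusion model turns one image
    into several consistent views, then a feed-forward reconstruction model
    lifts those views to 3D (a Flexicubes mesh in InstantMesh)."""
    views = mv_diffusion(image)   # stage 1: image -> multi-view images
    return reconstructor(views)   # stage 2: multi-view images -> 3D

# Toy stand-ins for the two learned models
mesh = image_to_3d("input.png",
                   lambda im: [im] * 6,                    # pretend 6 views
                   lambda vs: f"mesh from {len(vs)} views")
print(mesh)  # mesh from 6 views
```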

InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models, Xu et al.

13 of 19

3D Generation with Reconstruction Models

  • Flexicubes is a differentiable neural representation that can export a mesh instantly
  • Enables dual contouring with more flexibility by using MLPs to predict the extraction weights and deformations for each vertex
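A toy numpy illustration of the flexibility Flexicubes adds: the dual vertex of a cell becomes a learnable convex combination of that cell's edge zero-crossings instead of a fixed average. The softmax weighting and everything else here is a simplification for intuition; the paper's exact parameterization (including the per-vertex grid deformations) is richer.

```python
import numpy as np

def dual_vertex(crossings, weights):
    """Place a cell's dual vertex as a weighted average of its edge
    zero-crossings. `weights` would be MLP-predicted in InstantMesh;
    here they are plain inputs."""
    w = np.exp(weights)
    w /= w.sum()                          # positive weights summing to 1
    return (w[:, None] * crossings).sum(axis=0)

# A cell with 4 active edge crossings
crossings = np.array([[0.1, 0.0, 0.0],
                      [0.0, 0.2, 0.0],
                      [0.0, 0.0, 0.3],
                      [0.1, 0.1, 0.1]])
print(dual_vertex(crossings, np.zeros(4)))                 # uniform weights -> plain mean
print(dual_vertex(crossings, np.array([9., 0., 0., 0.])))  # pulled toward the first crossing
```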

InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models, Xu et al.

Flexible Isosurface Extraction for Gradient-Based Mesh Optimization, Shen et al.

14 of 19

3D Generation with Reconstruction Models

InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models, Xu et al.

Flexible Isosurface Extraction for Gradient-Based Mesh Optimization, Shen et al.

15 of 19

3D Generation with Reconstruction Models

  • Generates 3D Gaussians with a feed-forward model
  • Represents 3D Gaussians with multi-view splatter images
    • Each splatter image has 14 channels: 3 RGB, 3 position, 3 scale, 4 rotation, and 1 opacity
    • Training loss: rendering loss between the splatter-image rendering and the ground-truth views
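The splatter-image layout above can be made concrete with a small unpacking helper; the channel order follows the slide's listing and may not match LGM's actual code:

```python
import numpy as np

def unpack_splatter(img):
    """Split an (H, W, 14) splatter image into per-pixel Gaussian parameters:
    each pixel stores one 3D Gaussian."""
    H, W, _ = img.shape
    g = img.reshape(H * W, 14)
    return {
        "rgb":      g[:, 0:3],
        "position": g[:, 3:6],
        "scale":    g[:, 6:9],
        "rotation": g[:, 9:13],   # quaternion
        "opacity":  g[:, 13:14],
    }

gaussians = unpack_splatter(np.random.randn(128, 128, 14))
print(gaussians["rotation"].shape)  # (16384, 4)
```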

LGM: Large Multi-View Gaussian Model, Li et al.

16 of 19

3D Generation with Reconstruction Models

LGM: Large Multi-View Gaussian Model, Li et al.

17 of 19

3D Generation Methods

Method                        | Speed        | Quality             | Data
------------------------------|--------------|---------------------|----------------
Optimization                  | > 2 minutes  | High                | No Data
Feedforward                   | < 1 second   | Low                 | Multi-View Data
MV diffusion + Reconstruction | ~ 10 seconds | High                | Multi-View Data
3D Native Diffusion           | 1 minute     | Depends on the data | 3D Data

Can we close the gap?

18 of 19

Feedforward 3D Gen with Diffusion Priors

GECO: Generative Image-to-3D within a Second, Wang et al.

19 of 19

Feedforward 3D Gen with Diffusion Priors

GECO: Generative Image-to-3D within a Second, Wang et al.