1 of 19

3D Generation from Multi-View Images

Chen Wang

2024/10/07

2 of 19

3D Generation Methods

Method                        | Speed        | Quality             | Data
------------------------------|--------------|---------------------|----------------
Optimization                  | > 2 minutes  | High                | No Data
Feedforward                   | < 1 second   | Low                 | Multi-View Data
MV diffusion + Reconstruction | ~ 10 seconds | High                | Multi-View Data
3D Native Diffusion           | 1 minute     | Depends on the data | 3D Data

3 of 19

A series of papers…

  • MVDream
  • Instant3D
  • InstantMesh
  • LGM

4 of 19

Multi-View Image Generation

  • Optimization-based methods can suffer from the multi-face Janus problem due to the 2D bias of diffusion models
  • Content drifts across different views, e.g. a chicken gradually becomes a waffle even with the prompt “chicken with waffle”

No information sharing across views!

MVDream: Multi-view Diffusion for 3D Generation, Shi et al.

5 of 19

Multi-View Image Generation

  • Directly generating multi-view images of the same object

MVDream: Multi-view Diffusion for 3D Generation, Shi et al.

6 of 19

Multi-View Image Generation

  • Use 3D attention: self-attention extended across views, so tokens from all views attend to each other
  • Render synthetic multi-view data from Objaverse
  • Joint training with text-to-image data to preserve diversity
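A minimal numpy sketch of the cross-view ("3D") attention idea: tokens from all views are flattened into one sequence, so every view attends to every other view and content stays consistent. The single head, identity projections, and shapes are simplifications for intuition, not MVDream's actual implementation.

```python
import numpy as np

def multi_view_attention(x):
    """Single-head attention across all views of one object.
    x: (V, L, C) -- V views, L tokens per view, C channels.
    Flattening V into the token axis lets every view attend to every view."""
    V, L, C = x.shape
    q = k = v = x.reshape(V * L, C)                 # merge views into one sequence
    scores = q @ k.T / np.sqrt(C)                   # (V*L, V*L) cross-view scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over all views' tokens
    return (weights @ v).reshape(V, L, C)

# 4 views, 16 tokens each, 64-dim features
out = multi_view_attention(np.random.randn(4, 16, 64))
print(out.shape)  # (4, 16, 64)
```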

MVDream: Multi-view Diffusion for 3D Generation, Shi et al.

7 of 19

Applications

  • Multi-view SDS: after reparametrization, the SDS loss is equivalent to a reconstruction loss against the model's denoised prediction

  • Multi-view DreamBooth: personalized multi-view diffusion models
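The SDS reparametrization referenced above can be sketched in standard diffusion notation (a generic derivation, not copied from the paper; weightings are absorbed into $w(t)$ and $\tilde{w}(t)$):

$$\nabla_\theta \mathcal{L}_{\mathrm{SDS}} = \mathbb{E}_{t,\epsilon}\!\left[\, w(t)\,\big(\epsilon_\phi(x_t;\, y, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \,\right]$$

With $x_t = \alpha_t x + \sigma_t \epsilon$ and the denoised prediction $\hat{x}_0 = (x_t - \sigma_t\,\epsilon_\phi)/\alpha_t$, one gets $\epsilon_\phi - \epsilon = \tfrac{\alpha_t}{\sigma_t}\,(x - \hat{x}_0)$, so the same gradient arises from a reconstruction loss against the denoised multi-view images:

$$\mathcal{L} = \mathbb{E}_{t,\epsilon}\!\left[\, \tilde{w}(t)\, \big\| x - \hat{x}_0(x_t;\, y, t) \big\|^2 \,\right]$$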

MVDream: Multi-view Diffusion for 3D Generation, Shi et al.

8 of 19

Results

MVDream: Multi-view Diffusion for 3D Generation, Shi et al.

Multi-view image generation

3D Generation

9 of 19

3D Generation with Reconstruction Models

Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model, Li et al.

10 of 19

3D Generation with Reconstruction Models

  • Use DINO to extract features from each input view
  • Several attention blocks to generate triplanes from features
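A hedged sketch of the second bullet: learnable triplane tokens act as queries in cross-attention over the DINO features of all input views. The single head, identity projections, and all shapes are illustrative assumptions, not Instant3D's real architecture.

```python
import numpy as np

def triplane_cross_attention(triplane_tokens, image_tokens):
    """Triplane tokens (queries) pull information from multi-view DINO
    features (keys/values) via single-head cross-attention."""
    C = triplane_tokens.shape[-1]
    scores = triplane_tokens @ image_tokens.T / np.sqrt(C)   # (T, N) attention scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)           # softmax over image tokens
    return weights @ image_tokens                            # (T, C) updated triplane tokens

# 3 planes of 32x32 tokens attending over 4 views x 196 DINO patch tokens
triplane = np.random.randn(3 * 32 * 32, 64)
images = np.random.randn(4 * 196, 64)
print(triplane_cross_attention(triplane, images).shape)  # (3072, 64)
```

In the real model this block would be stacked with feed-forward layers and repeated; here it only shows the information flow from input views to the triplane.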

Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model, Li et al.

11 of 19

3D Generation with Reconstruction Models

Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model, Li et al.

12 of 19

3D Generation with Reconstruction Models

  • Still follows the image -> multi-view -> 3D design
  • The 3D output is a Flexicubes representation
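The two-stage design in the first bullet can be written as a trivial pipeline; both stage functions below are placeholders, not real APIs:

```python
def image_to_3d(image, mv_diffusion, reconstructor):
    """Two-stage image-to-3D: a multi-view diffusion model turns one image
    into several consistent views, then a feed-forward reconstruction model
    lifts those views to 3D (a Flexicubes mesh in InstantMesh)."""
    views = mv_diffusion(image)   # stage 1: image -> multi-view images
    return reconstructor(views)   # stage 2: multi-view images -> 3D

# Toy stand-ins for the two learned models
mesh = image_to_3d("input.png",
                   lambda im: [im] * 6,                    # pretend 6 views
                   lambda vs: f"mesh from {len(vs)} views")
print(mesh)  # mesh from 6 views
```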

InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models, Xu et al.

13 of 19

3D Generation with Reconstruction Models

  • Flexicubes is a differentiable neural representation that can export a mesh instantly
  • Enables dual contouring with more flexibility by using MLPs to predict the extraction weights and deformations for each vertex
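A toy numpy illustration of the flexibility Flexicubes adds: the dual vertex of a cell becomes a learnable convex combination of that cell's edge zero-crossings instead of a fixed average. The softmax weighting and everything else here is a simplification for intuition; the paper's exact parameterization (including the per-vertex grid deformations) is richer.

```python
import numpy as np

def dual_vertex(crossings, weights):
    """Place a cell's dual vertex as a weighted average of its edge
    zero-crossings. `weights` would be MLP-predicted in InstantMesh;
    here they are plain inputs."""
    w = np.exp(weights)
    w /= w.sum()                          # positive weights summing to 1
    return (w[:, None] * crossings).sum(axis=0)

# A cell with 4 active edge crossings
crossings = np.array([[0.1, 0.0, 0.0],
                      [0.0, 0.2, 0.0],
                      [0.0, 0.0, 0.3],
                      [0.1, 0.1, 0.1]])
print(dual_vertex(crossings, np.zeros(4)))                 # uniform weights -> plain mean
print(dual_vertex(crossings, np.array([9., 0., 0., 0.])))  # pulled toward the first crossing
```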

InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models, Xu et al.

Flexible Isosurface Extraction for Gradient-Based Mesh Optimization, Shen et al.

14 of 19

3D Generation with Reconstruction Models

InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models, Xu et al.

Flexible Isosurface Extraction for Gradient-Based Mesh Optimization, Shen et al.

15 of 19

3D Generation with Reconstruction Models

  • Generates 3D Gaussians with a feed-forward model
  • Represents 3D Gaussians with multi-view splatter images
    • Each splatter image has 14 channels: 3 RGB, 3 position, 3 scale, 4 rotation, and 1 opacity
    • Training loss: rendering loss between the splatter-image rendering and the ground-truth views
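The splatter-image layout above can be made concrete with a small unpacking helper; the channel order follows the slide's listing and may not match LGM's actual code:

```python
import numpy as np

def unpack_splatter(img):
    """Split an (H, W, 14) splatter image into per-pixel Gaussian parameters:
    each pixel stores one 3D Gaussian."""
    H, W, _ = img.shape
    g = img.reshape(H * W, 14)
    return {
        "rgb":      g[:, 0:3],
        "position": g[:, 3:6],
        "scale":    g[:, 6:9],
        "rotation": g[:, 9:13],   # quaternion
        "opacity":  g[:, 13:14],
    }

gaussians = unpack_splatter(np.random.randn(128, 128, 14))
print(gaussians["rotation"].shape)  # (16384, 4)
```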

LGM: Large Multi-View Gaussian Model, Li et al.

16 of 19

3D Generation with Reconstruction Models

LGM: Large Multi-View Gaussian Model, Li et al.

17 of 19

3D Generation Methods

Method                        | Speed        | Quality             | Data
------------------------------|--------------|---------------------|----------------
Optimization                  | > 2 minutes  | High                | No Data
Feedforward                   | < 1 second   | Low                 | Multi-View Data
MV diffusion + Reconstruction | ~ 10 seconds | High                | Multi-View Data
3D Native Diffusion           | 1 minute     | Depends on the data | 3D Data

Can we close the gap?

18 of 19

Feedforward 3D Gen with Diffusion Priors

GECO: Generative Image-to-3D within a Second, Wang et al.

19 of 19

Feedforward 3D Gen with Diffusion Priors

GECO: Generative Image-to-3D within a Second, Wang et al.