1 of 28

2 of 28

Overview - 3D content creation (with text)

  • Recent methods – DreamFusion, Magic3D, Score Jacobian Chaining, … – synthesize high-quality 3D objects, but require a lengthy per-prompt optimization (15+ min).
  • Creating 3D content is valuable but currently difficult
  • Generating high-quality assets from text descriptions makes creating usable 3D content easier

a squirrel wearing an elegant ballgown playing the saxophone

a brightly colored mushroom growing on a log

DreamFusion

Magic3D

3 of 28

Overview - the issue

  • Users repeatedly iterate between engineering the prompt (and other parameters) and rendering the results
  • Waiting 15+ min for each design iteration makes the process halting and time-consuming.

4 of 28

Overview - our solution

  • We solve this by optimizing a single, amortized model on many prompts.
  • We render unseen prompts in < 1 sec. Prior methods took 15+ minutes.

5 of 28

Overview - benefits

  • Benefits of:
    • Generalization
    • Interpolating between prompts
    • Reducing training time
    • Amortizing over other types of info

6 of 28

Using our method: ATT3D

  • Want to generate 3D objects (from text)
  • Choose 3D object representation – here Neural Radiance Fields (NeRFs)
  • We use spatially varying features - here an Instant NGP
  • Want separate output for each text-prompt
  • So, “modulate” the parameters with the text
  • We use a hypernetwork on the text embedding
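The last two bullets can be sketched in a few lines. This is a minimal illustration with made-up sizes and a single linear hypernetwork layer; the actual mapping network in ATT3D is larger and modulates Instant NGP features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, for illustration only.
TEXT_DIM = 8   # text-embedding size (e.g., from a frozen text encoder)
FEAT_DIM = 4   # per-point spatial feature size (e.g., from an Instant NGP grid)

# Hypernetwork: maps a text embedding to a modulation matrix applied to the
# spatial features, so one set of shared weights serves every prompt.
W_hyper = rng.normal(0, 0.1, size=(TEXT_DIM, FEAT_DIM * FEAT_DIM))

def modulate(text_emb, spatial_feat):
    """Apply a text-conditioned linear modulation to spatial features."""
    mod = (text_emb @ W_hyper).reshape(FEAT_DIM, FEAT_DIM)
    return spatial_feat @ mod

text_emb = rng.normal(size=TEXT_DIM)      # one prompt's embedding
spatial_feat = rng.normal(size=FEAT_DIM)  # features at one 3D point
out = modulate(text_emb, spatial_feat)    # per-prompt modulated features
```

Different text embeddings yield different modulations of the same spatial features, which is how a single model produces a separate output per prompt.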

7 of 28

Training with our method

  • Use SDS loss from DreamFusion (on multiple prompts):
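The loss itself appeared as a figure on the original slide; for reference, a sketch of DreamFusion's Score Distillation Sampling (SDS) gradient in that paper's notation, with an added expectation over the prompt set y (this rendering is reconstructed, not copied from the slide):

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{y,\,t,\,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(z_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
  \qquad z_t = \alpha_t\, x + \sigma_t\, \epsilon, \quad x = g(\theta)
```

Here g renders the NeRF with parameters θ (produced from the embedding of prompt y), ε̂_φ is the frozen 2D diffusion model's noise prediction, and w(t) is a timestep weighting.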

8 of 28

Benefit: Reduce compute time to train on a set of prompts

  • Amortization allows us to train a single model, producing various objects.
  • Single-prompt training, used in DreamFusion, trains a separate model for each prompt.
  • Let’s compare the results:
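Before comparing, the structural difference can be shown with a toy loop. Everything below is a hypothetical stand-in (the "gradient" is a dummy; the real method conditions an Instant NGP NeRF on text embeddings and gets gradients from a frozen diffusion model); it only illustrates that amortized training samples a prompt per step and updates one shared model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes and targets, for illustration only.
N_PROMPTS, EMB_DIM, PARAM_DIM = 4, 4, 6
prompt_embs = np.eye(N_PROMPTS)                    # stand-in text embeddings
targets = rng.normal(size=(N_PROMPTS, PARAM_DIM))  # per-prompt "ideal" params

# One shared weight matrix maps a text embedding to NeRF parameters.
W = np.zeros((EMB_DIM, PARAM_DIM))

def toy_sds_grad(params, p):
    # Placeholder for the SDS gradient, which in the real method comes from
    # a frozen 2D diffusion model applied to rendered views.
    return params - targets[p]

# Amortized loop: each step samples a prompt and updates the SHARED weights.
for step in range(2000):
    p = rng.integers(N_PROMPTS)
    emb = prompt_embs[p]
    params = emb @ W
    grad_params = toy_sds_grad(params, p)
    W -= 0.05 * np.outer(emb, grad_params)  # chain rule through params = emb @ W

# After training, every prompt's parameters come from the single shared model.
errs = [np.abs(prompt_embs[p] @ W - targets[p]).max() for p in range(N_PROMPTS)]
```

Single-prompt training would instead run a separate loop with a separate parameter vector for each of the N_PROMPTS prompts.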

Amortized Training

Single-prompt Training

9 of 28

Benefit: Reduce compute time to train on a set of prompts

  • Amortization (blue) allows higher quality than single-prompt training (red) for almost all compute budgets

11 of 28

Benefit: Reduce compute time to train on a set of prompts

  • We scale to the extended DreamFusion 411 prompt set with identical model size and compute budget

  • We show examples where amortization re-uses components, allowing for compute savings

12 of 28

Do we have any generalization?

13 of 28

Benefit: Generalize to new prompts

  • We generalize to unseen test prompts with no additional training - shown along the diagonal in red
  • Per-prompt optimization has no testing protocol, so we show its initialization to align compute budgets

Amortized Optimization

Per-prompt Optimization

14 of 28

Benefit: Generalize to new prompts

  • Amortized training (blue) achieves higher training and testing quality than single-prompt training (red) for almost all compute budgets
  • Single-prompt training has no zero-shot testing protocol, so we show performance at random initialization
  • Gains grow for compositional (middle) and larger (right) prompt sets.
  • The generalization gap is small when training on 50% of prompts, and even at a 12.5% split, quality on unseen prompts exceeds per-prompt optimization on seen prompts

15 of 28

Benefit: Generalize to new prompts

  • A single model trained on the animal prompts generalizes to unseen prompts without optimization

Amortized 50% split, unseen prompts

Amortized 12.5% split, unseen prompts

Per-prompt optimization

16 of 28

Possible Benefit: Consistent output

  • Amortized optimization may create objects matching prompts more consistently

“… holding a blue balloon”

Amortized optimization

Per-prompt optimization

17 of 28

Benefit: Finetune on prompts

  • Amortized optimization recovers the correct balloon, unlike per-prompt optimization.
  • We can finetune this result with Magic3D’s second optimization stage

Per-prompt

Amortized

Amortized + Magic3D

Various strategies on “a pig wearing medieval armor holding a blue balloon”

18 of 28

Benefit: Finetune on prompts

  • Or, use the amortized model as an initialization and continue finetuning on unseen prompts
  • This outperforms the per-prompt strategy of optimizing from a random initialization

19 of 28

Benefit: Interpolate between prompts

  • Our method allows interpolations, unlike single-prompt training.
  • We synthesize a continuum of novel assets by interpolating embeddings.
  • Here, we train on 3 prompts and zero-shot generalize to interpolants.
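The interpolation itself is simple: blend the prompt embeddings and feed the result through the same amortized model. A minimal sketch with placeholder embeddings (the vectors and sizes below are made up; only the blending is the point):

```python
import numpy as np

def interpolate_embeddings(emb_a, emb_b, alpha):
    """Linearly blend two prompt embeddings; alpha=0 gives emb_a, alpha=1 gives emb_b."""
    return (1.0 - alpha) * emb_a + alpha * emb_b

emb_summer = np.array([1.0, 0.0, 0.5])  # placeholder embedding for one prompt
emb_fall   = np.array([0.0, 1.0, 0.5])  # placeholder embedding for another
frames = [interpolate_embeddings(emb_summer, emb_fall, a)
          for a in np.linspace(0.0, 1.0, 5)]  # a continuum of conditioning vectors
```

Rendering the model at each interpolated embedding yields the continuum of novel assets shown in the videos.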

20 of 28

Benefit: Interpolate between prompts

  • Some prompts give reasonable results at all interpolants, but others could be improved.
  • Can we augment training to also amortize over interpolants?

21 of 28

Benefit: Amortize over other information

  • We amortize over different training strategies to produce different types of results.
  • With no interpolation during training, interpolants can naively dissolve between prompts.

No Train Interpolation

Guidance Interpolation

22 of 28

Benefit: Prompt interpolation for novel assets & animations

“... in the fall with dying leaves”

“... full of leaves in the summer”

“... with flowering cherry blossoms”

“a baby dragon”

“a green dragon”

“a red convertible car with the top down”

“a completely destroyed car”

“... gnarly, old, leafless with many branches”

“a jagged rock”

“a mossy rock”

“...cottage with thatched roof”

“...house in tudor style”

“...dress made of fruit…”

“...dress made of garbage bags…”

23 of 28

Future Directions & Limitations

24 of 28

Conclusion

  • We presented a method for amortized optimization of text-to-3D models: ATT3D
  • Our method trains a single, amortized model on various text-prompts.
  • Benefits of:
    • Real-time asset generation via generalizing to prompts
    • User-guided generation via interpolating between prompts & amortizing over other info
    • Cost savings via reducing training time
  • A promising avenue towards general and fast text-to-3D generation

25 of 28

Jonathan Lorraine

Kevin Xie

Xiaohui Zeng

Chen-Hsuan Lin

Towaki Takikawa

Tsung-Yi Lin

Ming-Yu Liu

Sanja Fidler

James Lucas

Nicholas Sharp

26 of 28

Citations

  • Poole, Ben, et al. "DreamFusion: Text-to-3D using 2D Diffusion." arXiv preprint arXiv:2209.14988 (2022).
  • Lin, Chen-Hsuan, et al. "Magic3D: High-Resolution Text-to-3D Content Creation." arXiv preprint arXiv:2211.10440 (2022).
  • Wang, Haochen, et al. "Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation." arXiv preprint arXiv:2212.00774 (2022).

27 of 28

Extra slides

28 of 28