An overview of the Landscape of 3D Generation with PyTorch

Suvaditya Mukherjee, University of Southern California & Magnopus

suvadity@usc.edu

Evolving Landscape of 3D and 4D Generation

PyTorch 3D Ecosystem - Tools & Frameworks

Balancing Quality, Memory, and Inference Latency

Integrating PyTorch into Creative Workflows

Relevant PyTorch Optimizations for 3D-native stacks

Future Directions

Fig 1. NeRF Render

Fig 2. Mesh Render

Fig 3. 3D Gaussian Splat

  • Current 3D Generation techniques depend on rendering into one of three standard representations: a NeRF (Neural Radiance Field), a Mesh/Point Cloud, or a 3D Gaussian Splat
  • Works like TRELLIS and SF3D are bringing the Text-to-3D paradigm to the forefront now
  • The field has simultaneously started making the jump into Text-to-4D generation (3D Generation with a time component) and world models to mimic environments
  • PyTorch is a great fit due to well-implemented tensor operations that scale well to GPUs and other accelerators
  • Per unit of output, these models are generally more compute-intensive than LLMs because of their sparse representations
  • PyTorch3D is a library maintained by FAIR at Meta, offering one of the best implementations of PyTorch-centric components
  • First-class support for torch.Tensor
  • The NeRFStudio Project has been one of the most important contributors to the 3D PyTorch ecosystem
  • Libraries like nerfstudio, gsplat, and nerfacc are built on PyTorch and serve as essential building blocks for the community
  • Useful for interop between PyTorch and NumPy
  • Essential for file I/O and numerical operations on meshes and point clouds
  • NVIDIA maintains Kaolin, an important tool for 3D research with features like full-fledged Physics simulations, visualizers that support Jupyter, and differentiable renderers
  • Kaolin Wisp is another PyTorch-native tool that makes it easy to work with Neural Fields
  • Text-to-3D models benefit substantially from quantization because of their inherently sparse representations, which allow high-quality generations even after extreme quantization
  • The experiment (below) compares a standard TRELLIS pipeline against a pipeline with Int4 quantization (through torchao). We generate 5 samples and average the statistics over them to get the final results
  • We see up to ~20% savings in memory, at the cost of higher inference latency due to the dequantization overhead
  • Using torch.compile on the quantized model brought down inference time, but came with higher VRAM consumption due to kernel caching on the GPU
  • Cheap & free optimization through use of torch.compile
  • Caching latents from the text encoder and VAE can also help repeated calls
  • Quick drop in precision with torch.amp (Automatic Mixed-Precision) can also help in reducing compute requirements
  • Finding a balance between quality and speed by increasing/decreasing number of rays sampled and/or image resolution can be vital
  • Quantization through torchao, or through other libraries such as bitsandbytes and quanto, is another option
  • Production-scale inference can be optimized with ExecuTorch and AOT-compilation
  • Advanced optimization strategies would include the use of custom Triton kernels for inference, CUDA-compatible rasterizers (like nvdiffrast) for differentiable rendering in case of Gaussians or polygon renders, or using Occupancy Grids to speed up NeRF-style renders
  • Profiling with tlparse or torch.profiler helps in finding gaps and bottlenecks
  • The recent announcement of torchax makes TPUs easier to use for 3D Generation, with higher compute availability and a larger ecosystem through JAX
  • OpenUSD standards need wider adoption in 3D Generation so that generated artifacts can move across tools more easily
  • Rise of World Models with persistent memory for generating worlds on-the-fly will lead to stronger interest in this space
  • Model latency needs to come down, through better algorithms and a stronger software stack for 3D and Graphics researchers to build on
  • Better support within existing VFX and 3D tools like Blender, Unreal Engine, Unity etc. through plugins
  • Need for better benchmarks of performance with 3D Generation models
  • Biggest win for injecting PyTorch into creative workflows is for 3D Asset Generation in VFX pipelines
  • Material/Geometry matching with differentiable rendering for 3D objects against reference plates is also an important application
  • Monocular Depth Estimation for hard-to-solve shots is useful for Nuke compositor pipelines to simulate relighting, camera vignettes/mattes
  • Scene Reconstruction with partial data is useful for recreating scenes in Unity/Unreal for CGI workflows
  • Mix-and-match textures and generate your own with Blender plugins using PyTorch under the hood to create designs on the fly
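
The latent-caching idea from the optimization bullets above can be sketched without committing to a specific model. This is a minimal illustration using only the Python standard library. The names encode_prompt and cached_latent are hypothetical, and the toy "embedding" stands in for a real text encoder, which would run a network and return a torch.Tensor.

```python
from functools import lru_cache

# Counts how often the (hypothetical) expensive encoder actually runs.
encoder_calls = 0


def encode_prompt(prompt: str) -> tuple[float, ...]:
    """Hypothetical stand-in for an expensive text encoder."""
    global encoder_calls
    encoder_calls += 1
    # Toy "embedding": scaled character codes, padded or truncated to 8 values.
    codes = [ord(ch) / 255.0 for ch in prompt[:8]]
    return tuple(codes + [0.0] * (8 - len(codes)))


@lru_cache(maxsize=128)
def cached_latent(prompt: str) -> tuple[float, ...]:
    # The prompt string is the cache key, so repeated prompts skip the encoder.
    return encode_prompt(prompt)


first = cached_latent("a wooden chair")
second = cached_latent("a wooden chair")    # served from the cache
third = cached_latent("a ceramic teapot")   # new prompt, encoder runs again

print("encoder calls:", encoder_calls)
print("cache stats:", cached_latent.cache_info())
```

The cache hands back the same object on every hit, so with real tensors the caller should treat cached latents as read-only or clone them before in-place edits.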

Experiments (Colab Notebook)
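
To make the Int4 trade-off from the quantization bullets concrete, here is a minimal pure-Python sketch of symmetric, per-group 4-bit weight quantization. It is an illustration under assumptions, not torchao's implementation and not the code from the notebook: the function names quantize_int4 and dequantize_int4, the group size of 4, and the sample weights are all hypothetical.

```python
def quantize_int4(weights, group_size=4):
    """Symmetric per-group quantization to signed 4-bit integers.

    Returns the packed bytes (two 4-bit values per byte) and one scale per group.
    """
    if group_size % 2 or len(weights) % group_size:
        raise ValueError("group_size must be even and divide len(weights)")
    packed, scales = bytearray(), []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to the integer 7.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        q = [max(-8, min(7, round(w / scale))) for w in group]
        # Pack two 4-bit two's-complement values into each byte.
        for low, high in zip(q[0::2], q[1::2]):
            packed.append((low & 0x0F) | ((high & 0x0F) << 4))
    return bytes(packed), scales


def dequantize_int4(packed, scales, group_size=4):
    """Unpack the 4-bit values and rescale them to floats."""
    values = []
    for byte in packed:
        for nibble in (byte & 0x0F, byte >> 4):
            # Sign-extend the 4-bit two's-complement value.
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return [q * scales[i // group_size] for i, q in enumerate(values)]


# Hypothetical sample weights, two groups of four.
weights = [0.12, -0.50, 0.33, 0.07, -1.40, 0.90, 0.02, -0.65]
packed, scales = quantize_int4(weights)
restored = dequantize_int4(packed, scales)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("packed bytes:", len(packed), "for", len(weights), "weights")
print("max round-trip error:", max_error)
```

Two 4-bit values share one byte, so the packed weights take a quarter of the space of 16-bit floats, plus one scale per group (larger groups shrink that overhead). Every use of the weights has to pass through a dequantize step like the one here, which is the overhead the bullets above associate with higher inference latency.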

Virtual Poster