
An Overview of the Landscape of 3D Generation with PyTorch

Suvaditya Mukherjee, University of Southern California & Magnopus

suvadity@usc.edu

Evolving Landscape of 3D and 4D Generation

PyTorch 3D Ecosystem - Tools & Frameworks

Balancing Quality, Memory, and Inference Latency

Integrating PyTorch into Creative Workflows

Relevant PyTorch Optimizations for 3D-native stacks

Future Directions

Fig 1. NeRF Render

Fig 2. Mesh Render

Fig 3. 3D Gaussian Splat

Current 3D generation techniques render into one of three standard representations: a NeRF (Neural Radiance Field), a mesh or point cloud, or a 3D Gaussian Splat
Works like TRELLIS and SF3D are bringing Text-to-3D and Image-to-3D generation to the forefront
The field has simultaneously started making the jump into Text-to-4D generation (3D Generation with a time component) and world models to mimic environments
PyTorch is a great fit due to well-implemented tensor operations that scale well to GPUs and other accelerators
These models are generally more compute-intensive per unit than LLMs because of their sparse representations
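
To give a feel for why tensor operations fit these representations, here is a minimal sketch of NeRF-style alpha compositing along rays in plain PyTorch. The function name, shapes, and random inputs are illustrative assumptions and are not taken from any of the works named above.

```python
import torch


def composite_along_rays(densities, colors, deltas):
    """Alpha-composite per-sample colors along each ray (NeRF-style quadrature).

    densities: (num_rays, num_samples) non-negative volume density
    colors:    (num_rays, num_samples, 3) RGB value at each sample
    deltas:    (num_rays, num_samples) spacing between adjacent samples
    """
    # Opacity contributed by each segment of the ray.
    alphas = 1.0 - torch.exp(-densities * deltas)
    # Transmittance: fraction of light that survives up to each sample.
    transmittance = torch.cumprod(1.0 - alphas + 1e-10, dim=-1)
    transmittance = torch.cat(
        [torch.ones_like(transmittance[:, :1]), transmittance[:, :-1]], dim=-1
    )
    weights = alphas * transmittance
    rgb = (weights.unsqueeze(-1) * colors).sum(dim=-2)
    return rgb, weights


torch.manual_seed(0)
num_rays, num_samples = 4, 64  # hypothetical sizes for illustration
densities = torch.rand(num_rays, num_samples)
colors = torch.rand(num_rays, num_samples, 3)
deltas = torch.full((num_rays, num_samples), 1.0 / num_samples)

rgb, weights = composite_along_rays(densities, colors, deltas)
print(rgb.shape, weights.sum(dim=-1))
```

Cost grows linearly with the number of samples per ray, which is one reason sample count is a common quality/speed knob.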

PyTorch3D is a library maintained by FAIR at Meta, offering one of the best implementations of PyTorch-centric 3D components
First-class support for torch.Tensor
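
As a rough illustration of what tensor-first geometry buys you, the sketch below stores a mesh as two plain tensors and backpropagates through a geometric quantity. It uses only core PyTorch and is not PyTorch3D's API; PyTorch3D's `Meshes` structure is built from vertex and face tensors of this kind. The tetrahedron and the `surface_area` helper are assumptions made for the example.

```python
import torch

# A tetrahedron: 4 vertices and 4 triangular faces (indices into verts).
verts = torch.tensor(
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    requires_grad=True,
)
faces = torch.tensor([[0, 2, 1], [0, 1, 3], [0, 3, 2], [1, 2, 3]])


def surface_area(verts, faces):
    """Total triangle area, computed with differentiable tensor ops."""
    tri = verts[faces]  # (num_faces, 3 corners, 3 coords)
    edge_a = tri[:, 1] - tri[:, 0]
    edge_b = tri[:, 2] - tri[:, 0]
    return 0.5 * torch.cross(edge_a, edge_b, dim=-1).norm(dim=-1).sum()


area = surface_area(verts, faces)
area.backward()  # gradient of the area with respect to every vertex position
print(area.item(), verts.grad)
```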

The Nerfstudio project has been one of the most important contributors to the 3D PyTorch ecosystem
Libraries like nerfstudio, gsplat, and nerfacc are built on PyTorch and serve as essential building blocks for the community

NVIDIA maintains Kaolin, an important tool for 3D research with features like full-fledged physics simulation, Jupyter-compatible visualizers, and differentiable renderers
Kaolin Wisp is another PyTorch-native tool that makes it easy to work with Neural Fields

Text-to-3D models benefit substantially from quantization because their representations are inherently sparse, allowing high-quality generations even after extreme quantization
The experiment (below) compares a standard TRELLIS pipeline against a pipeline with Int4 quantization (through torchao). We generate 5 samples and average the statistics over them to get the final results
We see up to ~20% savings in memory at the cost of higher inference latency due to the dequantization overhead
Using torch.compile on the quantized model brought down inference time, but came with higher VRAM consumption due to kernel caching on the GPU
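
The memory/latency trade-off can be illustrated with a toy weight-only Int4 scheme. This is a hand-rolled sketch in plain PyTorch, not torchao's implementation and not the TRELLIS experiment; the layer size, group size, and helper names are assumptions, and its numbers will not match the measurements reported above.

```python
import time

import torch


def quantize_int4(weight, group_size=32):
    """Symmetric per-group 4-bit quantization, two values packed per byte."""
    groups = weight.reshape(-1, group_size)
    scale = groups.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(groups / scale), -8, 7).to(torch.int8)
    q = (q + 8).to(torch.uint8).reshape(-1, 2)  # shift into [0, 15]
    packed = q[:, 0] | (q[:, 1] << 4)  # low nibble, high nibble
    return packed, scale, tuple(weight.shape)


def dequantize_int4(packed, scale, shape, group_size=32):
    """Unpack nibbles and rescale; this is the extra work paid at inference."""
    low = (packed & 0x0F).to(torch.int8) - 8
    high = (packed >> 4).to(torch.int8) - 8
    q = torch.stack([low, high], dim=1).reshape(-1, group_size)
    return (q.to(scale.dtype) * scale).reshape(shape)


def mean_latency(fn, runs=5):
    """Average wall-clock time over a few runs, after one warm-up call."""
    fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs


torch.manual_seed(0)
weight = torch.randn(256, 256)  # stand-in for one linear layer's weights
x = torch.randn(8, 256)
packed, scale, shape = quantize_int4(weight)

dense_bytes = weight.numel() * weight.element_size()
quant_bytes = (
    packed.numel() * packed.element_size() + scale.numel() * scale.element_size()
)
dense_latency = mean_latency(lambda: x @ weight.T)
quant_latency = mean_latency(lambda: x @ dequantize_int4(packed, scale, shape).T)

print(f"weights: {dense_bytes} B dense vs {quant_bytes} B packed")
print(f"latency: {dense_latency:.2e} s dense vs {quant_latency:.2e} s dequantized")
```

On a GPU, peak memory for a full pipeline could be read with `torch.cuda.max_memory_allocated()` instead of counting bytes by hand.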

Cheap, near-free optimization through torch.compile
Caching latents from the text encoder and VAE can also speed up repeated calls
Dropping precision with torch.amp (Automatic Mixed Precision) also reduces compute requirements
Balancing quality and speed by increasing or decreasing the number of rays sampled and/or the image resolution can be vital
Quantization is available through torchao and other libraries such as bitsandbytes and quanto
Production-scale inference can be optimized with ExecuTorch and AOT-compilation
Advanced optimization strategies include custom Triton kernels for inference, CUDA-compatible rasterizers (like nvdiffrast) for differentiable rendering of Gaussians or polygons, and Occupancy Grids to speed up NeRF-style renders
Profiling with tlparse or torch.profiler helps find gaps and bottlenecks
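
Three of the items above (latent caching, mixed precision, and profiling) can be combined in a few lines. The sketch below runs on CPU with tiny stand-in modules; `VOCAB`, `text_encoder`, `decoder`, `encode_prompt`, and `generate` are hypothetical names for illustration and do not correspond to any real pipeline.

```python
import functools

import torch
from torch import nn
from torch.profiler import ProfilerActivity, profile

torch.manual_seed(0)

# Hypothetical stand-ins for a text encoder and a 3D decoder.
VOCAB = {"a": 0, "red": 1, "chair": 2, "blue": 3}
text_encoder = nn.Embedding(len(VOCAB), 64).eval()
decoder = nn.Sequential(
    nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 3 * 128)
).eval()
# decoder = torch.compile(decoder) would be the one-line addition here;
# it is left out so the sketch runs without a compiler toolchain.


@functools.lru_cache(maxsize=128)
def encode_prompt(prompt: str) -> torch.Tensor:
    """Cache text latents so repeated prompts skip the encoder."""
    ids = torch.tensor([[VOCAB[word] for word in prompt.split()]])
    with torch.no_grad():
        return text_encoder(ids).mean(dim=1)


def generate(prompt: str) -> torch.Tensor:
    latent = encode_prompt(prompt)
    # Mixed precision: eligible ops such as Linear run in bfloat16.
    with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        return decoder(latent).reshape(-1, 3)  # toy "point cloud"


with profile(activities=[ProfilerActivity.CPU]) as prof:
    points = generate("a red chair")
    points_again = generate("a red chair")  # served from the latent cache

print(encode_prompt.cache_info())
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```

Cached tensors are shared between calls, so callers should avoid modifying them in place.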

The recently announced torchax makes TPUs easier to use for 3D generation, bringing higher compute availability and access to the larger JAX ecosystem
OpenUSD needs wider adoption in 3D generation so that generated artifacts move across tools more easily
Rise of World Models with persistent memory for generating worlds on-the-fly will lead to stronger interest in this space
Model latency needs to come down through better algorithms and a stronger software stack for 3D and graphics researchers to build on
Better support within existing VFX and 3D tools like Blender, Unreal Engine, Unity etc. through plugins
Need for better benchmarks of performance with 3D Generation models

The biggest win from bringing PyTorch into creative workflows is 3D asset generation in VFX pipelines
Material/Geometry matching with differentiable rendering for 3D objects against reference plates is also an important application
Monocular Depth Estimation for hard-to-solve shots is useful in Nuke compositing pipelines to simulate relighting and camera vignettes/mattes
Scene Reconstruction with partial data is useful for recreating scenes in Unity/Unreal for CGI workflows
Mix-and-match textures and generate your own with Blender plugins using PyTorch under the hood to create designs on the fly
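
Material matching against a reference plate can be sketched with a toy differentiable renderer. The example below shades a sphere with a Lambertian model and recovers its albedo by gradient descent. The renderer, the light direction, and the target values are all assumptions made for illustration; a production setup would use a real differentiable rasterizer and a photographed plate.

```python
import torch


def render_sphere(albedo, light_dir, size=32):
    """Toy differentiable renderer: a Lambertian-shaded unit sphere."""
    coords = torch.linspace(-1.0, 1.0, size)
    y, x = torch.meshgrid(coords, coords, indexing="ij")
    r2 = x**2 + y**2
    mask = (r2 < 1.0).float()  # pixels covered by the sphere
    z = torch.sqrt(torch.clamp(1.0 - r2, min=0.0))
    normals = torch.stack([x, y, z], dim=-1)
    light = light_dir / light_dir.norm()
    shading = torch.clamp((normals * light).sum(dim=-1), min=0.0) * mask
    return shading.unsqueeze(-1) * albedo  # (size, size, 3) image


torch.manual_seed(0)
light_dir = torch.tensor([0.3, 0.5, 0.8])      # assumed known lighting
target_albedo = torch.tensor([0.8, 0.3, 0.2])  # "ground truth" material
reference_plate = render_sphere(target_albedo, light_dir)

# Start from a neutral grey and fit the albedo to the reference image.
albedo = torch.full((3,), 0.5, requires_grad=True)
optimizer = torch.optim.Adam([albedo], lr=0.02)
losses = []
for step in range(300):
    optimizer.zero_grad()
    rendered = render_sphere(albedo, light_dir)
    loss = torch.nn.functional.mse_loss(rendered, reference_plate)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(albedo.detach(), losses[0], losses[-1])
```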

Experiments (Colab Notebook)

Virtual Poster