1 of 97

Differentiable Rendering

Oct 17

2 of 97

Lecturer

Wei-Cheng Huang

3 of 97

Modular Primitives for High-Performance Differentiable Rendering (NVdiffrast)

Samuli Laine, Janne Hellsten, Tero Karras, Yeongho Seol, Jaakko Lehtinen, Timo Aila

ACM Transactions on Graphics 39(6) (proc. SIGGRAPH Asia 2020)

Lecture Adapted from NVdiffrast Presentation @ SIGGRAPH Asia 2020

4 of 97

Start with Rendering in General …

5 of 97

Why Differentiable Rendering?

Must be differentiable!

6 of 97

Two Mainstream Rendering Approaches: Rasterization vs. Ray Tracing

Light Transport Simulation /Ray Tracing

Traces rays to determine visibility and intersection points in the scene
Rays can go anywhere! All shaders and geometry must be available at the same time

Rasterization

Determines visibility by projecting geometry onto a 2D screen and resolving depth for each pixel
Each object can be a draw call

from: SIGGRAPH 2024 real-time ray tracing tutorial

With ray tracing, as Zhi-hao already mentioned in the last lecture, you shoot a ray through each pixel out into the scene and track the light along the way to see where it hits the geometry. The ray keeps reflecting until it reaches a light source, which means you need access to the entire scene, since rays can bounce anywhere.

With rasterization, by contrast, each object is rendered in isolation: you determine visibility by projecting geometry onto a 2D screen and resolving depth for each pixel.

Basically, the pipeline is per-object in rasterization but per-scene in ray tracing, and it executes on a grid of samples.

There is also a combination of the two, called inline ray tracing, which allows ray tracing operations to be performed directly within traditional rasterization shaders. This approach offers a flexible way to integrate ray tracing with rasterization pipelines, blending the strengths of both rendering methods. We ignore this technique for now.

7 of 97

from: CMU 15-462

Keenan Crane

8 of 97

Two Main-Stream Differentiable Rendering

Light transport simulation (Ray Tracing)

redner [Li et al. 2018]
Mitsuba 2 [Nimier-David et al. 2019]
Mitsuba 3 [Jakob et al. 2022]

Favor quality over speed

Rasterization

SoftRas [Liu et al. 2019]
PyTorch3D [Ravi et al. 2020]
Nvdiffrast [Laine et al. 2020]

Favor speed over quality

9 of 97

Coverage (Visibility) Gradient Problem

Z discontinuity problem

XY discontinuity problem

from: PyTorch3D @ SIGGRAPH ASIA 2020 Course

10 of 97

Coverage (Visibility) Gradient Problem

Essentially: moving a vertex has no first-order effect on coverage

Common tricks: blur the geometry, introduce partial transparency, aggregate/composite in an ad hoc fashion

Problems: 1. Occlusions become fuzzy 2. Everything must be shaded before composition
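To make the first-order claim concrete, here is a minimal sketch on a single 1D pixel. The helper names (`hard_coverage`, `area_coverage`, `finite_diff`) and the 1D setup are assumptions made for illustration and are not taken from any of the cited renderers. It compares point-sampled coverage with exact area coverage as an edge moves inside the pixel.

```python
def hard_coverage(edge_x, center=0.5):
    """Point-sampled coverage: 1 if the pixel center lies left of the edge."""
    return 1.0 if center < edge_x else 0.0


def area_coverage(edge_x, left=0.0, right=1.0):
    """Exact fraction of the pixel [left, right] covered by the region x < edge_x."""
    return min(max((edge_x - left) / (right - left), 0.0), 1.0)


def finite_diff(f, x, h=1e-4):
    """Central finite-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# The edge sits inside the pixel, but not exactly on the pixel center.
grad_hard = finite_diff(hard_coverage, 0.3)
grad_area = finite_diff(area_coverage, 0.3)
print(grad_hard, grad_area)
```

The point-sampled version reports a zero gradient almost everywhere (it only jumps when the edge crosses the pixel center), while the area-based version changes linearly with the edge position.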

11 of 97

Shading with Blurred Triangles

Shade internally

Shade through memory

Composition of transparency and blur just causes overshading

12 of 97

This work

Pipeline Overview

Important Primitive Operations

All differentiable and accelerated with CUDA

13 of 97

This work

Pipeline Overview

Important Primitive Operations

All differentiable and accelerated with CUDA or hardware graphics pipeline

Target: efficiency, flexibility (freedom), modularity, quality

14 of 97

Input:

Vertex positions (homogeneous coordinates) in Clip Space

Triangular index buffer

Output:

(𝑢, 𝑣, 𝑧/𝑤, triangle ID) per pixel

Jacobian buffer 𝐽 = 𝜕{𝑢, 𝑣 }/𝜕{𝑥,𝑦}

Extra Notes:

Independent of shading, textures etc. (Modularity)

Accelerated using OpenGL hardware graphics pipeline (efficiency)

Backward pass gradient easy to get through chain rule (perspective mapping)

Dynamic mapping between world coordinates and discrete pixel coordinates.
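The mapping from clip space to pixel coordinates can be sketched as follows. This is a generic perspective divide plus viewport transform; the function name `clip_to_pixel` and the viewport convention (y orientation, no pixel-center offset) are assumptions for illustration, not necessarily what nvdiffrast uses internally.

```python
def clip_to_pixel(clip, width, height):
    """Map a homogeneous clip-space vertex (x, y, z, w) to continuous pixel coordinates."""
    x, y, z, w = clip
    if w <= 0.0:
        raise ValueError("vertex is behind the camera; clip before dividing")
    # Perspective divide: normalized device coordinates in [-1, 1].
    ndc_x, ndc_y, ndc_z = x / w, y / w, z / w
    # Viewport transform (assumed convention): [-1, 1] -> [0, width] x [0, height].
    px = (ndc_x * 0.5 + 0.5) * width
    py = (ndc_y * 0.5 + 0.5) * height
    return px, py, ndc_z


print(clip_to_pixel((0.0, 0.0, 1.0, 2.0), 64, 32))
```

Because every step is a smooth function of the vertex position, the backward pass through this mapping follows from the chain rule.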

15 of 97

Input:

Per-vertex attributes

Triangular index buffer

(𝑢, 𝑣) and Jacobian buffer 𝐽 = 𝜕{𝑢,𝑣}/𝜕{𝑥,𝑦} from rasterization

Output:

Interpolated attributes per pixel

Extra notes:

Barycentric Jacobians are used to compute screen-space (attribute pixel) derivatives via the chain rule

Backward pass gradient also easy to get (interpolation)

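Creating a mapping between the pixels and the attributes:

With 𝑢 = 𝑢(𝑥,𝑦), 𝑣 = 𝑣(𝑥,𝑦), and the attribute of the 𝑖th vertex denoted by 𝐴_𝑖, the interpolated attribute for a triangle with vertices 𝑖0, 𝑖1, 𝑖2 is

𝐴 = 𝑢𝐴_𝑖0 + 𝑣𝐴_𝑖1 + (1 − 𝑢 − 𝑣)𝐴_𝑖2

𝜕𝐴/𝜕{𝑥,𝑦} = [𝜕{𝑢,𝑣}/𝜕{𝑥,𝑦}][𝜕{𝐴}/𝜕{𝑢,𝑣}]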
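A small numeric sketch of the interpolation formula and the chain rule above, for a scalar attribute. The function names and the layout of the Jacobian tuple are assumptions chosen for this example.

```python
def interpolate(u, v, a0, a1, a2):
    """A = u*A_i0 + v*A_i1 + (1 - u - v)*A_i2 for a scalar attribute."""
    return u * a0 + v * a1 + (1.0 - u - v) * a2


def attribute_pixel_derivative(jac, a0, a1, a2):
    """Chain rule: dA/d{x,y} from the barycentric Jacobian J = d{u,v}/d{x,y}.

    jac is ((du/dx, du/dy), (dv/dx, dv/dy)).
    """
    dA_du = a0 - a2  # derivative of the interpolation formula w.r.t. u
    dA_dv = a1 - a2  # derivative of the interpolation formula w.r.t. v
    (du_dx, du_dy), (dv_dx, dv_dy) = jac
    dA_dx = dA_du * du_dx + dA_dv * dv_dx
    dA_dy = dA_du * du_dy + dA_dv * dv_dy
    return dA_dx, dA_dy


value = interpolate(0.25, 0.5, 1.0, 2.0, 4.0)
dA_dx, dA_dy = attribute_pixel_derivative(((0.1, 0.0), (0.0, 0.2)), 1.0, 2.0, 4.0)
print(value, dA_dx, dA_dy)
```

For vector attributes such as texture coordinates, the same computation is applied per component.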

16 of 97

Input:

Per-pixel texture coordinates

Texture image (arbitrary # channels)

Attribute pixel derivatives 𝜕𝐴/𝜕{𝑥,𝑦}

Output:

Sampled texture per pixel

Standard operation in a shading system. It closely resembles attribute interpolation but differs in its multiscale nature: the scale (mip) level is determined by the attribute pixel derivatives
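One common way to turn attribute pixel derivatives into a scale level is sketched below. It uses the widely used "largest footprint axis" heuristic, which is an assumption here: the exact rule used by nvdiffrast may differ, and `mip_level` is a hypothetical helper name.

```python
import math


def mip_level(duv_dx, duv_dy, tex_size, num_levels):
    """Pick a fractional mip level from texture-coordinate pixel derivatives.

    duv_dx = (du/dx, dv/dx) and duv_dy = (du/dy, dv/dy), in normalized
    texture units per pixel; tex_size is the texture resolution in texels.
    """
    # Footprint of one pixel, measured in texels, along each screen axis.
    len_x = math.hypot(duv_dx[0], duv_dx[1]) * tex_size
    len_y = math.hypot(duv_dy[0], duv_dy[1]) * tex_size
    footprint = max(len_x, len_y, 1e-8)
    # Each mip level halves the resolution, hence the log2.
    level = math.log2(footprint)
    return min(max(level, 0.0), num_levels - 1.0)


print(mip_level((4.0 / 256, 0.0), (0.0, 1.0 / 256), 256, 9))
```

A pixel that spans one texel selects level 0, a pixel that spans four texels selects level 2, and magnified textures clamp to level 0.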

17 of 97

Input:

Point sampled images, vertices, triangles

Output:

Antialiased image

Key step: the coverage gradient problem is solved in this step!

Insights:

Antialiasing converts the discontinuities to smooth changes, thus gradients can be computed
Modify pixel colors based on geometry

18 of 97

Modify pixel colors based on geometry

Similar idea to Distance-to-Edge Antialiasing (DEAA) [Malan 2010] and Geometric Post-Process Antialiasing (GPAA) [Persson 2011]

Estimate occlusion ratio based on geometry

Blend in color from neighboring pixel

Extra Notes:

Approximate the occlusion ratio as a linear function of the location of the crossing point: from zero when the edge crosses at the midpoint between two pixel centers to 50% when it crosses at a pixel center

This essentially approximates the exact surface coverage per pixel [Jalobeanu et al. 2004] using an axis-aligned slab, which is exact only for axis-aligned geometry

Detect geometric silhouettes and modify the pixel colors based on the areas covered by the different surfaces. This gives a good estimate of the pixel colors, which now vary continuously with the vertex positions, so a useful gradient exists

So the main purpose of antialiasing here is not to improve image quality, but to obtain the coverage gradient and let it propagate through the pipeline

Note that this is also blending, but it happens after shading (antialiasing is performed after shading), so the problems mentioned at the beginning (overshading and the trade-off in framework flexibility) do not arise. It is much more efficient, does not break occlusion, and sticks to the traditional way of rendering

(extra note)

Consequently, the coverage estimate is exact only for perfectly vertical and horizontal edges that extend beyond the pixel. For a long diagonal edge that passes exactly between the pixel centers, the error in coverage is 1/8 of a pixel
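The linear blend described in these notes can be sketched for one pair of adjacent pixels A and B. This is an illustrative reading of the notes, not the nvdiffrast implementation: the parameter `t` (crossing position along the segment between the two pixel centers), the function names, and the grid-sampling check of the diagonal case are all assumptions.

```python
def blend_weights(t):
    """Blend weights for pixels A (center at t=0) and B (center at t=1).

    t is where the silhouette edge crosses the segment between the centers.
    The weight is 0 at the midpoint and grows linearly to 0.5 at a center.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie between the two pixel centers")
    if t < 0.5:
        return 0.5 - t, 0.0  # edge lies in A's half: A takes on some of B's color
    return 0.0, t - 0.5      # edge lies in B's half: B takes on some of A's color


def antialias_pair(color_a, color_b, t):
    """Blend the colors of two adjacent pixels according to the crossing point."""
    w_a, w_b = blend_weights(t)
    new_a = (1.0 - w_a) * color_a + w_a * color_b
    new_b = (1.0 - w_b) * color_b + w_b * color_a
    return new_a, new_b


def diagonal_coverage(n=200):
    """Grid estimate of the part of pixel A ([-0.5, 0.5]^2) beyond the 45-degree
    edge x - y = 0.5, which crosses the A-B segment exactly at its midpoint."""
    hits = 0
    for i in range(n):
        for j in range(n):
            x = -0.5 + (i + 0.5) / n
            y = -0.5 + (j + 0.5) / n
            if x - y > 0.5:
                hits += 1
    return hits / (n * n)


print(antialias_pair(0.0, 1.0, 0.25))
print(blend_weights(0.5), diagonal_coverage())
```

Moving the crossing point now changes the blended color linearly, so a vertex that moves the edge receives a nonzero gradient. For the diagonal case the slab estimate is zero while the sampled coverage is close to 1/8, matching the error quoted above.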

19 of 97

Performance comparison

15 ms/ frame

5 s/ frame

20 of 97

Qualitative Experiments

visibility gradients provide useful information even for small triangles

21 of 97

Qualitative Experiments

Modular design and flexibility of the framework can handle complex rendering pipeline

22 of 97

Qualitative Experiments

Cube pose optimization: blurring and transparency before shading is not really necessary

SoftRas [Liu et al. 2019]: 63.57°

Avg error:

NVdiffrast: 48.52°

+Two stage optimization: 22.49°

+symmetry noise 2.61°

23 of 97

Applications: Facial Performance Capture

Formulate facial performance capture as an inverse rendering problem:

Find a global texture and per-frame mesh, so that rendering them produces input images

Input:

sequences from 9 cameras with camera positions

Base mesh

24 of 97

Applications: Facial Performance Capture

Single GPU, 100000 iterations, 1 hour to convergence, better than commercial (DI4D)

25 of 97

Archeologist

Chaitanya

26 of 97

Differentiable rendering

Fast, correct, generalizable, flexible, high resolution

27 of 97

Past Works

Linear combination of select faces

Hard to generalize

Morphable Model For The Synthesis Of 3D Faces (SIGGRAPH ’99)

Single image

Shape + Texture vectors

28 of 97

Common Variants

High Quality

High Performance

Differentiable Monte Carlo RT through Edge Sampling

Mitsuba 2

OpenDR

SoftRas

DIB-R

NVDiffRast

29 of 97

Differentiable Monte Carlo RT through Edge Sampling (SIGGRAPH ’18)

Physically based

What is the gradient

When a triangle moves?

approximate integral with MC

30 of 97

Differentiable Monte Carlo RT through Edge Sampling

Not noise free

Edge sampling is hard

Performance - slow rendering

good gradients

31 of 97

OpenDR

Limited shading model

Approximate gradients

General purpose

fast

Local shading

Spatial gradients

Interior pixel → within object
Boundary pixel → between objects

What is the gradient of a pixel when the cylinder is rotated in plane?

32 of 97

Neural Mesh Rendering (CVPR ’18)

Not scalable

Limited flexibility

Hallucinated gradients

What is the gradient at the pixel p w.r.t. the vertex x?

Noise free

33 of 97

SoftRas (ICCV ’19)

differentiable aggregating process

computes influence of each triangle on

each pixel
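The per-triangle influence and the aggregation step can be sketched with the probability map and silhouette aggregation from the SoftRas paper, D = sigmoid(δ · d² / σ) and I = 1 − Π_j (1 − D_j). The function names and the scalar, single-pixel setup are assumptions for illustration.

```python
import math


def soft_probability(signed_sq_dist, sigma):
    """D = sigmoid(delta * d^2 / sigma).

    signed_sq_dist is delta * d^2: the squared distance from the pixel to the
    triangle boundary, positive when the pixel is inside the triangle.
    """
    return 1.0 / (1.0 + math.exp(-signed_sq_dist / sigma))


def soft_silhouette(probabilities):
    """Aggregate the influence of all triangles on one pixel: 1 - prod_j (1 - D_j)."""
    miss = 1.0
    for d in probabilities:
        miss *= 1.0 - d
    return 1.0 - miss


print(soft_probability(0.0, 0.01), soft_silhouette([0.5, 0.5]))
```

A smaller sigma gives a sharper, closer-to-hard rasterization, while a larger sigma spreads each triangle's influence (and its gradient) over more pixels.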

34 of 97

SoftRas

Good gradients

Noise free

Approximate rendering

Cannot scale well wrt parameters

What is the right blur?

35 of 97

Accuracy comparison wrt NVDiffRast

36 of 97

Comparison

37 of 97

Archeologist:

Impact and Future Work

Ozgur Kara

38 of 97

39 of 97

https://github.com/sicxu/Deep3DFaceRecon_pytorch

40 of 97

Magic3d: High-resolution text-to-3d content creation

Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation

41 of 97

Extracting Triangular 3D Models, Materials, and Lighting From Images (CVPR 2022 Oral) (NVdiffRec)

Another important line of research influenced by NVdiffrast is NVdiffRec, which was published in 2022 as an oral paper at CVPR. This work introduces an inverse rendering approach: it reconstructs 3D content from multi-view images, using known camera positions and background segmentation masks to help with the process. It additionally introduces a differentiable formulation of environment lighting to efficiently recover all-frequency lighting.

The parameters for the mesh, texture, and lighting are optimized jointly. For the mesh, the authors use differentiable marching tetrahedra to optimize the topology. For texture optimization, they employ volumetric texturing, and for lighting, they use the differentiable environment-lighting formulation mentioned above.

In this 3D reconstruction, NVdiffrast plays a key role as the differentiable renderer, allowing the model to pass gradients back to the 3D parameters. This is a crucial part of making the reconstruction work.

42 of 97

Shape, Light, and Material Decomposition from Images using Monte Carlo Rendering and Denoising (NeurIPS 2022) (NVDiffRecMC)

This paper extends the NVdiffRec paper that I have just discussed. It also focuses on 3D reconstruction from multi-view images, i.e., inverse rendering, and was published at NeurIPS 2022.

The core idea of the proposed method is to use a general lighting model to solve the rendering equation, which is approximated through Monte Carlo integration. Unlike NVDiffRec, which uses a split-sum approximation for direct lighting, this approach employs Monte Carlo ray tracing, leading to more physically accurate rendering. However, Monte Carlo methods can suffer from high variance, especially with a limited number of samples. To address this, the authors introduce importance sampling techniques to improve sampling efficiency and use denoisers to reduce variance, making the optimization process more manageable and the training more efficient. The combination of importance sampling and a denoising step significantly reduces variance, even when using fewer samples.
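The effect of importance sampling on variance can be illustrated with a toy 1D integral. This is only a sketch under stated assumptions: the integrand `f(x) = 3x²` on [0, 1] and the sampling density `p(x) = 2x` are chosen for illustration and stand in for the rendering equation and the paper's sampling strategies; no denoiser is shown.

```python
import math
import random


def f(x):
    """Toy integrand; its integral over [0, 1] is exactly 1."""
    return 3.0 * x * x


def estimate(n, importance, rng):
    """Monte Carlo estimate of the integral of f, returning (mean, sample variance)."""
    samples = []
    for _ in range(n):
        u = 1.0 - rng.random()  # in (0, 1], avoids dividing by zero below
        if importance:
            x = math.sqrt(u)                  # inverse-CDF sampling for p(x) = 2x
            samples.append(f(x) / (2.0 * x))  # weight each sample by 1 / p(x)
        else:
            samples.append(f(u))              # uniform sampling, p(x) = 1
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return mean, var


rng = random.Random(0)
mean_uniform, var_uniform = estimate(4000, False, rng)
mean_is, var_is = estimate(4000, True, rng)
print(mean_uniform, var_uniform)
print(mean_is, var_is)
```

Both estimators converge to the same value, but the per-sample variance is much lower when samples are drawn roughly in proportion to the integrand (analytically 0.125 versus 0.8 for this toy case), which is why fewer samples suffice.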

43 of 97

Differentiable Rendering Packages

Kaolin

NVIDIA Kaolin library provides a PyTorch API for working with a variety of 3D representations and includes a growing collection of GPU-optimized operations such as modular differentiable rendering…

fVDB

Developed by NVIDIA, fVDB is an open-source deep learning framework for sparse, large-scale, high-performance spatial intelligence. It builds NVIDIA-accelerated AI operators on top of OpenVDB to enable reality-scale digital twins, neural radiance fields, 3D generative AI, and more.

Mitsuba 3

Mitsuba 3 is a research-oriented retargetable rendering system, written in portable C++17 on top of the Dr.Jit Just-In-Time compiler. It is developed by the Realistic Graphics Lab at EPFL.

44 of 97

Private Investigator

Guang Yin

45 of 97

Timo Aila

Timo Aila joined NVIDIA Research in 2007 from Helsinki University of Technology, where he led the computer graphics research group. His expertise ranges from real-time rendering in computer games to high-quality image synthesis, with contributions to the PantaRay rendering system used in Avatar, Tintin, and Hobbit. He also gained expertise in mobile graphics as the chief scientist of Hybrid Graphics, which was acquired by NVIDIA in 2006. Timo is currently working on machine learning, with a special focus on generative models such as StyleGAN. Previously he had a central role in NVIDIA's research efforts on ray tracing, including the design of the RTX hardware units.

Computer Graphics

[2000 - Now] Main topic: occlusion culling, ray tracing

Machine Learning

[2017 - Now] Main topic: Generative Adversarial Networks

GPU Hardware Design

46 of 97

[2000] [Real-time Graphics]

Master’s thesis ➡️ the first commercial occlusion culling library Umbra

48 of 97

[2009] [Ray Tracing]

In 2007, joined NVIDIA Research and led the computer graphics research group

How GPUs can be optimized for ray tracing

49 of 97

[2010] [Ray Tracing]

Simple!

Oh no…

Bounding Volume Hierarchies (BVH)

Used in Avatar, Tintin, and the Hobbit

We describe the architecture of a novel system for precomputing sparse directional occlusion caches. These caches are used for accelerating a fast cinematic lighting pipeline that works in the spherical harmonics domain. The system was used as a primary lighting technology in the movie Avatar, and is able to efficiently handle massive scenes of unprecedented complexity through the use of a flexible, stream-based geometry processing architecture, a novel out-of-core algorithm for creating efficient ray tracing acceleration structures, and a novel out-of-core GPU ray tracing algorithm for the computation of directional occlusion and spherical integrals at arbitrary points.

This paper introduces the PantaRay system, which focuses on accelerating ray traversal using Bounding Volume Hierarchies (BVH), a data structure commonly used to optimize ray-object intersection tests. This system has been highly influential in making real-time ray tracing more feasible, particularly in interactive applications like video games.

Impact in film industry

50 of 97

[2016] [Machine Learning]

Computer Graphics

Machine Learning

51 of 97

[2017] [Machine Learning for Computer Graphics]

Led to Deep Learning Super Sampling (DLSS)?

52 of 97

[2017] [Machine Learning] [8676 citations]

53 of 97

[2019] [Machine Learning] [12062 citations]

Makes GAN practical

Widespread use in creative industries

54 of 97

[2020] [Computer Graphics for Machine Learning]

Computer Graphics

Machine Learning

GPU hardware design

Highly optimized differentiable rendering pipeline

Aila's deep understanding of both graphics hardware and machine learning algorithms has enabled him to contribute to the development of techniques that merge these two fields. His work in differentiable rendering exemplifies this intersection. In the paper "Modular Primitives for High-Performance Differentiable Rendering" (SIGGRAPH Asia 2020), Aila and his colleagues introduced a design that leverages existing hardware pipelines to achieve high-performance differentiable rendering, which is crucial for tasks like inverse rendering and facial performance capture. This work allows for the integration of real-time rendering techniques with machine learning frameworks such as PyTorch and TensorFlow, facilitating the creation of neural networks that can render scenes accurately and efficiently.

55 of 97

Timo Aila

Broad and deep understanding of the whole system

Computer Graphics

Machine Learning

GPU Hardware Design

Diverse research directions provide a variety of tools to address bottlenecks across different domains.

Industrial Applications matter!

56 of 97

Critic

Harris Nisar

57 of 97

Raytracing vs. Rasterization

Raytracing and Global Illumination Intro. to Computer Graphics, CS180, Fall 2008 UC Santa Barbara – https://images.slideplayer.com/25/7723008/slides/slide_2.jpg

59 of 97

Resources

Renderer

Image

https://youtu.be/brDJVEPOeY8?si=mycNk7T7Wb1xJxZ9

Rasterization at a high level

60 of 97

Rasterization

Step 1: World to Screen Space + Clipping

transforming 3D object vertices from object space to screen space

Limitations

Based on camera’s view; can not handle off-screen reflections or indirect lighting.
Triangle number must be very high to approximate smooth curves.

https://imagecomputing.net/old_storage/old_website2/teaching/2019_2020/semester_1/INF630_refresher/class/content/014_projection_rasterization/projection.svg

https://img.clipart-library.com/2/clip-computer-graphicss/clip-computer-graphicss-1.jpg

61 of 97

Rasterization

Step 2: Triangles -> Pixels

Which pixels are covered by which triangle (scan-line algo)
Which triangles are in front (z buffer)

Limitations

Aliasing
Cannot handle transparency
Complex geo -> slower performance

Z Buffer

Output

Scene

https://youtu.be/brDJVEPOeY8?si=mycNk7T7Wb1xJxZ9

https://www.salomonsson.se/img/math_rasterizer_what_is.png

Scan-line Algorithm

This step determines which pixels are covered by which triangles and which triangles are in front. Say we want to render the scene in the bottom left. We first loop over all pixels and ask, for each triangle, whether it covers that pixel. We also compute a z-buffer, an image that encodes the depth of the closest triangle at each pixel (darker is closer). We can then use this information to draw triangles closer to the camera on top of those farther away.

Since rasterization converts geometric shapes into pixel grids, curved and diagonal lines often appear jagged. Anti-aliasing techniques are required to smooth these edges, which increases computational cost.

Transparency is difficult to manage in rasterization. The z-buffer used to track depth does not handle transparent objects correctly, leading to issues with depth sorting and blending.
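The coverage and depth tests described in these notes can be sketched as a tiny software rasterizer. Note the simplifications, which are assumptions of this sketch: it uses a per-pixel edge-function inside test rather than a scan-line traversal, gives each triangle a single constant depth, and samples once at each pixel center.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed-area test: tells which side of the edge a->b the point p lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize(triangles, width, height):
    """triangles: list of (tri_id, [(x, y), (x, y), (x, y)], depth); smaller depth is closer."""
    inf = float("inf")
    zbuf = [[inf] * width for _ in range(height)]
    ids = [[-1] * width for _ in range(height)]
    for tri_id, (a, b, c), depth in triangles:
        for y in range(height):
            for x in range(width):
                px, py = x + 0.5, y + 0.5  # sample at the pixel center
                w0 = edge(*b, *c, px, py)
                w1 = edge(*c, *a, px, py)
                w2 = edge(*a, *b, px, py)
                # Accept either winding order.
                inside = (w0 >= 0 and w1 >= 0 and w2 >= 0) or (
                    w0 <= 0 and w1 <= 0 and w2 <= 0
                )
                # Z-buffer test: keep the closest triangle seen so far.
                if inside and depth < zbuf[y][x]:
                    zbuf[y][x] = depth
                    ids[y][x] = tri_id
    return ids, zbuf


scene = [
    (0, [(0, 0), (8, 0), (0, 8)], 5.0),  # large, far triangle
    (1, [(0, 0), (4, 0), (0, 4)], 1.0),  # small, near triangle
]
ids, zbuf = rasterize(scene, 8, 8)
print(ids[1][1], ids[1][5], ids[7][7])
```

The hard `inside` test and the hard depth comparison are exactly the two discontinuities (XY and Z) that make naive rasterization non-differentiable with respect to vertex positions.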

62 of 97

Rasterization

Step 3: Shading

Applying:

Color
Texture
Lighting

Limitations

Inability to handle global illumination
Inability to handle shadows and reflections (requires tricks like shadow maps and screen space reflections)

https://www.fragmentstorm.com/assets/images/2019-03-17-overview-of-the-graphics-pipeline/fragment-shading.png

https://miro.medium.com/v2/resize:fit:900/0*lOiF1XoVkXlWYvrB.jpg

https://learnopengl.com/Advanced-Lighting/Shadows/Shadow-Mapping

https://sugulee.wordpress.com/2021/01/16/performance-optimizations-for-screen-space-reflections-technique-part-1-linear-tracing-method/

Once pixels are covered by triangles, their color is calculated. This is where effects like lighting and texturing are applied.

Simulating indirect lighting or global illumination (light bouncing between objects) is not part of the rasterization pipeline, because it does not handle light bounces. This makes renders less realistic.

Complex effects such as shadows and reflections cannot be modeled explicitly and require tricks to approximate them. Shadow maps typically lead to jagged, hard shadows. Screen-space reflections operate in screen space, so they are not global and can produce artifacts at the edges.

Shadow map stuff (just in case):

If you looked out from a source of light, all the objects you can see would appear in light. Anything behind those objects, however, would be in shadow. This is the basic principle used to create a shadow map. The light's view is rendered, storing the depth of every surface it sees (the shadow map). Next, the regular scene is rendered comparing the depth of every point drawn (as if it were being seen by the light, rather than the eye) to this depth map.

Screen-space reflection uses the depth and the normal direction for each pixel to compute the reflection ray. Then, it traces the ray in screen space until the ray intersects with the geometry. Where the ray intersects with the geometry is the location of the pixel to be reflected. Adding that pixel's color to the color of the original pixel creates a reflection effect. This only handles reflections of things within the screen view (no global information), which can also cause problems at the edges of the screen.
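The shadow-map principle from the notes above can be sketched in a deliberately reduced setting. The assumptions here are a directional light shining straight down, a 1D row of cells as the "light view", and a small depth bias; the function names are hypothetical.

```python
def build_shadow_map(surfaces, num_cells, light_height):
    """First pass, from the light's view: store the nearest depth per cell.

    surfaces: list of (cell_index, height) surface samples.
    """
    shadow_map = [float("inf")] * num_cells
    for cell, height in surfaces:
        depth = light_height - height  # distance from the light along its view direction
        shadow_map[cell] = min(shadow_map[cell], depth)
    return shadow_map


def is_lit(cell, height, shadow_map, light_height, bias=1e-3):
    """Second pass: a point is lit if nothing nearer to the light was recorded."""
    depth = light_height - height
    return depth <= shadow_map[cell] + bias


# Ground at height 0 in every cell, plus a box top at height 2 in cell 1.
surfaces = [(0, 0.0), (1, 0.0), (2, 0.0), (3, 0.0), (1, 2.0)]
shadow_map = build_shadow_map(surfaces, 4, light_height=10.0)
print(shadow_map)
print(is_lit(1, 0.0, shadow_map, 10.0), is_lit(1, 2.0, shadow_map, 10.0))
```

The ground under the box is reported as shadowed because the box top was recorded as nearer to the light. The finite resolution of the map is what produces the jagged shadow edges mentioned above.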

63 of 97

Rasterization is:

Fast 😀
But inaccurate 😢

64 of 97

Rasterization is:

Fast 😀
But inaccurate 😢

This is definitely a “limitation”, but I would not call it a full-blown critique: the authors acknowledge these limitations, and the decision to use this system should be based on your application requirements 🤔

65 of 97

Discontinuities rendering pipeline

66 of 97

Geometric Post-process Anti-Aliasing (GPAA)

67 of 97

Graduate Student

Possible Improvements And Applications

Junkun Chen

https://docs.google.com/presentation/d/1FbA29d21cz8CczlFYq-xqqgplqQn2J1hi3mT0xdBd8Y/edit?usp=sharing

68 of 97

Idea of Improvements

“Rasterization” → 3D Gaussian Splatting

“Meshes” → Tetrahedron → Differentiable Marching Tetrahedra

69 of 97

Idea 1: Apply with advanced approaches from 3DGS

Limitations of NVDiffRast that 3DGS can address:

Cannot deal with semi-transparency
Cannot dynamically add or delete vertices, can only move
No native anti-aliasing
Needs lots of “soften” operations to make it differentiable

70 of 97

Idea 1: Apply with advanced approaches from 3DGS

First, enable transparency/volume density through alpha maps
Combine mesh triangles with Gaussians

For each edge, model the opacity as a Gaussian distribution w.r.t. dist. from the edge
For each triangle surface, similarly model as a Gaussian w.r.t. dist. from the surface
No need to “soften” any of the occlusion/disocclusion operations

Each surface can be regarded as a “Gaussian” centered at the triangle

71 of 97

Idea 1: Apply with advanced approaches from 3DGS

Traditional idea needs “differentiable sorting” to render transparency
But 3DGS does not need to do this

Split the screen into tiles, radix-sort Gaussians in each tile w/o gradients
Directly apply volume rendering formula in this order

Apply similar methods here

Achieve transparency in a similar way as Gaussian
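The "sort without gradients, then composite in that order" idea can be sketched for a single pixel using the standard front-to-back formula C = Σ_i c_i α_i Π_{j<i} (1 − α_j). The scalar colors, the `(depth, color, alpha)` fragment layout, and the use of Python's `sorted` in place of a per-tile radix sort are assumptions for illustration.

```python
def composite_front_to_back(fragments):
    """fragments: list of (depth, color, alpha) for one pixel.

    Returns (composited color, remaining transmittance).
    """
    # The ordering is treated as a constant: no gradient flows through the sort,
    # only through the colors and alphas used in the blend below.
    ordered = sorted(fragments, key=lambda frag: frag[0])
    color, transmittance = 0.0, 1.0
    for _, c, alpha in ordered:
        color += transmittance * alpha * c
        transmittance *= 1.0 - alpha
    return color, transmittance


fragments = [(2.0, 1.0, 0.5), (1.0, 0.0, 0.5), (3.0, 1.0, 1.0)]
print(composite_front_to_back(fragments))
```

Each fragment's contribution is a smooth function of its color and opacity, so the blend is differentiable even though the sort is not.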

72 of 97

Idea 1: Apply with advanced approaches from 3DGS

3DGS natively supports anti-aliasing via the Gaussian distribution
Also apply 3DGS’ culling and creation to optimize vertices and surfaces

Remove surfaces with too high transparency
If one edge is too long, create a vertex at middle point
If one edge is too short, merge its two endpoint vertices

73 of 97

Idea 2: Geometry manipulation via Tetrahedra

NVDiffRast does not support topology change

Cannot add surfaces into the mesh
Cannot create or fill holes during training
Even with 3DGS augmentation (surface deletion only)

DMTet natively supports this

74 of 97

Idea 2: Geometry manipulation via Tetrahedra

With the initial Mesh, initialize a DMTet network

Simulate marching tetrahedra to convert the current mesh into tetrahedra
Initially, every mesh face lies on the surface of a tetrahedron

Render with DMTet modeling

Predict meshes from SDF, so that each coordinate has a gradient w.r.t. SDF network
Can also combine Idea 1 for more differentiable operations

DMTet’s modeling can natively support topological changes

Only need to adjust the SDF and maintain the intersecting tetrahedra set
Supports: new holes, fill up holes, object creation, object deletion, etc., through training
Also enables multi-scale shape modeling

75 of 97

Idea for Application: Progressive Scene editing

The method in Idea 2 makes gradual shape editing possible

If aggressive shape editing is applied directly, the model may not receive correct gradient signals
However, if each shape edit is slight, it is likely to succeed through minor adjustments

By applying the progressive editing of “ProgressEditor”, we can achieve high-efficiency scene editing progressively, and even animation

76 of 97

Works that demonstrate a similar idea

2DGS: 2D Gaussian splatting (instead of 3D) for shape reconstruction

Also uses Gaussian + 2D surface representation

“Mesh-Guided Neural Implicit Field Editing”

Uses tetrahedra representation for human-guided scene editing
Applies an octree data structure for speed-up

77 of 97

Industrial Practitioner

Mei Han

78 of 97

Smart XR Glasses: iGlasses

Glasses

More Wearable, More Efficient, More Realistic
Smarter Assistant
AI+Companion
Vivid Memory Engrams

79 of 97

More Wearable, More Efficient, More Realistic

Two-layer diffraction optical waveguide
Render only what each eye can see
Fewer differentiable rendering tasks
Immersive depth perception

Left Right

80 of 97

iGlasses your personal life assistant

Be your Nutritionist🥗Coach, Cookbooks
Be your Eyes👀 (helping blind or low vision people)
Be your Translator (even pet translator)

81 of 97

Interactive 3D Avatar

AI Companion: AI companion with clear ethical boundaries.
Lifelike Real-Time Interactions: VLMs + Nvdiffrast, context-based facial expressions
Emotional Support: Provides 24/7 emotional support for psychological therapy

Virtual Friend

EVE

(Nature Select)

Otome Game

Love and Deepspace

(Papergames)

Real-time Dialogue

DAN Mode

(ChatGPT)

Electronic Pet

RoVR

(Ridgeline Labs)

82 of 97

Interactive 3D Avatar

Lifelike audio-driven talking-face generation
More nuanced facial expression rendering

VASA-1

Xu, S., Chen, G., Guo, Y. X., Yang, J., Li, C., Zang, Z., ... & Guo, B. (2024). Vasa-1: Lifelike audio-driven talking faces generated in real time. arXiv preprint arXiv:2404.10667

Guo, J., Zhang, D., Liu, X., Zhong, Z., Zhang, Y., Wan, P., & Zhang, D. (2024). Liveportrait: Efficient portrait animation with stitching and retargeting control. arXiv preprint arXiv:2407.03168.

LivePortrait

83 of 97

Vivid Memory Engrams

Baby’s first step

Key points of knowledge

Cache momentary spatial snapshots and preserve the memories you need
Digital contact book generation for new acquaintances

Unforgettable moments

84 of 97

Differentiable Rendering 4Science

Biological Physics
Material Science

[1] Ichbiah, S., Delbary, F., & Turlier, H. (2023). Differentiable rendering for 3d fluorescence microscopy. arXiv preprint arXiv:2303.10440.

[2] Sego, T. J., Sluka, J. P., Sauro, H. M., & Glazier, J. A. (2023). Tissue forge: interactive biological and biophysics simulation environment. PLOS Computational Biology, 19(10), e1010768.

[3] Shi, L., Li, B., Hašan, M., Sunkavalli, K., Boubekeur, T., Mech, R., & Matusik, W. (2020). Match: Differentiable material graphs for procedural material capture. ACM Transactions on Graphics (TOG), 39(6), 1-15.

MATch

Converts photographs of material samples into procedural material models

Tissue Forge

Interactive biological and biophysics simulation environment

3D fluorescence microscopy

85 of 97

Hacker 1: Bundle Adjustment using PyTorch3D

Yufeng Liu

86 of 97

Bundle Adjustment by Differentiable Rendering

In this project I implemented bundle adjustment in PyTorch3D

Objective:

Given multi-view images, solve for camera poses, object mesh, and object texture
Fit a deformation offset applied to a sphere

Experiments:

Reconstruct Simple mesh
Reconstruct Complex mesh
Reconstruct Real world object

87 of 97

Simple Mesh

Set up:

Given a known mesh and a set of cameras
All cameras point towards the origin for simplicity
Render a set of rgb and silhouette images as ground truth
Perturb camera positions
Start with a sphere mesh with uniform texture

88 of 97

Implementation Overview:

Deformation is applied to input mesh

The predicted cameras render a set of images

Calculate average loss across all rendered images

Loss consists of the following parts:

RGB: L2 loss
Silhouette: L2 loss
Mesh edge: prevents mesh size shrinking or expanding too much
Mesh normals: normals of neighboring faces should be close
Laplacian: mesh smoothness

Backpropagate the loss and take an optimization step with the Adam optimizer

Update deformation and cameras

89 of 97

Views of predicted cameras

Views of ground-truth cameras

90 of 97

Complex Mesh

91 of 97

Views of predicted cameras

Views of ground-truth cameras

92 of 97

Real Images?

Need to do more

93 of 97

Hacker 2: Minimal Implementation of Soft Rasterizer

Christopher Conway

94 of 97

Soft rasterizer is “truly differentiable” as it:
directly renders color mesh with differentiable functions
back-propagates supervision to mesh vertices/attributes
The paper demonstrates rendering of a triangle primitive

Liu S. et al., “Soft Rasterizer: A Differentiable Renderer for Image-based 3D Reasoning”, ICCV 2019

Soft Rasterizer: A Differentiable Renderer for Image-based 3D Reasoning

95 of 97

For this project I did a minimal implementation of a soft backward function for rendering a rectangle on an image
I implemented a soft geometry rectangle to probabilistically map the forward rendering function
The backward gradient is used to go from pixel to primitive
Results are shown for estimation of a mask image of a rectangle
The key equation from the paper (for triangles) is:

Liu S. et al., “Soft Rasterizer: A Differentiable Renderer for Image-based 3D Reasoning”, ICCV 2019

Minimal Implementation for Rectangular Primitive

96 of 97

The primitive function renders a rectangle from its center, width, and height, and generates the probability map based on distance to the edges (the D function from the previous slide)
Cross entropy is used to compute the loss between the render and the mask
Optimization uses the Adam optimizer in PyTorch, which also provides fast tensor broadcasting
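The forward render and the loss described in this list can be sketched in plain Python. This is not the presenter's code: the function names, the axis-wise approximation of the distance to the rectangle edges, and the use of the SoftRas-style sigmoid D = sigmoid(δ · d² / σ) are assumptions, and the Adam optimization loop is omitted.

```python
import math


def soft_rectangle(cx, cy, w, h, size, sigma):
    """Probability map: sigmoid(sign * d^2 / sigma), where d is an (approximate)
    distance to the nearest rectangle edge, positive inside the rectangle."""
    image = []
    for row in range(size):
        line = []
        for col in range(size):
            px, py = col + 0.5, row + 0.5
            d = min(w / 2.0 - abs(px - cx), h / 2.0 - abs(py - cy))
            signed_sq = math.copysign(d * d, d)
            line.append(1.0 / (1.0 + math.exp(-signed_sq / sigma)))
        image.append(line)
    return image


def hard_mask(cx, cy, w, h, size):
    """Binary target mask of an axis-aligned rectangle."""
    return [
        [1.0 if abs(col + 0.5 - cx) < w / 2.0 and abs(row + 0.5 - cy) < h / 2.0 else 0.0
         for col in range(size)]
        for row in range(size)
    ]


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross entropy between a soft render and a binary mask."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)
            total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
            count += 1
    return total / count


mask = hard_mask(8.0, 8.0, 6.0, 4.0, 16)
loss_aligned = binary_cross_entropy(soft_rectangle(8.0, 8.0, 6.0, 4.0, 16, 2.0), mask)
loss_shifted = binary_cross_entropy(soft_rectangle(5.0, 8.0, 6.0, 4.0, 16, 2.0), mask)
print(loss_aligned, loss_shifted)
```

Because the render is a smooth function of the center, width, and height, the loss decreases smoothly as the rectangle moves toward the target, which is what lets a gradient-based optimizer such as Adam fit the parameters.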

Initial Primitive

Target Mask

Rendered Rectangle

Implementation Overview:

97 of 97

The animation below shows the progression over 1000 iterations; the runtime was just 1-2 s.
I experimented with sigma and the Adam learning rate to get the optimization to converge
Increasing sigma softens the render; the best value differed for fitting the center (2.0) versus the height/width (5.0)
Further optimization could improve performance and decrease runtime

Rendering Animation: