1 of 27

Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling

Hyunbae Kim

2 of 27

3 of 27

Animatable human avatar

4 of 27

Limitation of meshes and point clouds

  • Previous explicit avatar representations require densely reconstructed meshes to model human geometry, which limits their applicability to sparse-view video-based avatar modeling.

5 of 27

Limitation of implicit representations (NeRF)

  • Implicit representations regress a continuous field with a coordinate-based MLP, and therefore suffer from the low-frequency spectral bias of MLPs.

6 of 27

3DGS

9 of 27

Contribution of Animatable Gaussians

  • Animatable Gaussians, a new avatar representation that introduces explicit 3D Gaussian splatting into avatar modeling to employ powerful 2D CNNs for creating life-like avatars with high-fidelity pose-dependent dynamics.
  • Template-guided parameterization that learns a character-specific template for general clothes like dresses, and parameterizes 3D Gaussians onto front & back Gaussian maps for compatibility with 2D networks.

10 of 27

Preliminary: 3DGS
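
As a quick reference for this preliminary, the core 3DGS equations, following Kerbl et al. 2023 (a recap, not this paper's contribution):

    G(x) = \exp\left( -\tfrac{1}{2} (x - \mu)^{\top} \Sigma^{-1} (x - \mu) \right)
    \Sigma = R S S^{\top} R^{\top}
    C = \sum_i c_i \alpha_i \prod_{j<i} (1 - \alpha_j)

Here μ is a Gaussian's center, Σ its covariance factored into a rotation R and a scale S, and C the alpha-blended pixel color of the depth-sorted, splatted Gaussians.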

11 of 27

Overview

12 of 27

Learning Parametric Template

Goal: Reconstruct a canonical geometric model as the template

Represent the canonical character as an SDF and a color field instantiated by an MLP

  • Given the multi-view videos, select one frame in which the character is in a near A-pose.
  • Precompute a skinning weight volume W in the canonical space by diffusing the weights from the SMPL surface throughout the whole 3D volume along the surface normal (a simplified sketch follows below).
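
A minimal sketch of precomputing such a weight volume W. Nearest-surface-point lookup is used here as a stand-in for the paper's diffusion along the surface normal, and all names are illustrative:

    import numpy as np
    from scipy.spatial import cKDTree

    def build_weight_volume(smpl_verts, smpl_weights, res=64, pad=0.1):
        # smpl_verts:   (V, 3) SMPL vertex positions in canonical space
        # smpl_weights: (V, J) per-vertex LBS weights over J joints
        # Returns a (res, res, res, J) grid of diffused weights plus bounds.
        lo = smpl_verts.min(axis=0) - pad
        hi = smpl_verts.max(axis=0) + pad
        axes = [np.linspace(lo[d], hi[d], res) for d in range(3)]
        grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
        # Nearest-neighbor "diffusion": copy the weights of the closest
        # SMPL vertex to every grid point (the paper diffuses along the
        # surface normal; this is a simplification).
        _, idx = cKDTree(smpl_verts).query(grid.reshape(-1, 3))
        W = smpl_weights[idx].reshape(res, res, res, -1)
        return W, lo, hi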

13 of 27

Learning Parametric Template

Goal: Reconstruct a canonical geometric model as the template

  • For each point in the posed space, search for its canonical correspondence by root finding (see the sketch after this list).
  • The canonical correspondence is fed into the MLP to query its SDF and color, which are used to render RGB images by SDF-based volume rendering.
  • The rendered images are compared with the ground truth to optimize the canonical fields via differentiable volume rendering.
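
A minimal sketch of the root-finding step, using a simple fixed-point iteration as a stand-in for the root-finding scheme the paper builds on; query_weights and joint_transforms are illustrative names:

    import numpy as np

    def canonical_correspondence(x_p, query_weights, joint_transforms, iters=20):
        # x_p:              (3,) point in posed space
        # query_weights:    canonical (3,) point -> (J,) LBS weights
        #                   (e.g. trilinear lookup in the weight volume W)
        # joint_transforms: (J, 4, 4) bone transforms for the current pose
        x_c = x_p.copy()  # initialize at the posed location
        for _ in range(iters):
            w = query_weights(x_c)                           # (J,)
            T = np.einsum("j,jab->ab", w, joint_transforms)  # blended 4x4
            # Invert the blended transform and pull x_p back to canonical
            # space; iterate until the estimate stabilizes.
            x_c = (np.linalg.inv(T) @ np.append(x_p, 1.0))[:3]
        return x_c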

14 of 27

Learning Parametric Template

Goal: Reconstruct a canonical geometric model as the template

  • Extract the geometric template from the SDF field and query the skinning weights for each vertex from the precomputed weight volume W, obtaining a deformable parametric template (a sketch of this step follows below).
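
A minimal sketch of the extraction step, assuming a cubic SDF grid and using marching cubes plus a nearest-cell weight lookup (trilinear interpolation would be smoother); all names are illustrative:

    import numpy as np
    from skimage.measure import marching_cubes

    def extract_template(sdf_grid, W, lo, hi):
        # sdf_grid: (res, res, res) SDF sampled on the canonical grid
        # W:        (res, res, res, J) precomputed skinning weight volume
        # lo, hi:   (3,) grid bounds in canonical space
        res = sdf_grid.shape[0]
        spacing = (hi - lo) / (res - 1)
        verts, faces, _, _ = marching_cubes(sdf_grid, level=0.0,
                                            spacing=tuple(spacing))
        verts += lo  # grid coordinates -> canonical world coordinates
        # Attach skinning weights via nearest-cell lookup in W.
        idx = np.clip(np.round((verts - lo) / spacing).astype(int), 0, res - 1)
        vert_weights = W[idx[:, 0], idx[:, 1], idx[:, 2]]
        return verts, faces, vert_weights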

15 of 27

Template-guided Parameterization

  • To ensure compatibility with 2D networks, the 3D avatar representation needs to be parameterized in 2D space: the learned template guides the parameterization of the 3D Gaussians onto front & back maps (a minimal sketch follows below).
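
A minimal sketch of one way to build such front/back position maps, using an orthographic per-vertex splat along the z-axis instead of a real rasterizer; names and resolutions are illustrative:

    import numpy as np

    def parameterize_front_back(verts, H=256, W=256):
        # verts: (V, 3) canonical template vertices. Each foreground texel
        # stores the 3D position of the surface point closest to the
        # front (max z) or back (min z) orthographic camera.
        lo, hi = verts.min(axis=0), verts.max(axis=0)
        uv = (verts[:, :2] - lo[:2]) / (hi[:2] - lo[:2] + 1e-8)
        px = np.clip((uv * [W - 1, H - 1]).astype(int), 0, [W - 1, H - 1])
        front = np.zeros((H, W, 3)); front_z = np.full((H, W), -np.inf)
        back = np.zeros((H, W, 3));  back_z = np.full((H, W), np.inf)
        for (u, v), p in zip(px, verts):
            if p[2] > front_z[v, u]:   # nearest point seen from the front
                front_z[v, u], front[v, u] = p[2], p
            if p[2] < back_z[v, u]:    # nearest point seen from the back
                back_z[v, u], back[v, u] = p[2], p
        return front, back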

16 of 27

Pose-dependent Gaussian Maps

  • Network: StyleUNet, a StyleGAN-based CNN
  • Output: front and back pose-dependent Gaussian maps
  • Conditioning input: view direction map
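
A toy stand-in illustrating the interface only, not the paper's StyleUNet architecture; channel counts and the per-pixel parameter layout are assumptions:

    import torch
    import torch.nn as nn

    class PoseToGaussianMap(nn.Module):
        # Maps a posed position map plus a view direction map to a Gaussian
        # map whose channels hold per-pixel Gaussian parameters (assumed
        # layout: 3 position offset + 3 color + 3 scale + 4 rotation
        # quaternion + 1 opacity = 14 channels).
        def __init__(self, in_ch=6, gauss_ch=14):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, gauss_ch, 3, padding=1),
            )

        def forward(self, pos_map, view_map):
            # pos_map:  (B, 3, H, W) posed position map (front or back)
            # view_map: (B, 3, H, W) per-pixel view direction map
            return self.net(torch.cat([pos_map, view_map], dim=1))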

17 of 27

LBS of 3D Gaussians

  • p_c : position of a canonical 3D Gaussian, Σ_c : its covariance
  • R, t : rotation matrix and translation vector computed from the skinning weights of each 3D Gaussian
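
From these definitions, the standard LBS equations for posing each Gaussian are:

    p = R\, p_c + t, \qquad \Sigma = R\, \Sigma_c\, R^{\top}

where R and t come from blending the skeleton's per-bone transforms with the Gaussian's skinning weights, e.g. [R | t] = \sum_k w_k [R_k | t_k].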

18 of 27

Training: Loss

(ϕₗ denotes the l-th layer of a pretrained CNN, e.g., VGG16, used for the perceptual loss)
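
A typical L1-plus-perceptual objective consistent with the ϕₗ note (the exact form and the weight λ here are assumptions, not the paper's verbatim formula):

    \mathcal{L} = \lVert I - I_{gt} \rVert_1 + \lambda \sum_l \lVert \phi_l(I) - \phi_l(I_{gt}) \rVert_1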

19 of 27

Results

20 of 27

Results: Comparison with body-only avatars

21 of 27

Results: Comparison with AvatarReX

22 of 27

Results: Quantitative comparison

23 of 27

Ablation: Parametric Template

24 of 27

Ablation: Backbones

25 of 27

Ablation: Pose Projection

(Figure: ablation results without vs. with pose projection)

26 of 27

Limitation

  • Animatable Gaussians entangles the modeling of the human body and clothes, which prevents changing the avatar's clothes for applications like virtual try-on.
  • Animatable Gaussians relies on multi-view input to reconstruct a parametric template, which limits its applicability to modeling loose clothes from a monocular video.

27 of 27