1 of 27

Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling

Hyunbae Kim

2 of 27

3 of 27

Animatable human avatar

4 of 27

Limitation of meshes and point clouds

  • Previous explicit avatar representations require densely reconstructed meshes to model human geometry, which limits their applicability to sparse-view video-based avatar modeling.

5 of 27

Limitation of implicit representations (NeRF)

  • Implicit representations regress a continuous field with a coordinate-based MLP, and therefore suffer from the low-frequency spectral bias of MLPs.

6 of 27

3DGS

9 of 27

Contribution of Animatable Gaussians

  • Animatable Gaussians, a new avatar representation that introduces explicit 3D Gaussian splatting into avatar modeling to employ powerful 2D CNNs for creating life-like avatars with high-fidelity pose-dependent dynamics.
  • Template-guided parameterization that learns a character-specific template for general clothes like dresses, and parameterizes 3D Gaussians onto front & back Gaussian maps for compatibility with 2D networks.

10 of 27

Preliminary: 3DGS
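
As a quick reference for this preliminary, the core 3DGS equations, following Kerbl et al. 2023 (a recap, not this paper's contribution):

    G(x) = \exp\left( -\tfrac{1}{2} (x - \mu)^{\top} \Sigma^{-1} (x - \mu) \right)
    \Sigma = R S S^{\top} R^{\top}
    C = \sum_i c_i \alpha_i \prod_{j<i} (1 - \alpha_j)

Here μ is a Gaussian's center, Σ its covariance factored into a rotation R and a scale S, and C the alpha-blended pixel color of the depth-sorted, splatted Gaussians.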

11 of 27

Overview

12 of 27

Learning Parametric Template

Goal: Reconstruct a canonical geometric model as the template

Represent the canonical character as an SDF and a color field instantiated by an MLP

  • Given the multi-view videos, select one frame in which the character is in a near A-pose.
  • Precompute a skinning weight volume W in the canonical space by diffusing the weights from the SMPL surface throughout the whole 3D volume along the surface normal (a simplified sketch follows below).
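
A minimal sketch of precomputing such a weight volume W. Nearest-surface-point lookup is used here as a stand-in for the paper's diffusion along the surface normal, and all names are illustrative:

    import numpy as np
    from scipy.spatial import cKDTree

    def build_weight_volume(smpl_verts, smpl_weights, res=64, pad=0.1):
        # smpl_verts:   (V, 3) SMPL vertex positions in canonical space
        # smpl_weights: (V, J) per-vertex LBS weights over J joints
        # Returns a (res, res, res, J) grid of diffused weights plus bounds.
        lo = smpl_verts.min(axis=0) - pad
        hi = smpl_verts.max(axis=0) + pad
        axes = [np.linspace(lo[d], hi[d], res) for d in range(3)]
        grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
        # Nearest-neighbor "diffusion": copy the weights of the closest
        # SMPL vertex to every grid point (the paper diffuses along the
        # surface normal; this is a simplification).
        _, idx = cKDTree(smpl_verts).query(grid.reshape(-1, 3))
        W = smpl_weights[idx].reshape(res, res, res, -1)
        return W, lo, hi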

13 of 27

Learning Parametric Template

Goal: Reconstruct a canonical geometric model as the template

  • For each point in the posed space, search for its canonical correspondence by root finding (see the sketch after this list).
  • The canonical correspondence is fed into the MLP to query its SDF and color, which are used to render RGB images by SDF-based volume rendering.
  • The rendered images are compared with the ground truth to optimize the canonical fields via differentiable volume rendering.
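
A minimal sketch of the root-finding step, using a simple fixed-point iteration as a stand-in for the root-finding scheme the paper builds on; query_weights and joint_transforms are illustrative names:

    import numpy as np

    def canonical_correspondence(x_p, query_weights, joint_transforms, iters=20):
        # x_p:              (3,) point in posed space
        # query_weights:    canonical (3,) point -> (J,) LBS weights
        #                   (e.g. trilinear lookup in the weight volume W)
        # joint_transforms: (J, 4, 4) bone transforms for the current pose
        x_c = x_p.copy()  # initialize at the posed location
        for _ in range(iters):
            w = query_weights(x_c)                           # (J,)
            T = np.einsum("j,jab->ab", w, joint_transforms)  # blended 4x4
            # Invert the blended transform and pull x_p back to canonical
            # space; iterate until the estimate stabilizes.
            x_c = (np.linalg.inv(T) @ np.append(x_p, 1.0))[:3]
        return x_c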

14 of 27

Learning Parametric Template

Goal: Reconstruct a canonical geometric model as the template

  • Extract the geometric template from the SDF field and query the skinning weights for each vertex from the precomputed weight volume W, obtaining a deformable parametric template (a sketch of this step follows below).
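
A minimal sketch of the extraction step, assuming a cubic SDF grid and using marching cubes plus a nearest-cell weight lookup (trilinear interpolation would be smoother); all names are illustrative:

    import numpy as np
    from skimage.measure import marching_cubes

    def extract_template(sdf_grid, W, lo, hi):
        # sdf_grid: (res, res, res) SDF sampled on the canonical grid
        # W:        (res, res, res, J) precomputed skinning weight volume
        # lo, hi:   (3,) grid bounds in canonical space
        res = sdf_grid.shape[0]
        spacing = (hi - lo) / (res - 1)
        verts, faces, _, _ = marching_cubes(sdf_grid, level=0.0,
                                            spacing=tuple(spacing))
        verts += lo  # grid coordinates -> canonical world coordinates
        # Attach skinning weights via nearest-cell lookup in W.
        idx = np.clip(np.round((verts - lo) / spacing).astype(int), 0, res - 1)
        vert_weights = W[idx[:, 0], idx[:, 1], idx[:, 2]]
        return verts, faces, vert_weights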

15 of 27

Template-guided Parameterization

  • To ensure compatibility with 2D networks, the 3D avatar representation needs to be parameterized in 2D space: the learned template guides the parameterization of the 3D Gaussians onto front & back maps (a minimal sketch follows below).
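
A minimal sketch of one way to build such front/back position maps, using an orthographic per-vertex splat along the z-axis instead of a real rasterizer; names and resolutions are illustrative:

    import numpy as np

    def parameterize_front_back(verts, H=256, W=256):
        # verts: (V, 3) canonical template vertices. Each foreground texel
        # stores the 3D position of the surface point closest to the
        # front (max z) or back (min z) orthographic camera.
        lo, hi = verts.min(axis=0), verts.max(axis=0)
        uv = (verts[:, :2] - lo[:2]) / (hi[:2] - lo[:2] + 1e-8)
        px = np.clip((uv * [W - 1, H - 1]).astype(int), 0, [W - 1, H - 1])
        front = np.zeros((H, W, 3)); front_z = np.full((H, W), -np.inf)
        back = np.zeros((H, W, 3));  back_z = np.full((H, W), np.inf)
        for (u, v), p in zip(px, verts):
            if p[2] > front_z[v, u]:   # nearest point seen from the front
                front_z[v, u], front[v, u] = p[2], p
            if p[2] < back_z[v, u]:    # nearest point seen from the back
                back_z[v, u], back[v, u] = p[2], p
        return front, back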

16 of 27

Pose-dependent Gaussian Maps

  • Network: StyleUNet, a StyleGAN-based CNN
  • Output: front and back pose-dependent Gaussian maps
  • Conditioning input: view direction map
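
A toy stand-in illustrating the interface only, not the paper's StyleUNet architecture; channel counts and the per-pixel parameter layout are assumptions:

    import torch
    import torch.nn as nn

    class PoseToGaussianMap(nn.Module):
        # Maps a posed position map plus a view direction map to a Gaussian
        # map whose channels hold per-pixel Gaussian parameters (assumed
        # layout: 3 position offset + 3 color + 3 scale + 4 rotation
        # quaternion + 1 opacity = 14 channels).
        def __init__(self, in_ch=6, gauss_ch=14):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, gauss_ch, 3, padding=1),
            )

        def forward(self, pos_map, view_map):
            # pos_map:  (B, 3, H, W) posed position map (front or back)
            # view_map: (B, 3, H, W) per-pixel view direction map
            return self.net(torch.cat([pos_map, view_map], dim=1))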

17 of 27

LBS of 3D Gaussians

  • p_c : position of a canonical 3D Gaussian, Σ_c : its covariance
  • R, t : rotation matrix and translation vector computed from the skinning weights of each 3D Gaussian
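
From these definitions, the standard LBS equations for posing each Gaussian are:

    p = R\, p_c + t, \qquad \Sigma = R\, \Sigma_c\, R^{\top}

where R and t come from blending the skeleton's per-bone transforms with the Gaussian's skinning weights, e.g. [R | t] = \sum_k w_k [R_k | t_k].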

18 of 27

Training: Loss

(ϕₗ denotes the l-th layer of a pretrained CNN, e.g., VGG16, used for the perceptual loss)
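
A typical L1-plus-perceptual objective consistent with the ϕₗ note (the exact form and the weight λ here are assumptions, not the paper's verbatim formula):

    \mathcal{L} = \lVert I - I_{gt} \rVert_1 + \lambda \sum_l \lVert \phi_l(I) - \phi_l(I_{gt}) \rVert_1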

19 of 27

Results

20 of 27

Results: Comparison with body-only avatars

21 of 27

Results: Comparison with AvatarReX

22 of 27

Results: Quantitative comparison

23 of 27

Ablation: Parametric Template

24 of 27

Ablation: Backbones

25 of 27

Ablation: Pose Projection

(Figure: ablation results without vs. with pose projection)

26 of 27

Limitation

  • Animatable Gaussians entangles the modeling of the human body and clothes, which prevents changing the avatar's clothes for applications like virtual try-on.
  • Animatable Gaussians relies on multi-view input to reconstruct a parametric template, which limits its applicability to modeling loose clothes from a monocular video.

27 of 27