1 of 14

On Triangulation as a Form of Self-Supervision for 3D Human Pose Estimation

11 October 2022

Computer Vision Lab, EPFL

Soumava Kumar Roy*

Leonardo Citraro*

Sina Honari

Pascal Fua

2 of 14

Overview

  • Robust Weighting method.
  • SportCenter Dataset.
  • Training Strategy (Multiview and Single View 3D Pose Estimation)
  • Results

Roy et al. On Triangulation as a Form of Self-Supervision for 3D Human Pose Estimation

11 October 2022

3 of 14

Motivation

  • Manual 3D pose annotation is labour-intensive and time-consuming.

  • Generate pseudo 3D labels using a pretrained 2D pose estimator.

  • Such 2D estimators often fail due to:
      • Occlusion.
      • Change in illumination.
      • Low resolution images.


Figure: pipeline diagram with stages labelled "Noisy Keypoints Detection" and "Robust Weighting Algorithm".

4 of 14

Weighted Triangulation


  1. For each joint, we triangulate a 3D position from every pair of cameras.

Figure: a joint triangulated from the camera pair cam_0 and cam_1.
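The pairwise step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes calibrated cameras given as 3x4 projection matrices and uses standard linear (DLT) triangulation, which the slides do not specify. The names `triangulate_dlt` and `pairwise_triangulations` are hypothetical.

```python
from itertools import combinations

import numpy as np


def triangulate_dlt(projections, points_2d):
    """Linear (DLT) triangulation of one joint from two or more views.

    projections: list of 3x4 camera projection matrices (assumed known).
    points_2d:   list of (x, y) detections, one per camera.
    """
    rows = []
    for P, (x, y) in zip(projections, points_2d):
        # Each view contributes two linear constraints on the 3D point.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # The homogeneous solution is the right singular vector that
    # belongs to the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def pairwise_triangulations(projections, points_2d):
    """Return {(i, j): 3D point} for every camera pair with i < j."""
    return {
        (i, j): triangulate_dlt(
            [projections[i], projections[j]],
            [points_2d[i], points_2d[j]],
        )
        for i, j in combinations(range(len(projections)), 2)
    }
```

With 6 cameras this yields 15 candidate 3D positions per joint, one per pair.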

5 of 14

Weighted Triangulation


Figure: pairwise triangulations of a joint, with cam_0 paired in turn with cam_1, cam_2, cam_3, cam_4 and cam_5.

6 of 14

Weighted Triangulation


Geometric Median

  2. We then compute the geometric median (GM) of these pairwise 3D estimates.
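A geometric median has no closed form, so it is usually found iteratively. The sketch below uses Weiszfeld's algorithm, which is an assumption here since the slides do not name a solver. The function name and the `n_iter` and `eps` parameters are illustrative.

```python
import numpy as np


def geometric_median(points, n_iter=100, eps=1e-9):
    """Geometric median of an (n, 3) array via Weiszfeld iterations."""
    points = np.asarray(points, dtype=float)
    median = points.mean(axis=0)  # start from the centroid
    for _ in range(n_iter):
        dist = np.linalg.norm(points - median, axis=1)
        # Clamp distances to avoid division by zero when the current
        # estimate coincides with a data point.
        inv = 1.0 / np.maximum(dist, eps)
        new_median = (points * inv[:, None]).sum(axis=0) / inv.sum()
        if np.linalg.norm(new_median - median) < eps:
            return new_median
        median = new_median
    return median
```

Unlike the mean, the geometric median stays close to the majority of the points when a few pairwise estimates are far off, which is what makes it a useful reference for scoring the pairs.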

7 of 14


Weighted Triangulation

  3. We fit a 3D Gaussian centred on the GM to obtain a weight for each pair of cameras.
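One possible reading of this step is sketched below. It assumes an isotropic, unnormalised Gaussian centred on the geometric median, so that a pair whose estimate lies on the median gets weight 1 and distant pairs decay towards 0. The slides do not say how the spread is chosen, so `sigma` (in the units of the 3D points) is a hypothetical parameter, as is the function name.

```python
import numpy as np


def gaussian_pair_weights(pair_points, median, sigma=0.05):
    """Weight each pairwise 3D estimate by its distance to the median.

    pair_points: (n_pairs, 3) pairwise triangulations of one joint.
    median:      (3,) geometric median of those points.
    sigma:       assumed isotropic standard deviation (illustrative value).
    Returns an (n_pairs,) array of weights in (0, 1].
    """
    pair_points = np.asarray(pair_points, dtype=float)
    median = np.asarray(median, dtype=float)
    sq_dist = np.sum((pair_points - median) ** 2, axis=1)
    return np.exp(-0.5 * sq_dist / sigma ** 2)
```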

8 of 14


Weighted Triangulation

  4. For each camera, we take the median of the weights of all pairs that include it.

Pair weights for cam_0:

  Paired with   cam_1   cam_2   cam_3   cam_4   cam_5
  Weight        0.58    0.02    0.25    0.27    0.51

The median, 0.27, is the weight of cam_0.

  5. We repeat the process for every camera and every joint to obtain a weight matrix of size (number of joints) × (number of cameras).
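The median step and the resulting weight matrix can be sketched as below. The dictionary layout `{(i, j): weight}` and the function names are assumptions made for illustration.

```python
import numpy as np


def camera_weight(pair_weights, cam):
    """Median of the weights of all pairs that include camera `cam`.

    pair_weights: {(i, j): weight} for one joint.
    """
    involved = [w for pair, w in pair_weights.items() if cam in pair]
    return float(np.median(involved))


def weight_matrix(pair_weights_per_joint, n_cams):
    """Stack per-camera weights into an (n_joints, n_cams) array.

    pair_weights_per_joint: list with one {(i, j): weight} dict per joint.
    """
    return np.array([
        [camera_weight(pw, cam) for cam in range(n_cams)]
        for pw in pair_weights_per_joint
    ])
```

For the cam_0 pair weights shown on the slide, `camera_weight({(0, 1): 0.58, (0, 2): 0.02, (0, 3): 0.25, (0, 4): 0.27, (0, 5): 0.51}, 0)` returns 0.27.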


9 of 14

SportCenter Dataset


Figure: example views from cam_0, cam_1, cam_2, cam_3, cam_4 and cam_5.

  • 13 subjects playing a game of basketball.

  • 8 fixed and calibrated cameras.

  • 6 cameras are used for pose estimation; the remaining 2 are used for tracking the players.

  • Occlusion by other players or static structures.

  • Varying lighting conditions.

  • 0.3M images.

  • We have annotated 3740 2D poses and 700 3D poses.

10 of 14

Training of 2D Pose Estimator


Figure: training pipeline of the 2D pose estimator network [1]. Recoverable labels:

  • Supervised samples: the annotated 2D pose for view v.
  • Unsupervised samples: 3D pose from triangulation with weights, followed by the projection operation.
  • Arrows mark the direction of gradient flow.

[1] Li, Jiefeng, et al. "CrowdPose: Efficient crowded scenes pose estimation and a new benchmark." CVPR 2019.
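The triangulation with weights and the projection operation from the diagram can be sketched as below. This rests on an assumption: each camera's weight scales its two rows of the linear (DLT) system, as in algebraic triangulation [2]. The slide does not state how the weights enter. The function names are hypothetical, and the NumPy version only illustrates the forward computation. Training would need the same operations in an autodiff framework so that gradients can flow.

```python
import numpy as np


def weighted_triangulation(projections, points_2d, weights):
    """Triangulate one joint, scaling each view's constraints by its weight."""
    rows = []
    for P, (x, y), w in zip(projections, points_2d, weights):
        rows.append(w * (x * P[2] - P[0]))
        rows.append(w * (y * P[2] - P[1]))
    # Least-squares solution of the weighted homogeneous system.
    _, _, vt = np.linalg.svd(np.stack(rows))
    X = vt[-1]
    return X[:3] / X[3]


def project(P, X):
    """Project a 3D point into a view with 3x4 projection matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def reprojection_errors(projections, points_2d, weights):
    """Per-view distance between each detection and the reprojected joint."""
    X = weighted_triangulation(projections, points_2d, weights)
    return np.array([
        np.linalg.norm(project(P, X) - np.asarray(p, dtype=float))
        for P, p in zip(projections, points_2d)
    ])
```

A view with a weight near zero has almost no influence on the triangulated joint, so an occluded or noisy detection shows up as a large reprojection error in that view only.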

11 of 14


Single View 3D Pose Estimation

Figure: single-view 3D pose network. Recoverable labels: Image, R50, 1024, 16, P3D, Lifting Network, 3D Loss, concatenation operation, direction of gradient flow.

Target of the 3D loss:

  • Triangulated 3D pose for unsupervised samples.
  • Ground-truth 3D pose for supervised samples.

12 of 14

Results on SportCenter Dataset


MPJPE (mm), lower is better:

  Method                                 Multi View   Single View
  Ours non-differentiable w/o weights    109.7        142.9
  Ours non-differentiable                 83.0        111.4
  Ours w/o weights                        80.5        118.5
  Ours + Iskakov [2]                      88.3        121.1
  Ours                                    66.9        104.4

The slide's Weights and Differentiable columns mark which of the two components each variant uses.

[2] Iskakov, Karim, et al. "Learnable Triangulation of Human Pose." ICCV 2019.
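MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth joints. The sketch below shows the basic metric only. Any alignment or root-centring applied before evaluation is not stated on the slide and is not included. The function name is illustrative.

```python
import numpy as np


def mpjpe(pred, target):
    """Mean per-joint position error.

    pred, target: arrays of shape (..., n_joints, 3) in the same unit
    (for example millimetres). Returns a scalar in that unit.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.linalg.norm(pred - target, axis=-1).mean())
```

For example, a two-joint pose with one joint off by (3, 4, 0) mm and the other exact gives an MPJPE of 2.5 mm.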

13 of 14

Qualitative Results


Figure: qualitative comparison of "w/o Weights" and "Ours" in the multiview and single-view settings.

14 of 14

Contributions

  1. Self-supervised multi-view consistency based on differentiable triangulation.

  2. A novel weighting strategy for obtaining pseudo 3D labels that mitigates the effect of occlusion and noisy predictions.
