
On Triangulation as a Form of Self-Supervision for 3D Human Pose Estimation

11 October 2022

Computer Vision Lab, EPFL

Soumava Kumar Roy*

Leonardo Citraro*

Sina Honari

Pascal Fua


Overview

Robust Weighting Method
SportCenter Dataset
Training Strategy (Multi-View and Single-View 3D Pose Estimation)
Results

Roy et al. On Triangulation as a Form of Self-Supervision for 3D Human Pose Estimation


Motivation

Manual 3D pose annotation is labour-intensive and time-consuming.

Instead, we generate pseudo 3D labels from the predictions of a pretrained 2D pose estimator.

However, such estimators often fail because of:

Occlusion.
Changes in illumination.
Low-resolution images.


[Figure: noisy keypoint detections processed by the Robust Weighting Algorithm]


Weighted Triangulation


For each joint, we triangulate a 3D position from every pair of cameras.

[Figure: triangulation from the camera pair (cam_0, cam_1)]
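The slides do not say which two-view triangulation is used. As a minimal sketch, the following Python (NumPy) code assumes 3×4 projection matrices and the standard linear (DLT) method, and triangulates one joint from every camera pair. The camera setup and the helper names (`triangulate_pair`, `pairwise_triangulations`, `project`) are hypothetical.

```python
from itertools import combinations

import numpy as np


def triangulate_pair(P_a, P_b, x_a, x_b):
    """Linear (DLT) triangulation of one joint from two views.

    P_a, P_b: 3x4 projection matrices; x_a, x_b: (u, v) detections of the joint.
    """
    # Each view gives two linear constraints on X: u*P[2] - P[0] and v*P[2] - P[1].
    A = np.stack([
        x_a[0] * P_a[2] - P_a[0],
        x_a[1] * P_a[2] - P_a[1],
        x_b[0] * P_b[2] - P_b[0],
        x_b[1] * P_b[2] - P_b[1],
    ])
    # Homogeneous least-squares solution: right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def pairwise_triangulations(Ps, xs):
    """Triangulate the joint once for every camera pair (i, j) with i < j."""
    return {(i, j): triangulate_pair(Ps[i], Ps[j], xs[i], xs[j])
            for i, j in combinations(range(len(Ps)), 2)}


def project(P, X):
    """Project a 3D point with a 3x4 projection matrix."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


# Hypothetical setup: three cameras with identical intrinsics, centred at x = 0, 1, -1.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
Ps = [K @ np.hstack([np.eye(3), -np.array([[cx], [0.0], [0.0]])]) for cx in (0.0, 1.0, -1.0)]
X_true = np.array([0.2, -0.1, 5.0])
xs = [project(P, X_true) for P in Ps]
pairs = pairwise_triangulations(Ps, xs)  # 3 cameras -> 3 pairs
```

With the six pose-estimation cameras of SportCenter, this gives 15 pairwise estimates per joint.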


Weighted Triangulation


[Figure: pairwise triangulations of cam_0 with each of cam_1 to cam_5]


Weighted Triangulation


Geometric Median

We then compute the geometric median (GM) of these pairwise 3D estimates.
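How the GM is computed is not specified here. A common choice, assumed in this sketch, is Weiszfeld's algorithm, an iteratively reweighted average that converges to the point minimising the sum of Euclidean distances. The example illustrates why the GM suits this step: one outlying pair barely moves it, while the plain mean is dragged towards the outlier. The function name and data are hypothetical.

```python
import numpy as np


def geometric_median(points, n_iter=200, tol=1e-9):
    """Weiszfeld's algorithm for the point minimising the sum of Euclidean distances."""
    points = np.asarray(points, dtype=float)
    y = points.mean(axis=0)  # start from the centroid
    for _ in range(n_iter):
        d = np.maximum(np.linalg.norm(points - y, axis=1), 1e-12)  # guard against d == 0
        w = 1.0 / d
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y


# Four consistent pairwise estimates of one joint and one outlying pair (e.g. an occluded view).
estimates = np.array([
    [0.1, 0.0, 0.0], [-0.1, 0.0, 0.0],
    [0.0, 0.1, 0.0], [0.0, -0.1, 0.0],
    [10.0, 10.0, 10.0],
])
gm = geometric_median(estimates)
centroid = estimates.mean(axis=0)  # (2, 2, 2): pulled towards the outlier
```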


Weighted Triangulation

We fit a 3D Gaussian centred at the GM to obtain a weight for each pair of cameras.
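A sketch of the weighting step, assuming an isotropic Gaussian centred at the GM and evaluated at each pairwise estimate. The spread `sigma` is a hypothetical hyperparameter, and the authors may fit the Gaussian differently (for example with a full covariance); only the general idea, that pairs far from the GM receive small weights, is taken from the slide.

```python
import numpy as np


def pair_weights(pair_points, gm, sigma=0.05):
    """Weight each pairwise 3D estimate with an isotropic Gaussian centred at the GM.

    pair_points: dict mapping a camera pair (i, j) to its triangulated 3D point.
    gm: geometric median of those points.
    sigma: standard deviation (hypothetical value, same units as the points).
    The Gaussian is left unnormalised, so weights lie in (0, 1] and equal 1 at the GM.
    """
    return {pair: float(np.exp(-np.sum((X - gm) ** 2) / (2.0 * sigma ** 2)))
            for pair, X in pair_points.items()}


gm = np.zeros(3)
w = pair_weights({
    (0, 1): np.array([0.0, 0.0, 0.0]),   # on the GM
    (0, 2): np.array([0.05, 0.0, 0.0]),  # one sigma away
    (0, 3): np.array([1.0, 1.0, 1.0]),   # far outlier
}, gm)
```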


Weighted Triangulation

The weight of a camera is the median of the weights of all pairs that include it. For cam_0, the pair weights are:

	cam_1	cam_2	cam_3	cam_4	cam_5
cam_0	0.58	0.02	0.25	0.27	0.51

The median of these five pair weights, 0.27, is the weight of cam_0.

We repeat this for every camera and every joint, which gives a weight matrix of size (number of joints) × (number of cameras).
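The per-camera median and the resulting weight matrix can be sketched as follows. The function names are hypothetical; the input values are the cam_0 pair weights from the table, whose median is 0.27.

```python
import numpy as np


def camera_weights(pair_w, n_cams):
    """Weight of each camera: median of the weights of all pairs that contain it."""
    return np.array([
        np.median([w for (i, j), w in pair_w.items() if c in (i, j)])
        for c in range(n_cams)
    ])


def weight_matrix(pair_w_per_joint, n_cams):
    """Stack per-joint camera weights into a (number of joints) x (number of cameras) matrix."""
    return np.stack([camera_weights(pw, n_cams) for pw in pair_w_per_joint])


# Pair weights involving cam_0 from the table. Only entry 0 of the result is meaningful
# here; a full run would pass all 15 pairs of the six cameras.
cam0_pairs = {(0, 1): 0.58, (0, 2): 0.02, (0, 3): 0.25, (0, 4): 0.27, (0, 5): 0.51}
w_cam = camera_weights(cam0_pairs, n_cams=6)
```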


SportCenter Dataset


[Figure: views from cam_0 to cam_5]

13 subjects playing a game of basketball.

8 fixed and calibrated cameras.

6 cameras are used for pose estimation and the remaining 2 for player tracking.

Occlusion by other players or static structures.

Varied lighting conditions.

0.3M images.

We annotated 3,740 2D poses and 700 3D poses.


Training of 2D Pose Estimator


[Figure: training of the 2D pose estimator network [1]. Labels: Triangulation with weights, 3D Pose, Projection Operation, Supervised samples (target: the annotated 2D pose in view v*), Unsupervised samples, Direction of Gradient Flow.]

[1] Li, Jiefeng, et al. "CrowdPose: Efficient Crowded Scenes Pose Estimation and a New Benchmark." CVPR 2019.
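One way to use the weight matrix, assumed here rather than taken from the slides, is weighted linear triangulation: each view's DLT rows are scaled by its camera weight before the SVD, in the spirit of the algebraic triangulation of Iskakov et al. [2]. Projecting the result back into every view yields pseudo 2D labels for the unsupervised samples. This NumPy version only shows the forward computation; for gradients to reach the 2D estimator, as in the differentiable variant, it would be written in an autodiff framework. All names and the camera setup are hypothetical.

```python
import numpy as np


def project(P, X):
    """Project a 3D point with a 3x4 projection matrix."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def weighted_triangulate(Ps, xs, w):
    """Weighted linear (DLT) triangulation of one joint from all views.

    Ps: list of 3x4 projection matrices; xs: (V, 2) 2D predictions of the joint;
    w: (V,) camera weights for this joint, e.g. one row of the weight matrix.
    Scaling a view's two DLT rows by its weight shrinks its influence on the
    least-squares solution; a weight of 0 removes the view entirely.
    """
    rows = []
    for P, (u, v), wv in zip(Ps, xs, w):
        rows.append(wv * (u * P[2] - P[0]))
        rows.append(wv * (v * P[2] - P[1]))
    _, _, vt = np.linalg.svd(np.stack(rows))
    X = vt[-1]
    return X[:3] / X[3]


def reprojected_targets(Ps, xs, w):
    """Pseudo 2D labels: the weighted triangulation projected back into every view."""
    X = weighted_triangulate(Ps, xs, w)
    return np.stack([project(P, X) for P in Ps])


# Hypothetical cameras centred at x = 0, 1, -1; view 2 is off by 50 px, as under occlusion.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
Ps = [K @ np.hstack([np.eye(3), -np.array([[cx], [0.0], [0.0]])]) for cx in (0.0, 1.0, -1.0)]
X_true = np.array([0.2, -0.1, 5.0])
xs = np.stack([project(P, X_true) for P in Ps])
xs[2, 0] += 50.0

X_unweighted = weighted_triangulate(Ps, xs, np.ones(3))
X_weighted = weighted_triangulate(Ps, xs, np.array([1.0, 1.0, 0.0]))
targets = reprojected_targets(Ps, xs, np.array([1.0, 1.0, 0.0]))
```

With weights (1, 1, 0) the corrupted view is ignored, so the triangulated joint and the pseudo label for view 2 match the uncorrupted values; with uniform weights the corrupted view pulls the triangulation away.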


Single View 3D Pose Estimation

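[Figure: single-view pipeline with the input image, an R50 backbone, a concatenation operation (labels 1024 and 16), a Lifting Network predicting P^3D, and a 3D loss against the target; arrows show the direction of gradient flow.]

Target of the 3D loss:
Triangulated 3D pose for unsupervised samples.
Ground-truth 3D pose for supervised samples.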
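The target rule above can be written as a masked selection over a batch. The per-joint Euclidean loss below is an assumption (the figure only names a 3D loss), and the function names and array shapes are hypothetical.

```python
import numpy as np


def lifting_targets(is_supervised, gt_3d, triangulated_3d):
    """Per-sample 3D target: ground truth if annotated, triangulated pseudo label otherwise.

    is_supervised: (B,) bool; gt_3d, triangulated_3d: (B, J, 3).
    Entries of gt_3d for unsupervised samples are ignored.
    """
    return np.where(is_supervised[:, None, None], gt_3d, triangulated_3d)


def loss_3d(pred_3d, target_3d):
    """Mean per-joint Euclidean distance (an assumed choice of 3D loss)."""
    return float(np.mean(np.linalg.norm(pred_3d - target_3d, axis=-1)))


# Hypothetical batch of two samples with two joints each.
is_supervised = np.array([True, False])
gt_3d = np.zeros((2, 2, 3))           # only sample 0 is annotated
triangulated_3d = np.ones((2, 2, 3))  # pseudo labels from weighted triangulation
targets = lifting_targets(is_supervised, gt_3d, triangulated_3d)
loss = loss_3d(np.zeros((2, 2, 3)), targets)
```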


Results on SportCenter Dataset


Method	Weights	Differentiable	MPJPE Multi-View (mm)	MPJPE Single-View (mm)
Ours non-differentiable w/o weights	✕	✕	109.7	142.9
Ours non-differentiable	✓	✕	83.0	111.4
Ours w/o weights	✕	✓	80.5	118.5
Ours + Iskakov [2]	✓	✓	88.3	121.1
Ours	✓	✓	66.9	104.4

[2] Iskakov, Karim, et al. "Learnable Triangulation of Human Pose." ICCV 2019.
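MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and reference joints, here in millimetres. A minimal sketch, assuming no root or Procrustes alignment, since the slides do not state the evaluation protocol; the example data are hypothetical.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: Euclidean distance per joint, averaged over joints and poses.

    pred, gt: arrays of shape (..., J, 3), in millimetres.
    No alignment is applied before measuring the error (an assumption).
    """
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))


gt = np.zeros((4, 17, 3))              # 4 hypothetical poses with 17 joints
pred = gt + np.array([3.0, 4.0, 0.0])  # every joint off by (3, 4, 0) mm
error = mpjpe(pred, gt)                # 5.0 mm
```

By this metric, Ours lowers the multi-view error from 109.7 mm (non-differentiable, without weights) to 66.9 mm, about 39% lower, and the single-view error from 142.9 mm to 104.4 mm, about 27% lower.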


Qualitative Results


[Figure: qualitative results. Multi-view: w/o weights vs. Ours. Single view: Ours.]


Contributions

Self-supervision through multi-view consistency, based on differentiable triangulation.

A novel weighting strategy to obtain pseudo 3D labels that mitigates the effect of occlusion and noisy predictions.

A new multi-view multi-person dataset of an amateur basketball match featuring occlusion and difficult lighting conditions. Dataset link: https://www.epfl.ch/labs/cvlab/data/sportcenter-dataset
