

Large Scale Camera Array Calibration via SfM

Sanjana Gunna, Gaini Kussainova, Wei Pu, Shubham Garg

OVERVIEW

  • Mugsy: a high-end multi-view capture system
  • Captures synchronized multi-view videos of facial expressions exhibited by subjects sitting inside the dome
  • Applications: facial reconstruction, virtual human generation (photo-realistic avatars)

Figure: Mugsy v1 and Mugsy v2 (images from the Multiface dataset)

Dataset Statistics

  • The Multiface dataset contains an average of 12,200 frames per subject (v1), captured at 30 fps; each frame has 40 (v1) or 60 (v2) different camera views.


INTRODUCTION

  • Calibrating the cameras of a dome setup (40-150 cameras) with physical calibration objects is time-consuming!

OUR METHOD

  • Build an auto-calibration system capable of determining the internal camera parameters directly from multiple uncalibrated images.
  • Integrate it into a Structure-from-Motion (SfM) pipeline to achieve better calibration precision than computing calibration data with a special calibration object. A minimal pipeline sketch follows.
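As a rough illustration of such a pipeline, the sketch below uses pycolmap (COLMAP's Python bindings) rather than our internal tooling; the paths are placeholders, and exhaustive matching is an assumption that fits an array of this size. COLMAP refines the intrinsics during bundle adjustment when they are not held fixed, which is the self-calibration behaviour described above.

```python
# Minimal auto-calibration sketch with pycolmap (paths are hypothetical).
import pathlib
import pycolmap

database_path = "mugsy.db"
image_dir = "frames/"
output_dir = pathlib.Path("sparse/")
output_dir.mkdir(exist_ok=True)

# 1. Detect local features (SIFT by default) in every view.
pycolmap.extract_features(database_path, image_dir)

# 2. Exhaustive pairwise matching; feasible for a 40-150 camera array.
pycolmap.match_exhaustive(database_path)

# 3. Incremental SfM: estimates poses and sparse structure while refining
#    the intrinsics (fx, fy, cx, cy) in bundle adjustment (self-calibration).
maps = pycolmap.incremental_mapping(database_path, image_dir, output_dir)

# Inspect the self-calibrated intrinsics of the first reconstruction.
for camera_id, camera in maps[0].cameras.items():
    print(camera_id, camera.params)  # e.g. [fx, fy, cx, cy] for PINHOLE
```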

References:

  1. Source of images: Ha et al. “Deltille Grids for Geometric Camera Calibration.” ICCV 2017. https://openaccess.thecvf.com/content_ICCV_2017/papers/Ha_Deltille_Grids_for_ICCV_2017_paper.pdf
  2. Wuu et al. “Multiface: A Dataset for Neural Face Rendering.” arXiv:2207.11243.
  3. Lindenberger, Philipp, et al. “Pixel-Perfect Structure-from-Motion with Featuremetric Refinement.” ICCV 2021: 5967-5977.
FUTURE DIRECTIONS

  • Non-Rigid SfM: Multiface captures are video clips (see Dataset Statistics); running non-rigid SfM over these clips might help improve performance over the rigid SfM baselines.
  • Better Features: the pre-trained deep learning models we used are predominantly trained on datasets limited to places, architecture, etc. Fine-tuning such models on facial data could capture better, more semantically meaningful facial keypoints and improve the baselines.

RESULTS

Data Initialization

  • Fixed extrinsics (for all the camera views)
  • Noisy intrinsics (Gaussian noise added to the ground-truth intrinsics; sketched below)
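As a concrete illustration of this initialization, the sketch below perturbs a ground-truth calibration matrix with Gaussian noise; the noise scale and the example intrinsics values are assumptions, not the values used in our experiments.

```python
import numpy as np

def make_noisy_intrinsics(K_gt, sigma=0.05, rng=None):
    """Perturb fx, fy, cx, cy of a 3x3 calibration matrix with Gaussian
    noise proportional to each entry's magnitude (sigma is an assumed scale)."""
    rng = rng or np.random.default_rng()
    K = K_gt.astype(float).copy()
    for i, j in [(0, 0), (1, 1), (0, 2), (1, 2)]:  # fx, fy, cx, cy
        K[i, j] += rng.normal(0.0, sigma * abs(K[i, j]))
    return K

# Illustrative ground-truth intrinsics (not Mugsy's actual calibration).
K_gt = np.array([[7000.0,    0.0, 1334.0],
                 [   0.0, 7000.0, 2048.0],
                 [   0.0,    0.0,    1.0]])
K_init = make_noisy_intrinsics(K_gt)  # starting point for refinement
```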


  • From our experiments, we observed that the estimated parameters depend heavily on both the quality and the quantity of the features.

  • SIFT feature matches alone are not good enough (a baseline matching sketch follows this list).

  • Deep learning-based features, such as SuperPoint + SuperGlue or facial landmarks (Meta detector), could improve performance.

  • Analyzing the sensitivity of the parameters (focal lengths [fx, fy] vs. principal point [cx, cy]) can guide us in improving the accuracy of the predictions.
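For reference, the SIFT baseline above corresponds to standard OpenCV matching with Lowe's ratio test, roughly as sketched below; the paths, feature cap, and ratio threshold are illustrative placeholders. Learned matchers such as SuperPoint + SuperGlue would replace this stage.

```python
import cv2

def sift_matches(path_a, path_b, ratio=0.75):
    """Match two views with SIFT + Lowe's ratio test (hypothetical paths)."""
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create(nfeatures=5000)  # cap matching the 5k SIFT figure below
    kp_a, desc_a = sift.detectAndCompute(img_a, None)
    kp_b, desc_b = sift.detectAndCompute(img_b, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    # Keep a match only if it clearly beats the second-best candidate.
    good = [m for m, n in matcher.knnMatch(desc_a, desc_b, k=2)
            if m.distance < ratio * n.distance]
    return kp_a, kp_b, good
```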

Table 1: Analysis of the sensitivity of parameters

| Experiment    | L1 (fx) | L1 (fy) | L1 (cx) | L1 (cy) | Reprojection Error |
|---------------|---------|---------|---------|---------|--------------------|
| Refine cx, cy | -       | -       | 6.349   | 6.458   | 3.093              |
| Refine fx, fy | 3.764   | 2.927   | -       | -       | 0.180              |
| Refine all    | 89.441  | 80.637  | 4.015   | 5.278   | 5.961              |
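The L1 columns in Tables 1 and 2 are absolute errors of the refined intrinsics against ground truth; one way such metrics might be computed (averaging over cameras) is sketched below. The dict-of-3x3-matrices layout is a hypothetical structure for illustration.

```python
import numpy as np

def l1_intrinsics(estimated, ground_truth):
    """Mean absolute error of fx, fy, cx, cy over all cameras.
    Both arguments are hypothetical {camera_id: 3x3 K matrix} dicts."""
    index = {"fx": (0, 0), "fy": (1, 1), "cx": (0, 2), "cy": (1, 2)}
    return {
        name: float(np.mean([abs(estimated[c][i, j] - ground_truth[c][i, j])
                             for c in estimated]))
        for name, (i, j) in index.items()
    }
```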

Table 2: Analysis of different features

| Features               | L1 (fx) | L1 (fy) | L1 (cx) | L1 (cy) | Reprojection Error |
|------------------------|---------|---------|---------|---------|--------------------|
| SIFT                   | 89.441  | 80.637  | 4.015   | 5.278   | 5.961              |
| SuperPoint + SuperGlue | 21.713  | 16.751  | 7.379   | 17.878  | 8.254              |
| Facial Keypoints       | 207.606 | 173.701 | 20.845  | 12.344  | 20.525             |
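The reprojection-error column can be reproduced by projecting each triangulated point into the views that observe it and averaging the pixel residuals; below is a minimal sketch under hypothetical data structures (the observation list and camera dict are assumptions for illustration).

```python
import numpy as np

def reprojection_error(points3d, observations, cameras):
    """observations: list of (point_id, camera_id, xy_detected);
    cameras: {camera_id: (K, R, t)} with world-to-camera extrinsics."""
    residuals = []
    for point_id, cam_id, xy in observations:
        K, R, t = cameras[cam_id]
        X_cam = R @ points3d[point_id] + t   # world -> camera
        uv = (K @ (X_cam / X_cam[2]))[:2]    # perspective projection to pixels
        residuals.append(np.linalg.norm(uv - xy))
    return float(np.mean(residuals))
```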

Figure: 3D reconstruction using various features (SIFT, Facial Keypoints, Facial Keypoints + SIFT)

Effect of using “Feature-metric Refinement” for keypoint adjustment

  • An overall decrease in the reprojection error, irrespective of the kind of features used

Number of features per method: SIFT: 5k; SuperPoint: 2k; Facial Keypoints: 200

Feature-metric Refinement: Keypoint Adjustment

  • Optimizes a feature-metric error based on dense features predicted by a neural network
  • Achieves high accuracy and precision across multiple views during the reconstruction process
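The keypoint-adjustment stage might be sketched as follows: sample dense CNN features at each tentative keypoint of a track and nudge the keypoints so the track's features agree. The feature-map/track layout, optimizer settings, and mean-as-reference choice are simplifications assumed for illustration, not the exact procedure of [3].

```python
import torch
import torch.nn.functional as F

def sample_feature(fmap, xy):
    """Bilinearly sample a C x H x W feature map at pixel coordinates xy."""
    _, H, W = fmap.shape
    # Normalize pixel coordinates to [-1, 1] as grid_sample expects.
    grid = torch.stack([2 * xy[0] / (W - 1) - 1,
                        2 * xy[1] / (H - 1) - 1]).view(1, 1, 1, 2)
    return F.grid_sample(fmap[None], grid, align_corners=True).view(-1)

def adjust_track(feature_maps, keypoints, steps=50, lr=0.1):
    """feature_maps: one C x H x W tensor per view; keypoints: initial (x, y)
    of one track in each view. Returns feature-metrically refined coordinates."""
    xys = [torch.tensor(xy, dtype=torch.float32, requires_grad=True)
           for xy in keypoints]
    opt = torch.optim.Adam(xys, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        feats = [sample_feature(f, xy) for f, xy in zip(feature_maps, xys)]
        ref = torch.stack(feats).mean(dim=0).detach()  # track reference descriptor
        loss = sum((f - ref).square().sum() for f in feats)
        loss.backward()
        opt.step()
    return [xy.detach().numpy() for xy in xys]
```

In Pixel-Perfect SfM the reference is a robust track representative and the same feature-metric cost later drives a featuremetric bundle adjustment; this sketch keeps only the keypoint-adjustment stage.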