Large Scale Camera Array Calibration via SfM
Sanjana Gunna, Gaini Kussainova, Wei Pu, Shubham Garg
OVERVIEW
INTRODUCTION
OUR METHOD
FUTURE DIRECTIONS
Figure: Images from the Multiface Dataset (panels: Mugsy v1, Mugsy v2)
The Multiface dataset contains an average of 12,200 frames per subject (v1), captured at 30 fps, and each frame has 40 (v1) or 60 (v2) camera views. Running non-rigid SfM over these video clips could improve performance over the rigid SfM baselines.
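A quick back-of-the-envelope calculation gives a sense of the data volume for a single v1 subject. It assumes every frame is seen by every camera, and the helper names are hypothetical:

```python
# Rough scale of one Multiface v1 subject, using the averages quoted above.
# Assumption: every frame is captured by all cameras.
FRAMES_PER_SUBJECT_V1 = 12_200  # average frames per subject (v1)
FPS = 30                         # capture rate
VIEWS_V1 = 40                    # camera views per frame (v1)


def capture_seconds(frames: int, fps: int) -> float:
    """Duration of a capture in seconds."""
    return frames / fps


def images_per_subject(frames: int, views: int) -> int:
    """Total images if every frame is seen by every camera."""
    return frames * views


print(f"{capture_seconds(FRAMES_PER_SUBJECT_V1, FPS):.1f} s")  # 406.7 s
print(images_per_subject(FRAMES_PER_SUBJECT_V1, VIEWS_V1))     # 488000
```

Roughly seven minutes of video per subject already amounts to almost half a million images, which is why even a per-frame rigid SfM baseline is a large-scale problem.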
Dataset Statistics
We used pre-trained deep-learning models that were trained predominantly on datasets of places, architecture, and similar scenes. Fine-tuning such models on facial data could yield better, more semantically meaningful facial keypoints and improve the baselines.
RESULTS
Data Initialization
- Fixed Extrinsics (for all camera views)
- Noisy Intrinsics (Gaussian noise added to the ground-truth intrinsics)
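The initialization above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `init_cameras`, the `[fx, fy, cx, cy]` parameter layout, and the noise level `sigma` are hypothetical, since the poster does not report the noise magnitude.

```python
import numpy as np


def init_cameras(gt_intrinsics, gt_extrinsics, sigma, seed=0):
    """Hypothetical setup: extrinsics fixed at GT, intrinsics perturbed.

    gt_intrinsics: (N, 4) array of [fx, fy, cx, cy], one row per camera.
    gt_extrinsics: list of N (R, t) tuples, copied unchanged.
    sigma: Gaussian noise std, a scalar or one value per parameter
           (illustrative; the actual value is not reported).
    """
    rng = np.random.default_rng(seed)
    gt = np.asarray(gt_intrinsics, dtype=float)
    # Additive zero-mean Gaussian noise on every intrinsic parameter.
    noisy = gt + rng.normal(0.0, sigma, size=gt.shape)
    # Extrinsics are kept at their ground-truth values.
    fixed = [(np.array(R, float), np.array(t, float)) for R, t in gt_extrinsics]
    return noisy, fixed


# Usage with three hypothetical cameras.
gt_K = np.array([[1500.0, 1500.0, 512.0, 667.0]] * 3)
gt_Rt = [(np.eye(3), np.zeros(3)) for _ in range(3)]
noisy_K, Rt = init_cameras(gt_K, gt_Rt, sigma=[20.0, 20.0, 5.0, 5.0])
```

Keeping the extrinsics fixed isolates the effect of intrinsic refinement, so any remaining error in the tables can be attributed to the intrinsics and the features used.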
Figure panels: SIFT; Facial Keypoints; Facial Keypoints + SIFT; SuperPoint + SuperGlue, Facial Landmarks (Meta detector)
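Table 1: Sensitivity to which intrinsic parameters are refined (L1 columns: error of the refined intrinsics with respect to the ground truth; "-" marks parameters that were not refined)
| Experiment | L1 (fx) | L1 (fy) | L1 (cx) | L1 (cy) | Reprojection Error |
|---|---|---|---|---|---|
| Refine cx, cy | - | - | 6.349 | 6.458 | 3.093 |
| Refine fx, fy | 3.764 | 2.927 | - | - | 0.180 |
| Refine all | 89.441 | 80.637 | 4.015 | 5.278 | 5.961 |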
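The error columns in the tables can be computed roughly as below. This is a sketch under stated assumptions: the poster does not say how errors are aggregated, so the L1 error is assumed to be the mean absolute difference over cameras, and the reprojection error the mean pixel distance under an undistorted pinhole model. All function names are hypothetical.

```python
import numpy as np


def l1_errors(refined, gt):
    """Mean absolute error per intrinsic parameter over all cameras.

    refined, gt: (N, 4) arrays of [fx, fy, cx, cy] (assumed layout).
    """
    err = np.abs(np.asarray(refined, float) - np.asarray(gt, float)).mean(axis=0)
    return dict(zip(["fx", "fy", "cx", "cy"], err))


def project(points_w, R, t, fx, fy, cx, cy):
    """Pinhole projection (no distortion) of (M, 3) world points to pixels."""
    pc = points_w @ R.T + t                 # world -> camera frame
    x, y = pc[:, 0] / pc[:, 2], pc[:, 1] / pc[:, 2]
    return np.stack([fx * x + cx, fy * y + cy], axis=1)


def mean_reprojection_error(points_w, observed_uv, R, t, intr):
    """Mean Euclidean distance between projected and observed keypoints."""
    uv = project(points_w, R, t, *intr)
    return float(np.linalg.norm(uv - observed_uv, axis=1).mean())
```

Some pipelines report the RMS rather than the mean of the per-point distances; the two can differ noticeably when a few matches are outliers.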
Table 2: Comparison of different features (all intrinsics refined)
| Experiment | L1 (fx) | L1 (fy) | L1 (cx) | L1 (cy) | Reprojection Error |
|---|---|---|---|---|---|
| SIFT | 89.441 | 80.637 | 4.015 | 5.278 | 5.961 |
| SuperPoint + SuperGlue | 21.713 | 16.751 | 7.379 | 17.878 | 8.254 |
| Facial Keypoints | 207.606 | 173.701 | 20.845 | 12.344 | 20.525 |
3D Reconstruction using various features
Effect of using “Feature-metric Refinement” for keypoint adjustment
Various Features (number of keypoints):
- SIFT: 5k
- SuperPoint: 2k
- Facial Keypoints: 200
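If the counts above are per-image keypoint budgets (an assumption; the poster only lists the numbers), a detector's output is typically capped by keeping the highest-scoring keypoints. A minimal NumPy sketch with a hypothetical `keep_top_k` helper:

```python
import numpy as np

# Hypothetical per-image budgets, read from the list above.
BUDGETS = {"sift": 5000, "superpoint": 2000, "facial": 200}


def keep_top_k(keypoints, scores, k):
    """Keep the k highest-scoring keypoints (all of them if there are fewer)."""
    keypoints = np.asarray(keypoints)
    scores = np.asarray(scores)
    if len(scores) <= k:
        return keypoints, scores
    idx = np.argpartition(-scores, k - 1)[:k]  # unordered top-k indices
    idx = idx[np.argsort(-scores[idx])]        # order by descending score
    return keypoints[idx], scores[idx]
```

`np.argpartition` avoids a full sort, which matters little for 200 facial landmarks but helps when trimming tens of thousands of SIFT candidates per image across many views.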
Feature-metric Refinement: Keypoint Adjustment
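Feature-metric keypoint adjustment moves keypoints so that dense CNN features sampled at matched locations agree across views. The sketch below is a simplified, hypothetical variant, not the exact formulation used here: it anchors the first view's keypoint, samples (H, W, C) feature maps with bilinear interpolation, and runs plain gradient descent on ||F_i(p_i) - F_0(p_0)||^2 for the other views, limiting each keypoint to a small window (`max_shift`) around its detection. The step size and iteration count are illustrative.

```python
import numpy as np


def bilinear(fmap, x, y):
    """Sample an (H, W, C) map at (x, y); return value and d/dx, d/dy."""
    H, W, _ = fmap.shape
    x = np.clip(x, 0, W - 1.001)
    y = np.clip(y, 0, H - 1.001)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    f00, f01 = fmap[y0, x0], fmap[y0, x0 + 1]
    f10, f11 = fmap[y0 + 1, x0], fmap[y0 + 1, x0 + 1]
    val = (1 - dx) * (1 - dy) * f00 + dx * (1 - dy) * f01 \
        + (1 - dx) * dy * f10 + dx * dy * f11
    gx = (1 - dy) * (f01 - f00) + dy * (f11 - f10)
    gy = (1 - dx) * (f10 - f00) + dx * (f11 - f01)
    return val, gx, gy


def refine_track(fmaps, points, steps=100, lr=0.25, max_shift=4.0):
    """Hypothetical feature-metric adjustment of one track's keypoints.

    fmaps: list of (H, W, C) feature maps, one per view.
    points: (N, 2) initial (x, y) keypoints; view 0 is the fixed anchor.
    """
    init = np.asarray(points, dtype=float)
    pts = init.copy()
    f_ref, _, _ = bilinear(fmaps[0], *pts[0])  # reference feature
    for _ in range(steps):
        for i in range(1, len(fmaps)):
            val, gx, gy = bilinear(fmaps[i], *pts[i])
            r = val - f_ref                           # feature residual
            grad = 2.0 * np.array([r @ gx, r @ gy])   # d||r||^2 / d(x, y)
            pts[i] -= lr * grad
        # Keep each keypoint within a small window of its detection.
        pts = init + np.clip(pts - init, -max_shift, max_shift)
    return pts
```

On real descriptors, whose values are much smaller than synthetic test features, `lr` would need tuning; a second-order solver such as Levenberg-Marquardt is a common alternative to plain gradient descent.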