VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Visual Geometry Group, University of Oxford; Meta AI
CVPR 2024
Samuel Chua
16 September 2025
Contents
Background – Structure-from-Motion (SfM)
Related Works
Motivation
Method - VGGSfM
Point Tracker
Initial Camera Estimator
Triangulator
Bundle-Adjustment
Goal:
Minimise the reprojection error with a second-order Levenberg-Marquardt (LM) optimizer
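To make the LM step concrete, here is a minimal NumPy sketch that refines a single 3D point by minimising its reprojection error over three fixed cameras. It is a toy under stated assumptions, not the VGGSfM implementation: the cameras, intrinsics, helper names (`project`, `residuals`, `levenberg_marquardt`) and the damping schedule are all illustrative, and a full bundle adjustment would also optimise the camera parameters.

```python
import numpy as np

def project(P, X):
    """Project a 3D point X (3,) with a 3x4 camera matrix P to 2D pixels."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def residuals(X, cams, obs):
    """Stacked reprojection residuals of point X over all cameras."""
    return np.concatenate([project(P, X) - uv for P, uv in zip(cams, obs)])

def numeric_jacobian(f, X, eps=1e-6):
    """Central-difference Jacobian of f at X."""
    J = np.zeros((f(X).size, X.size))
    for k in range(X.size):
        d = np.zeros_like(X)
        d[k] = eps
        J[:, k] = (f(X + d) - f(X - d)) / (2 * eps)
    return J

def levenberg_marquardt(f, X0, iters=50, lam=1e-3):
    """Damped Gauss-Newton: solve (J^T J + lam I) step = -J^T r each iteration."""
    X = X0.astype(float)
    cost = 0.5 * np.sum(f(X) ** 2)
    for _ in range(iters):
        r = f(X)
        J = numeric_jacobian(f, X)
        step = np.linalg.solve(J.T @ J + lam * np.eye(X.size), -J.T @ r)
        new_cost = 0.5 * np.sum(f(X + step) ** 2)
        if new_cost < cost:   # accept the step and trust the quadratic model more
            X, cost, lam = X + step, new_cost, lam * 0.5
        else:                 # reject the step and increase the damping
            lam *= 10.0
    return X, cost

# Hypothetical setup: shared intrinsics K, three cameras with identity rotation.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
translations = [np.zeros(3), np.array([-1.0, 0.0, 0.0]), np.array([0.0, -1.0, 0.0])]
cams = [K @ np.hstack([np.eye(3), t.reshape(3, 1)]) for t in translations]

X_true = np.array([0.2, -0.1, 4.0])
obs = [project(P, X_true) for P in cams]   # noise-free 2D observations

X_hat, final_cost = levenberg_marquardt(
    lambda X: residuals(X, cams, obs), np.array([0.0, 0.0, 3.0])
)
```

The damping term `lam` interpolates between Gauss-Newton (small `lam`) and gradient descent (large `lam`), which is what makes LM a second-order method that still behaves well far from the optimum.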
Training losses:
- Points: evaluate the ϵ-thresholded pseudo-Huber loss between the ground-truth 3D points and the bundle-adjusted 3D points
- Cameras: compare both the predicted initial camera poses and the bundle-adjusted camera poses to the ground-truth camera annotations
- Tracks: maximise the likelihood of each ground-truth track point under a probabilistic track-point estimate, defined as a 2D Gaussian parameterised by the predicted mean and variance
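The point and track terms listed above can be sketched in a few lines of NumPy. This is a hedged illustration only: the standard pseudo-Huber form, the reading of "ϵ-thresholded" as clamping each per-point term at `eps`, the diagonal-variance Gaussian, and all function names and default values are assumptions rather than the authors' exact definitions. The camera-pose term is omitted.

```python
import numpy as np

def pseudo_huber(residual_norm, delta=1.0):
    """Standard pseudo-Huber penalty: quadratic near 0, linear for large residuals."""
    return delta**2 * (np.sqrt(1.0 + (residual_norm / delta) ** 2) - 1.0)

def thresholded_point_loss(pred_pts, gt_pts, delta=1.0, eps=10.0):
    """Mean pseudo-Huber loss over 3D points, each term clamped at eps (assumed reading)."""
    dist = np.linalg.norm(pred_pts - gt_pts, axis=-1)
    return np.minimum(pseudo_huber(dist, delta), eps).mean()

def gaussian_track_nll(gt_xy, mean_xy, var_xy):
    """Mean negative log-likelihood of 2D points under an axis-aligned Gaussian.

    mean_xy and var_xy stand in for the tracker's predicted position and
    per-axis variance; the diagonal covariance is an assumption.
    """
    sq = (gt_xy - mean_xy) ** 2 / var_xy
    return 0.5 * (sq + np.log(var_xy) + np.log(2.0 * np.pi)).sum(axis=-1).mean()

# Hypothetical data: one exact point and one gross outlier.
pred = np.zeros((2, 3))
gt = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1000.0]])
point_loss = thresholded_point_loss(pred, gt)   # outlier term is capped at eps

# Track term: the NLL grows as the ground truth moves away from the predicted mean.
nll_near = gaussian_track_nll(np.zeros((1, 2)), np.zeros((1, 2)), np.ones((1, 2)))
nll_far = gaussian_track_nll(np.full((1, 2), 3.0), np.zeros((1, 2)), np.ones((1, 2)))
```

Clamping limits the influence of badly triangulated points on the gradient, while the variance in the track term lets the tracker down-weight points it is unsure about.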
Experiments
CO3Dv2 → internet-scale collection, category-diverse.
IMC Phototourism → large-scale, unstructured internet photos.
ETH3D → high-precision indoor/outdoor multi-view benchmark.
Conclusions