Why Structure-from-Motion?
- Memory-efficient: Sequential mapping avoids memory spikes.
- Robust: 360° images provide large feature overlap for reliable matches.
- Stable: Supports stricter convergence criteria for more reliable reconstruction.
Rig-based setup
Time-varying rig poses + fixed camera-in-rig extrinsics
Optimization objective
- Minimize the sum of squared reprojection errors.
- For every observed keypoint, we:
  - Project the 3D point into that camera.
  - Measure the pixel error against the observed keypoint.
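The objective above can be sketched in a few lines. This is a minimal illustration, not the poster's implementation: it assumes a pinhole model for each camera, a world-to-rig convention for rig poses, and a rig-to-camera convention for the fixed extrinsics. All names (`project`, `total_squared_reprojection_error`, the `"front"` camera) are hypothetical.

```python
def mat_vec(R, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def transform(R, t, p):
    """Apply the rigid transform p -> R p + t."""
    q = mat_vec(R, p)
    return [q[i] + t[i] for i in range(3)]

def project(point_world, rig_pose, cam_in_rig, focal, principal):
    """Project a world point via the rig pose, then the fixed camera-in-rig extrinsics."""
    p_rig = transform(rig_pose[0], rig_pose[1], point_world)   # world -> rig (assumed convention)
    x, y, z = transform(cam_in_rig[0], cam_in_rig[1], p_rig)   # rig -> camera (assumed convention)
    # Assumed pinhole model with a single focal length.
    return (focal * x / z + principal[0], focal * y / z + principal[1])

def total_squared_reprojection_error(observations, points, rig_poses,
                                     cams_in_rig, focal, principal):
    """Sum squared pixel errors over all (frame, camera, point, keypoint) observations."""
    total = 0.0
    for frame, cam, point_id, (u, v) in observations:
        u_hat, v_hat = project(points[point_id], rig_poses[frame],
                               cams_in_rig[cam], focal, principal)
        total += (u_hat - u) ** 2 + (v_hat - v) ** 2
    return total

# Toy example: one point straight ahead, observed 2 px to the right of its projection.
I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
points = {0: [0.0, 0.0, 4.0]}
rig_poses = {0: (I, [0.0, 0.0, 0.0])}
cams_in_rig = {"front": (I, [0.0, 0.0, 0.0])}
observations = [(0, "front", 0, (322.0, 240.0))]
err = total_squared_reprojection_error(observations, points, rig_poses,
                                       cams_in_rig, 400.0, (320.0, 240.0))
```

Because the camera-in-rig extrinsics are shared across all frames, an optimizer built on this residual would only need to update the per-frame rig poses and the 3D points.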
Adaptive Data Collection For High Fidelity Gaussian Splatting
Taru Rustagi, Juntao (Jordan) Zhang
Advised by Prof. Laszlo A. Jeni
Fujitsu Advisors: Koichiro Niinuma, Mose Sakashita
Monocular Depth Estimation with Scale Alignment
Constrained SfM with Camera Grouping
- Input: Single 360° video stream in equirectangular projection
- Outputs (for 3D Gaussian Splatting reconstruction):
  - Sparse Point Cloud for Gaussian initialization
  - Cubemap Images for scene representation
  - Camera Poses with accurate 3D localization
  - Scale-Aligned Depth for depth supervision
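One way scale alignment could work is to fit a single scale factor between monocular depth predictions and the depths of sparse SfM points seen at the same pixels. The sketch below uses a closed-form least-squares fit. The poster does not state which fitting method is used, so the scale-only model and the function name `align_scale` are assumptions.

```python
def align_scale(mono_depths, sparse_depths):
    """Return the scale s minimizing sum((s * m - d)^2) over valid depth pairs."""
    num = 0.0
    den = 0.0
    for m, d in zip(mono_depths, sparse_depths):
        if m > 0 and d > 0:  # skip invalid or missing depths
            num += m * d
            den += m * m
    if den == 0.0:
        raise ValueError("no valid depth pairs")
    return num / den

# Toy example: monocular depths are off by a constant factor of 2.5.
mono = [1.0, 2.0, 4.0]
sparse = [2.5, 5.0, 10.0]
scale = align_scale(mono, sparse)
aligned = [scale * m for m in mono]
```

A median of per-point ratios would be a more outlier-tolerant alternative to the least-squares fit shown here.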
GS for Unconstrained Photo Collections
Distractor Removal: Use segmentation masks to remove moving cars, pedestrians, and other transient objects for better reconstruction.
NVS using Diffusion: Augment training views after 15,000 steps using a diffusion-based model (Difix3D+), and also use it at inference time to remove GS artifacts.
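A common way to apply segmentation masks during training is to exclude masked pixels from the photometric loss. The sketch below shows a masked L1 loss on small grayscale images stored as nested lists. Whether the masks are applied in the loss or in preprocessing is not stated here, so this is an assumption, and `masked_l1_loss` is a hypothetical name.

```python
def masked_l1_loss(rendered, target, distractor_mask):
    """Mean absolute error over pixels that are not flagged as distractors."""
    total = 0.0
    count = 0
    for r_row, t_row, m_row in zip(rendered, target, distractor_mask):
        for r, t, is_distractor in zip(r_row, t_row, m_row):
            if not is_distractor:
                total += abs(r - t)
                count += 1
    # If every pixel is masked, there is nothing to supervise.
    return total / count if count else 0.0

# Toy 2x2 example: the top-right pixel is a distractor and is ignored.
rendered = [[0.2, 0.9], [0.5, 0.5]]
target = [[0.0, 0.1], [0.5, 0.25]]
mask = [[False, True], [False, False]]
loss = masked_l1_loss(rendered, target, mask)
```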
Motivation and Prior Work
Motivation:
- Ease of Capture: Simplify data collection while improving GS fidelity.
- Robustness: Handle distractors, lighting changes, and scene variability.
- Stability: Enable stricter convergence and more reliable reconstruction.
Prior Work:
- Equirectangular SLAM integrated with Splatfacto-W.
Issues:
- SLAM drift leading to inaccurate camera trajectories.
- High computational cost for large-scale or long-sequence captures.
- Geometry deformation due to overfitting and weak regularization.