Why Structure-from-Motion?

  • Memory-efficient: Sequential mapping avoids memory spikes.
  • Robust: 360° images provide large feature overlap for reliable matches.
  • Stable: Supports stricter convergence criteria for more reliable reconstruction.

Rig-based setup

Rig poses over time + Camera-in-rig extrinsics (Fixed)

Optimization objective

  • Minimize the sum of squared reprojection errors.
  • For every observed keypoint, we:
    • Project the 3D point into the camera that observed it.
    • Measure the pixel error between the projection and the detected keypoint.
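
The objective above can be sketched in a few lines of NumPy. This is a minimal illustration, not the poster's implementation: it assumes a shared pinhole intrinsic matrix K for every camera, 4x4 world-to-rig and rig-to-camera transforms, and a hypothetical observation tuple layout. The names project and reprojection_cost are made up for this example.

```python
import numpy as np

def project(K, T_cam_world, X_world):
    """Project a 3D world point to pixel coordinates (pinhole model, assumed)."""
    X_h = np.append(X_world, 1.0)          # homogeneous world point
    X_cam = (T_cam_world @ X_h)[:3]        # point in the camera frame
    uvw = K @ X_cam
    return uvw[:2] / uvw[2]                # perspective division

def reprojection_cost(K, T_rig_world, T_cam_rig, points, observations):
    """Sum of squared pixel errors over all observed keypoints.

    T_rig_world:  list of 4x4 rig poses, one per time step (optimized).
    T_cam_rig:    list of 4x4 camera-in-rig extrinsics (held fixed).
    observations: list of (frame_idx, cam_idx, point_idx, uv_observed);
                  this layout is an assumption for the sketch.
    """
    cost = 0.0
    for f, c, p, uv in observations:
        # Compose the fixed extrinsic with the time-varying rig pose.
        T_cam_world = T_cam_rig[c] @ T_rig_world[f]
        residual = project(K, T_cam_world, points[p]) - uv
        cost += float(residual @ residual)
    return cost
```

In a rig-based setup, only the rig poses and the 3D points would be free variables of this cost; a nonlinear least-squares solver would minimize it while the camera-in-rig transforms stay constant.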

Pipeline

Adaptive Data Collection For High Fidelity Gaussian Splatting

Taru Rustagi, Juntao (Jordan) Zhang

Advised by Prof. Laszlo A. Jeni

Fujitsu Advisors: Koichiro Niinuma, Mose Sakashita

Next Steps / Future Work

Monocular Depth Estimation with Scale Alignment

Constrained SfM with Camera Grouping

  • Input: Single 360° video stream in equirectangular projection
  • Outputs (for 3D Gaussian Splatting reconstruction):
    • Sparse Point Cloud for Gaussian initialization
    • Cubemap Images for scene representation
    • Camera Poses with accurate 3D localization
    • Scale-Aligned Depth for depth supervision
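
The poster does not say how the depth is scale-aligned. One common approach, shown here purely as an assumption, is to fit a scale and shift by least squares so that the monocular prediction matches the sparse depths of the SfM points where they project into the image. The function name align_depth and its arguments are hypothetical.

```python
import numpy as np

def align_depth(pred_depth, sparse_depth, valid):
    """Fit s, t so that s * pred_depth + t matches sparse_depth on valid pixels.

    pred_depth:   (H, W) monocular depth prediction, arbitrary scale.
    sparse_depth: (H, W) metric depth from SfM points, meaningful only where valid.
    valid:        (H, W) boolean mask of pixels that have an SfM depth.
    """
    x = pred_depth[valid]
    y = sparse_depth[valid]
    # Linear system [x, 1] @ [s, t]^T = y, solved in the least-squares sense.
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return s * pred_depth + t, s, t
```

Some monocular models predict inverse depth rather than depth; in that case the same fit would be applied in inverse-depth space before converting back.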

Results

GS for Unconstrained Photo Collections

3DGS

Splatfacto-W

Distractor Removal: Remove moving cars, pedestrians and other objects for better reconstruction using segmentation masks.
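
One way segmentation masks can be used for distractor removal, sketched here as an assumption rather than the poster's exact method, is to exclude masked pixels from the photometric loss so that moving objects never supervise the Gaussians. The function masked_l1_loss is a hypothetical helper written in NumPy for clarity.

```python
import numpy as np

def masked_l1_loss(rendered, target, distractor_mask):
    """Mean absolute color error over pixels not flagged as distractors.

    rendered, target: (H, W, 3) float images.
    distractor_mask:  (H, W) boolean, True where a car, pedestrian, etc. was segmented.
    """
    keep = ~distractor_mask
    if not keep.any():
        return 0.0                      # every pixel masked: nothing to supervise
    diff = np.abs(rendered - target)    # per-pixel, per-channel error
    return float(diff[keep].mean())     # average over kept pixels only
```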

NVS using Diffusion: Augment training views after 15,000 steps using a diffusion-based model (Difix3D+), and also use the same model at inference time to remove GS artifacts.

Motivation and Prior Work

Motivation:

  • Ease of Capture: Simplify data collection while improving GS fidelity.
  • Robustness: Handle distractors, lighting changes, and scene variability.
  • Stability: Enable stricter convergence and more reliable reconstruction.

Prior Work:

  • Equirectangular SLAM integrated with Splatfacto-W.

Issues:

  • SLAM drift leading to inaccurate camera trajectories.
  • High computational cost for large-scale or long-sequence captures.
  • Geometry deformation due to overfitting and weak regularization.