1
Towards Universal State Estimation and
Reconstruction in the Wild
Team 15 - Bhuvan Jhamb, Chenwei Lyu
Mentors - Nikhil Keetha, Dr. Sebastian Scherer
2
· Imagine what’s in the future…
Motivation
https://medium.com/@recogni/autonomous-vehicles-and-a-system-of-connected-cars-944f86275663
https://www.reuters.com/technology/apple-ramps-up-vision-pro-production-plans-february-launch-bloomberg-news-2023-12-20/
https://www.af.mil/News/Article-Display/Article/2551037/robot-dogs-arrive-at-tyndall-afb/
Autonomous Vehicle
AR/VR
Robotics
What do all of these have in common? SLAM
3
Motivation
· SLAM: Simultaneous Localization and Mapping
4
Motivation
· Why Are Existing Methods Insufficient?
Hence this leads to our project:
Towards Universal State Estimation and Reconstruction in the Wild
5
Classical Structure From Motion Pipeline:
Image source (link)
Recap:
6
The DUSt3R Pipeline
DUSt3R: Geometric 3D Vision Made Easy
Recap:
(Dense and Unconstrained Stereo 3D Reconstruction)
7
The DUSt3R Pipeline
DUSt3R: Geometric 3D Vision Made Easy
Recap:
8
DUSt3R: Geometric 3D Vision Made Easy
Experiments - Minimal Overlap
Recap:
9
Experiments - Cross View
DUSt3R: Geometric 3D Vision Made Easy
Recap:
10
Generalizable and Robust SLAM for “In the Wild” Settings:
- in ill-posed settings (very sparse views, pure rotation, planar data, etc.)
- or challenging scenarios (extreme lighting variations, etc.)
Directions Explored
Recap:
11
Scaling to >2 images
DUSt3R: Geometric 3D Vision Made Easy
Failure Cases (Long-term Sequential Data)
12
Other works extending DUSt3R to SLAM settings
Inherent Limitations of DUSt3R…
13
DUSt3R fails on Out Of Distribution Data
DUSt3R: Geometric 3D Vision Made Easy
(Humans)
source (link)
14
Limitations of DUSt3R:
15
New models came out over the summer!
DUSt3R
Geometrical matching
Feature-based matching
MASt3R
16
Precise correspondences enable precise reconstruction!
Given precise correspondences, reconstruction is a geometrically grounded task
Instead of teaching the network to reason about geometry, we focus on learning precise correspondences, which can be utilized for reconstruction
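To illustrate why reconstruction becomes a purely geometric step once correspondences are precise, here is a minimal sketch of linear (DLT) two-view triangulation. It is not taken from the slides: the intrinsics `K`, the two projection matrices, and the test point are hypothetical values chosen for the example, and known camera poses are assumed.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one correspondence.

    P1, P2: 3x4 projection matrices. x1, x2: matching pixel coords (u, v).
    Returns the 3D point in the world frame.
    """
    # Each view contributes two linear constraints on the homogeneous point X.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point with a 3x4 projection matrix."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical setup: shared intrinsics, second camera offset along x.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])

X_true = np.array([0.2, -0.1, 4.0])
X_est = triangulate_dlt(P1, P2, project(P1, X_true), project(P2, X_true))
print(X_est)
```

With noise-free correspondences the recovered point matches the original one up to floating-point error; errors in the matches translate directly into errors in the 3D point, which is the motivation for focusing on correspondence quality.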
17
Precise correspondences enable precise reconstruction!
18
Image Matching
Sparse Matching
Dense Matching
Generalised Matching
e.g. GMFlow
19
Our solution: Match Anything
20
Architecture
Geometrical matching
Feature-based matching
MASt3R
Match Anything
1. Simplify the architecture to be more generalizable
21
Architecture
2. Only predict flow in the covisible region
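One way to read "only predict flow in the covisible region" is that flow error is measured (or supervised) only where a covisibility mask is set. The slides do not give the actual loss or metric, so the following is an assumed illustration: a masked mean end-point error, with the function name `masked_epe` and the toy arrays made up for the example.

```python
import numpy as np

def masked_epe(flow_pred, flow_gt, covis_mask):
    """Mean end-point error over covisible pixels only (assumed metric).

    flow_pred, flow_gt: (H, W, 2) arrays. covis_mask: (H, W) boolean array.
    """
    err = np.linalg.norm(flow_pred - flow_gt, axis=-1)  # per-pixel error
    if not covis_mask.any():
        return 0.0  # nothing covisible, so nothing to score
    return float(err[covis_mask].mean())

# Toy example: the left half is covisible, the right half is not.
H, W = 4, 4
flow_gt = np.zeros((H, W, 2))
flow_pred = np.zeros((H, W, 2))
flow_pred[..., 0] = 3.0
flow_pred[..., 1] = 4.0          # error of length 5 everywhere
flow_pred[:, 2:] = 100.0         # large error outside the covisible region
mask = np.zeros((H, W), dtype=bool)
mask[:, :2] = True

epe = masked_epe(flow_pred, flow_gt, mask)
print(epe)
```

The large error in the non-covisible half does not affect the result, because those pixels are excluded by the mask.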
22
Co-visible Mask Generation
Project pixels from camera 1 into 3D space and then back to camera 2
FoV mask: If the coordinates of a projected pixel are within the image boundaries, then it is in the FoV
Occlusion mask: If the depth of a projected pixel is close to the real depth at that pixel, then it is visible
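The mask generation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: both cameras share the intrinsics `K`, `T_2_from_1` is a hypothetical 4x4 transform from camera-1 to camera-2 coordinates, projected pixels are rounded to the nearest neighbour, and "close" is taken to mean within a relative tolerance `rel_tol`.

```python
import numpy as np

def covisible_mask(depth1, depth2, K, T_2_from_1, rel_tol=0.05):
    """Boolean (H, W) mask of camera-1 pixels that are visible in camera 2."""
    H, W = depth1.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)

    # Unproject camera-1 pixels into 3D, then move them into the camera-2 frame.
    pts1 = (np.linalg.inv(K) @ pix.T) * depth1.reshape(1, -1)
    pts2 = T_2_from_1[:3, :3] @ pts1 + T_2_from_1[:3, 3:4]

    # Project into camera 2 (points behind the camera are rejected).
    z2 = pts2[2]
    valid = (depth1.reshape(-1) > 0) & (z2 > 1e-6)
    z_safe = np.where(valid, z2, 1.0)
    proj = K @ pts2
    u2 = np.rint(proj[0] / z_safe).astype(np.int64)
    v2 = np.rint(proj[1] / z_safe).astype(np.int64)

    # FoV mask: the projected pixel lies inside the image boundaries.
    in_fov = valid & (u2 >= 0) & (u2 < W) & (v2 >= 0) & (v2 < H)

    # Occlusion mask: projected depth agrees with the depth observed by camera 2.
    d2 = depth2[np.clip(v2, 0, H - 1), np.clip(u2, 0, W - 1)]
    visible = in_fov & (d2 > 0) & (np.abs(z2 - d2) <= rel_tol * d2)
    return visible.reshape(H, W)

# Toy scene: a fronto-parallel plane at depth 2 seen by both cameras.
H, W = 30, 40
K = np.array([[100.0, 0.0, 20.0], [0.0, 100.0, 15.0], [0.0, 0.0, 1.0]])
depth = np.full((H, W), 2.0)

mask_same = covisible_mask(depth, depth, K, np.eye(4))

T_shift = np.eye(4)
T_shift[0, 3] = -0.2  # shifts projections by 100 * (-0.2) / 2 = -10 pixels
mask_shift = covisible_mask(depth, depth, K, T_shift)
print(mask_same.mean(), mask_shift.mean())
```

In the shifted case the ten leftmost columns project outside camera 2's image and are dropped by the FoV test, while the remaining columns pass both tests.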
23
Sampling Method
· To get image pairs, we need to design a sampling method.
1. Accumulate Point Cloud:
From posed depth images, unproject into a point cloud and accumulate.
2. Voxelize:
Voxel-downsample the point clouds and camera positions to create an enumerable scene representation.
3. Calculate Covisibility:
For every camera position and every voxel, determine whether the camera can see the voxel. Save the results to a list.
4. Generate Samples:
Randomly select a base camera and a target voxel, then keep every candidate camera position whose ray to the target voxel makes the required angle with the base camera's ray.
Score all candidates by visibility using the precomputed covisibility list. Keep N candidates and add them to the pair list.
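The sampling steps can be sketched roughly as below. This is a simplified illustration, not the project's code: `points` stands in for the already accumulated point cloud, camera positions are not voxelized, visibility is approximated by a range and view-cone test with no occlusion check, and all function names, thresholds, and the toy scene are assumptions made for the example.

```python
import numpy as np

def voxel_downsample(points, voxel_size):
    """Keep one representative point (the centroid) per occupied voxel."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    n = inverse.max() + 1
    sums = np.zeros((n, 3))
    np.add.at(sums, inverse, points)
    counts = np.bincount(inverse, minlength=n)
    return sums / counts[:, None]

def covisibility(cam_pos, cam_fwd, voxels, max_range, half_fov_deg):
    """vis[i, j] is True if voxel j is in range and inside camera i's view cone.

    Simplification: occlusion between voxels is ignored.
    """
    rays = voxels[None] - cam_pos[:, None]                      # (C, V, 3)
    dist = np.linalg.norm(rays, axis=-1)
    cos = np.einsum("cvk,ck->cv", rays, cam_fwd) / np.maximum(dist, 1e-9)
    return (dist <= max_range) & (cos >= np.cos(np.radians(half_fov_deg)))

def sample_pairs(cam_pos, voxels, vis, min_deg, max_deg, n_keep, rng):
    """Pick a base camera and target voxel, then return up to n_keep pairs."""
    base = rng.integers(len(cam_pos))
    seen = np.flatnonzero(vis[base])
    if seen.size == 0:
        return []
    target = rng.choice(seen)
    ref = cam_pos[base] - voxels[target]        # reference ray: voxel -> base camera
    rays = cam_pos - voxels[target]             # voxel -> every camera
    cos = rays @ ref / (np.linalg.norm(rays, axis=1) * np.linalg.norm(ref) + 1e-12)
    ang = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    # Candidates must see the target and lie within the required angle range.
    cand = np.flatnonzero((ang >= min_deg) & (ang <= max_deg) & vis[:, target])
    cand = cand[cand != base]
    # Score by the number of voxels seen by both cameras; keep the best n_keep.
    score = (vis[cand] & vis[base]).sum(axis=1)
    order = np.argsort(-score)[:n_keep]
    return [(int(base), int(c)) for c in cand[order]]

# Toy scene: points in a cube, 12 cameras on a circle looking at the origin.
rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(5000, 3))
voxels = voxel_downsample(points, 0.5)
theta = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
cam_pos = np.stack([5 * np.cos(theta), 5 * np.sin(theta), np.zeros_like(theta)], axis=1)
cam_fwd = -cam_pos / np.linalg.norm(cam_pos, axis=1, keepdims=True)

vis = covisibility(cam_pos, cam_fwd, voxels, max_range=10.0, half_fov_deg=30.0)
pairs = sample_pairs(cam_pos, voxels, vis, min_deg=20.0, max_deg=70.0, n_keep=3, rng=rng)
print(pairs)
```

Precomputing the camera-by-voxel visibility table once makes each sampling call cheap, since scoring a candidate reduces to a boolean AND over two rows of the table.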
24
Reference ray
Sampling Method
25
Datasets
26
Results
Season
Day-Night
27
Results
Scale
Perspective
WxBS result
28
Tested: 90
58
0.69
Profiling Match Anything (On AGX Orin)
29
Batch size | Forward-pass time (ms) | Time per image (ms) |
1 | 92.12 | 92.12 |
2 | 103.54 | 51.72 |
4 | 186.96 | 46.74 |
8 | 349.27 | 43.65 |
~20 FPS Performance
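The per-image column and the throughput figure follow from simple arithmetic: per-image time is the forward-pass time divided by the batch size, and images per second is 1000 divided by the per-image time in milliseconds. The short sketch below recomputes both from the forward-pass times in the table; recomputed values can differ from the slide in the last digits (for batch size 2, 103.54 / 2 gives 51.77).

```python
# Forward-pass times in milliseconds, keyed by batch size (from the table).
forward_ms = {1: 92.12, 2: 103.54, 4: 186.96, 8: 349.27}

# Per-image latency and throughput derived from the forward-pass times.
per_image_ms = {b: t / b for b, t in forward_ms.items()}
images_per_s = {b: 1000.0 / t for b, t in per_image_ms.items()}

for b in forward_ms:
    print(f"batch {b}: {per_image_ms[b]:.2f} ms/image, {images_per_s[b]:.1f} images/s")
```

Larger batches amortize fixed overhead, so throughput passes 20 images per second from batch size 4 upward.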
30
Limitations
Extreme Dark
Comprehensive Change
31
Utilizing Match Anything for reconstruction
End-to-end approaches: robust but not generalizable enough
Very modular approaches: generalizable but not robust enough
Somewhere in between fully modular and fully E2E
33
Utilizing Match Anything for reconstruction
Replace with Match Anything
34
Ongoing Work
Source - ChatGPT
35
Thanks to Mentors and Collaborators
Nikhil Keetha
Jay Karhade
Sebastian Scherer
Yuchen Zhang
Yutian Chen
Yuheng Qiu
36
Thanks For Listening!