1
Towards Universal State Estimation and
Reconstruction in the Wild
Team 15 - Bhuvan Jhamb, Chenwei Lyu
Mentors - Nikhil Keetha, Dr. Sebastian Scherer
2
· Imagine what’s in the future…
What is common in all of these? - SLAM
Motivation
https://medium.com/@recogni/autonomous-vehicles-and-a-system-of-connected-cars-944f86275663
https://www.reuters.com/technology/apple-ramps-up-vision-pro-production-plans-february-launch-bloomberg-news-2023-12-20/
https://www.af.mil/News/Article-Display/Article/2551037/robot-dogs-arrive-at-tyndall-afb/
Autonomous Vehicle
AR/VR
Robotics
3
Motivation
· SLAM: Simultaneous Localization and Mapping
4
https://www.amazon.science/latest-news/how-zoox-vehicles-find-themselves-in-an-ever-changing-world
· Autonomous Vehicle
Motivation
5
Motivation
· Autonomous Vehicle
· AR/VR
https://mobilesyrup.com/2023/06/05/apple-unveils-mixed-reality-headset-wwdc-2023/
6
· Autonomous Vehicle
· AR/VR
· Robotics
Motivation
https://m.kangnamtimes.com/tech/article/73975/
7
· What is common in all of these?
Motivation
8
Motivation
· Why Are Existing Methods Insufficient?
Hence this leads to our project:
Towards Universal State Estimation and Reconstruction in the Wild
9
Literature Review
10
11
A scene can be represented explicitly as a collection of 3D Gaussians
Image Source: https://rmurai.co.uk/projects/GaussianSplattingSLAM/
Gaussian Splatting: Brief Review
12
13
RGB Images
GT Poses
Initial PCL
3D Gaussians representing the scene
Offline
3D Gaussian Splatting
Poses
Incremental RGB-D Frames
3D Gaussians representing the scene
Online
SplaTAM
SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM
14
Gaussian Splatting meets SLAM
15
Assumptions
Gaussians are modelled as
With these assumptions, each Gaussian has 8 parameters:
(compared to 59 in original 3DGS)
Image source (link)
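The parameter count can be checked with a small sketch. The list of assumptions did not survive on this slide, so the breakdown below is an assumption based on how the SplaTAM paper describes its Gaussians (isotropic, with view-independent color); the class name `SimplifiedGaussian` and the helper `param_count` are hypothetical.

```python
from dataclasses import dataclass, fields

import numpy as np


@dataclass
class SimplifiedGaussian:
    """Assumed simplified layout: isotropic, view-independent color."""
    center: np.ndarray   # (3,) position
    color: np.ndarray    # (3,) RGB, no spherical harmonics
    radius: float        # one scale shared by all axes
    opacity: float


def param_count(gaussian) -> int:
    """Total number of scalar parameters stored in the dataclass."""
    return sum(np.size(getattr(gaussian, f.name)) for f in fields(gaussian))


# Per-Gaussian layout of the original 3DGS with degree-3 spherical harmonics:
# 16 SH coefficients per color channel, 3 channels.
FULL_3DGS_LAYOUT = {
    "position": 3,
    "rotation_quaternion": 4,
    "scale": 3,
    "opacity": 1,
    "sh_coefficients": 16 * 3,
}

g = SimplifiedGaussian(np.zeros(3), np.zeros(3), radius=0.01, opacity=0.5)
print(param_count(g), sum(FULL_3DGS_LAYOUT.values()))
```

Dropping rotation, per-axis scale, and view-dependent color is what accounts for the gap between the two counts.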
16
How SplaTAM works:
Initialization
For every new frame:
Map Update
17
Initialization
Camera Pose:
Map Initialization:
For the first frame, we add a new Gaussian for each pixel with the following parameters:
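The per-pixel initialization can be sketched as below. The parameter values were lost from this slide, so the choices here are assumptions taken from the SplaTAM paper's description: the center is the unprojected pixel, the radius is depth divided by focal length (about one pixel when projected), and the opacity is 0.5. The first camera pose is assumed to be identity, so camera and world coordinates coincide. The function name and the pinhole intrinsics `fx, fy, cx, cy` are hypothetical.

```python
import numpy as np


def init_gaussians_from_rgbd(rgb, depth, fx, fy, cx, cy):
    """Create one Gaussian per valid pixel of the first RGB-D frame.

    rgb:   (H, W, 3) colors
    depth: (H, W) depth in meters, 0 marks an invalid pixel
    Returns centers (N, 3), colors (N, 3), radii (N,), opacities (N,).
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]          # pixel row (v) and column (u) indices
    valid = depth > 0
    z = depth[valid]
    # Pinhole unprojection of each valid pixel into 3D.
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    centers = np.stack([x, y, z], axis=-1)
    colors = rgb[valid]
    # Assumed rule: radius = depth / focal length.
    radii = z / ((fx + fy) / 2.0)
    # Assumed initial opacity.
    opacities = np.full(z.shape, 0.5)
    return centers, colors, radii, opacities


rgb = np.full((2, 3, 3), 0.5)
depth = np.full((2, 3), 2.0)
depth[0, 0] = 0.0                      # one invalid pixel
centers, colors, radii, opacities = init_gaussians_from_rgbd(
    rgb, depth, fx=100.0, fy=100.0, cx=1.0, cy=0.5
)
```

In this toy frame five of the six pixels have valid depth, so five Gaussians are created.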
18
Tracking
19
Gaussian Densification
20
Map Update
21
22
Limitations/Potential Improvements:
23
Proof of concept that 3D Gaussians can be a useful representation for Dense SLAM
Why This Paper
24
Some Progress:
Achieved a ~1.3x improvement in FPS by replacing PyTorch autodiff with a custom CUDA kernel
A more efficient implementation alone is not enough
We are exploring algorithmic changes to overcome the limitations of SplaTAM
25
Literature Review
26
27
AnyLoc: Towards Universal Visual Place Recognition
Humans & Robots alike need to know where they are
for Scene Understanding & Navigation
28
29
Current SOTA: Performs well within the training distribution (urban)
Does not generalize to diverse conditions
30
AnyLoc Solution:
· Use Intermediate Features from Self-Supervised ViT
31
AnyLoc Solution:
· Use Intermediate Features from Self-Supervised ViT
· Unsupervised Local Feature Aggregation
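The aggregation step can be sketched with VLAD, one of the unsupervised aggregation methods used in the AnyLoc paper. This is a minimal NumPy sketch under stated assumptions: `descriptors` stands in for per-patch features taken from an intermediate ViT layer, `centers` stands in for a vocabulary obtained by k-means on database descriptors, and the function name `vlad` is hypothetical. It uses hard assignment followed by per-cluster and global L2 normalization.

```python
import numpy as np


def vlad(descriptors, centers):
    """Aggregate local descriptors (N, D) into one global vector (K * D,).

    centers: (K, D) cluster centers (assumed to come from k-means).
    """
    # Squared distance from every descriptor to every center: (N, K).
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    assign = d2.argmin(axis=1)          # hard assignment to nearest center

    k, d = centers.shape
    residuals = np.zeros((k, d))
    for i in range(k):
        members = descriptors[assign == i]
        if len(members):
            # Sum of residuals between members and their center.
            residuals[i] = (members - centers[i]).sum(axis=0)

    # Intra-normalization: L2-normalize each cluster's residual.
    norms = np.linalg.norm(residuals, axis=1, keepdims=True)
    residuals = residuals / np.maximum(norms, 1e-12)

    # Flatten and L2-normalize the whole vector.
    flat = residuals.ravel()
    return flat / max(np.linalg.norm(flat), 1e-12)


toy_desc = np.array([[1.0, 0.0], [0.0, 3.0]])
toy_centers = np.array([[0.0, 0.0], [0.0, 2.0]])
toy_vlad = vlad(toy_desc, toy_centers)
```

Place recognition then reduces to nearest-neighbor search between the global vector of a query image and those of the database images.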
32
AnyLoc Results:
· Achieves up to 4x higher performance than existing approaches
33
Visually Degraded Environment (Hawkins)
500 km Aerial Dataset (VP-Air)
34
Why This Paper:
Some Progress:
35
Literature Review
36
(Dense and Unconstrained Stereo 3D Reconstruction)
DUSt3R: Geometric 3D Vision Made Easy
37
From these 2 images alone, can we infer
DUSt3R can!!
38
39
Classical Structure From Motion Pipeline:
Image source (link)
40
The DUSt3R Pipeline
41
The DUSt3R Pipeline
(r,g,b)
(x,y,z)
Image
Pointmap
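A pointmap has the same height and width as the image, but each pixel stores an (x, y, z) point instead of an (r, g, b) color. The sketch below only illustrates that data structure: it builds a pointmap from a known depth map and pinhole intrinsics, whereas DUSt3R regresses the pointmap directly with a network. The function names and the intrinsics `fx, fy, cx, cy` are hypothetical.

```python
import numpy as np


def depth_to_pointmap(depth, fx, fy, cx, cy):
    """Return an (H, W, 3) pointmap in the camera frame from an (H, W) depth map."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]          # pixel row (v) and column (u) indices
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)


def project_pointmap(pointmap, fx, fy, cx, cy):
    """Project every 3D point back to pixel coordinates, shape (H, W, 2)."""
    x, y, z = pointmap[..., 0], pointmap[..., 1], pointmap[..., 2]
    return np.stack([fx * x / z + cx, fy * y / z + cy], axis=-1)


depth = np.full((4, 5), 3.0)
pointmap = depth_to_pointmap(depth, fx=50.0, fy=50.0, cx=2.0, cy=1.5)
pixels = project_pointmap(pointmap, fx=50.0, fy=50.0, cx=2.0, cy=1.5)
```

Because the pixel-to-point correspondence is one-to-one, the depth map is simply the z channel of a pointmap expressed in its own camera frame, and reprojecting the points recovers the original pixel grid.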
42
The DUSt3R Pipeline
43
The DUSt3R Architecture - Pretraining
Reference Image
Image from a 2nd viewpoint but masked
CroCo Prediction
Source: CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View Completion
44
The DUSt3R Architecture - Details
45
Experiments - Minimal Overlap
46
47
Experiments - Cross View
48
Experiments - Cross View
49
The DUSt3R Architecture - Scaling to >2 images
The architecture can be extended to support multiple images at once by
50
The DUSt3R Architecture - Scaling to >2 images
51
DUSt3R - Failure Cases (Illumination Changes)
source (link)
52
DUSt3R - Failure Cases (Humans)
53
DUSt3R - Failure Cases (Long-Term Sequential Data)
54
Why This Paper
55
SplaTAM:
Proof of concept that 3D Gaussians can be a very useful representation for Dense SLAM
AnyLoc:
Insights into utilizing ViT features for robust universal SLAM
DUSt3R:
An influential way to bake in geometry in a deep neural network
Conclusion
56
Featuremetric SLAM
Feed Forward SLAM
Current Work Directions
57
Nikhil
Keetha
Jay
Karhade
Sebastian Scherer
Sourav Garg
Akash
Sharma
Shibo Zhao
Yao He
Thanks to Mentors and Collaborators
58
Thanks For Listening!