InHa LEE
NICE-SLAM: Neural Implicit Scalable Encoding for SLAM.
CVPR 2022
Zihan Zhu, Songyou Peng, Viktor Larsson, Weiwei Xu, Hujun Bao, Zhaopeng Cui, Martin R. Oswald, Marc Pollefeys
Introduction
Simultaneous Localization and Mapping
Robot pose
Map points
(in real time)
Introduction
RADAR
event camera
Ultrasound
LiDAR
Wheel encoder
infrared camera
GNSS
monocular camera
RGB-D camera
Photometric sensors
Introduction
event camera
infrared camera
monocular camera
RGB-D camera
Localization
Mapping
Photometric sensors
Introduction
NICE-SLAM demo video
Method
[1] Sucar, Edgar, et al. "iMAP: Implicit mapping and positioning in real-time." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
Map information
Mesh, Point clouds, Voxel...
Storage
Method
Map information
R
Storage
NeRF[2]
[2] Mildenhall, Ben, et al. "Nerf: Representing scenes as neural radiance fields for view synthesis." European conference on computer vision. Springer, Cham, 2020.
Method
NeRF[2]
RGB-D camera
Method
Ray
Camera origin:
Ray
Viewing direction:
iMAP
position
MLP
color
density
inter-sample dist:
Occupancy:
Transmittance
color
Depth
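The iMAP-style rendering on this slide (occupancy, transmittance, then accumulated color and depth along a ray) can be sketched roughly as follows; this is a minimal NumPy illustration of the accumulation step, not the authors' implementation, and the function name is assumed.

```python
import numpy as np

def render_ray(occupancy, depths, colors):
    """Accumulate color and depth for one ray from per-sample predictions.

    occupancy: (N,) per-sample occupancy probabilities in [0, 1]
    depths:    (N,) sample depths along the ray
    colors:    (N, 3) per-sample RGB predictions
    """
    # Transmittance: probability the ray reaches sample i unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - occupancy[:-1]]))
    # Ray-termination weight at each sample.
    w = occupancy * trans
    depth = np.sum(w * depths)              # expected depth along the ray
    color = np.sum(w[:, None] * colors, 0)  # expected color along the ray
    return color, depth
```

With a fully occupied first sample, all weight lands on it, so the rendered depth equals that sample's depth.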
Method
NeRF[2]
RGB-D camera
iMAP
position
MLP
color
density
color
Depth
Method
NeRF[2]
RGB-D camera
iMAP
position
MLP
color
density
color
Depth
Mapping
Method
Bound!
Method
NeRF[2]
RGB-D camera
iMAP
position
MLP
color
density
color
Depth
Method
NeRF[2]
RGB-D camera
iMAP
position
MLP
color
density
color
Depth
Update camera pose (10Hz)
Method
NeRF[2]
RGB-D camera
iMAP
position
MLP
color
density
color
Depth
Tracking
Method
Method
NICE SLAM
NICE SLAM
position
color
iMAP
position
MLP
color
density
iMAP
Occupancy
Method
NICE SLAM
position
color
Occupancy
Every feature in the feature grids is a 32-dimensional learnable parameter.
Method
[1] Peng, Songyou, et al. "Convolutional occupancy networks." European Conference on Computer Vision. Springer, Cham, 2020.
Method
NICE SLAM
position
color
Occupancy
Every feature in the feature grids is a 32-dimensional learnable parameter.
Color Decoder
Color
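A feature-grid lookup as described above (32-dim learnable features stored at grid vertices, queried by trilinear interpolation at a 3D position) might look like this sketch; the function name and arguments are illustrative, not the paper's code.

```python
import numpy as np

def query_feature_grid(grid, point, voxel_size):
    """Trilinearly interpolate a per-vertex feature at a 3D point.

    grid:       (X, Y, Z, C) array of learnable per-vertex features
    point:      (3,) query position in world coordinates
    voxel_size: edge length of one grid cell (e.g. mid/fine level sizes)
    """
    idx = point / voxel_size               # continuous grid coordinates
    i0 = np.floor(idx).astype(int)         # lower corner of the enclosing cell
    t = idx - i0                           # fractional offsets in [0, 1)
    feat = np.zeros(grid.shape[-1])
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                # Trilinear weight of this cell corner.
                w = ((t[0] if dx else 1 - t[0]) *
                     (t[1] if dy else 1 - t[1]) *
                     (t[2] if dz else 1 - t[2]))
                feat += w * grid[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return feat
```

The interpolated feature is then concatenated with the position and fed to the decoder MLP.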
Method
Method
Occupancy
Color
Method
Occupancy
Color
Method
Method
0.16m
0.32m
mid
fine
Mid-level feature
High-level feature
Method
Method
Even from seeing only a part, the rest can be filled in!
2m
Method
Method
0.16m
fine
Fully Connected
Color
Not pretrained!
Method
Method
Ray
Ray
Method
Ray
Ray
keyframe
current frame
Method
1. To select keyframes, compute P, the proportion of pixels whose depth error is smaller than a threshold
2. If P is below a threshold, the frame is chosen as a keyframe and added to the keyframe set
Continual Neural Mapping: Learning An Implicit Scene Representation from Sequential Observations
F1
F2
F0
F3
F4
F5
F6
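The two selection steps above can be sketched as follows; the threshold names and default values here are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def is_new_keyframe(depth_gt, depth_rendered, err_thresh=0.1, prop_thresh=0.65):
    """Decide whether the current frame should join the keyframe set.

    Step 1: P = proportion of pixels whose rendered-depth error is below
            err_thresh, i.e. pixels the current map already explains.
    Step 2: if P is below prop_thresh, the map does not explain enough of
            this view, so the frame becomes a keyframe.
    """
    err = np.abs(depth_gt - depth_rendered)
    p = np.mean(err < err_thresh)
    return p < prop_thresh
```

A frame whose view is already well reconstructed yields a high P and is skipped; a frame revealing unseen geometry yields a low P and is kept.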
Method
Method
Ray
Camera origin:
Ray
Viewing direction:
iMAP
position
MLP
color
density
inter-sample dist:
Occupancy:
color
Depth
Method
Ray
Camera origin:
Ray
Viewing direction:
Coarse-level Occupancy:
NICE SLAM
position
color
Occupancy
fine-level Occupancy:
Method
Ray
Camera origin:
Ray
Viewing direction:
Coarse-level Occupancy:
NICE SLAM
position
color
Occupancy
fine-level Occupancy:
Transformation
Method
Method
Geometric loss
Photometric loss
GT
Rendered image
Normalize
Method
Geometric loss
Photometric loss
GT
Rendered image
Normalize
Coarse-level Occupancy:
fine-level Occupancy:
Method
Geometric loss
Photometric loss
GT
Rendered image
Normalize
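The geometric and photometric losses on this slide can be sketched as simple L1 terms on the rendered depth (at both depth-rendering levels) and the rendered color; the weighting `lam` and the function signature are assumptions for illustration.

```python
import numpy as np

def mapping_loss(depth_gt, depth_coarse, depth_fine,
                 color_gt, color_rendered, lam=0.2):
    """Geometric L1 on coarse- and fine-level rendered depth plus a
    photometric L1 on rendered color; lam is an assumed weighting factor."""
    geo = (np.mean(np.abs(depth_gt - depth_coarse)) +
           np.mean(np.abs(depth_gt - depth_fine)))
    photo = np.mean(np.abs(color_gt - color_rendered))
    return geo + lam * photo
```

The loss vanishes when both rendered depths and the rendered color match the RGB-D observation.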
Method
0.16m
0.32m
mid
fine
Method
mid/fine-level
position
Occupancy
First stage
Second stage
Geometric loss
Coarse-level Occupancy:
fine-level Occupancy:
Optimize the mid and fine feature grids
Method
mid/fine-level
position
Occupancy
First stage
Second stage
Geometric loss
Coarse level
position
Occupancy
Coarse-level Occupancy:
fine-level Occupancy:
Optimize the coarse feature grid
Optimize the mid and fine feature grids
Method
Photometric loss
Coarse-level Occupancy:
fine-level Occupancy:
Fully Connected
Color
GT
Rendered image
Method
Coarse-level Occupancy:
fine-level Occupancy:
Loss
Rendering image
Get sample
Fed to network
NICE SLAM
position
color
Occupancy
Camera origin:
Viewing direction:
Transformation:
Method
Loss
: rotations and translations of the keyframes
: weighting factor
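One way to read the tracking loss with its weighting factor is that the depth term is down-weighted where the rendered depth is uncertain; the sketch below divides the depth residual by the rendered depth's standard deviation and adds a weighted color term. Names, signature, and the default `lam` are assumptions, not the paper's exact formulation.

```python
import numpy as np

def tracking_loss(depth_gt, depth_rendered, depth_var,
                  color_gt, color_rendered, lam=0.5, eps=1e-10):
    """Depth residual scaled by the rendered depth's standard deviation
    (acting as the weighting factor), plus a photometric L1 term."""
    geo = np.mean(np.abs(depth_gt - depth_rendered) /
                  np.sqrt(depth_var + eps))
    photo = np.mean(np.abs(color_gt - color_rendered))
    return geo + lam * photo
```

During tracking this loss is minimized over the camera pose only, with the scene representation held fixed.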
Method
Method
Method
Method
Method
Mid- & fine-level geometric and color mapping thread (Mapping)
Coarse-level geometric mapping Thread
Camera tracking Thread
Experiments
Dataset: Replica, ScanNet, TUM RGB-D
Experiments
Dataset: Replica, ScanNet, TUM RGB-D
Experiments
Dataset: Replica, ScanNet, TUM RGB-D
+ Depth estimation performance greatly improved
+ Improved reconstruction performance
- Memory consumption
Experiments
Dataset: Replica, ScanNet, TUM RGB-D
Experiments
Dataset: Replica, ScanNet, TUM RGB-D
- Lower performance than existing explicit methods
+ Greatly improved performance over existing implicit methods
Experiments
Dataset: Replica, ScanNet, TUM RGB-D
+ Greatly improved mapping performance compared to iMAP
Tracking: 200 samples
Mapping: 1000 samples
FLOPs: the number of floating-point operations required to obtain color and occupancy for one 3D point
Conclusion
Conclusion
Q & A
Question
Camera pose initialization during tracking
Why 3-level Feature Grids?
Considering memory consumption and real-time capability, three levels work best!
Question
Why is the Mid-level Output not a Residual to the Coarse-level Output?
Unlike mid-fine, the gap in grid size between the mid and coarse levels is too large.