Visual Odometry with Deep Learning
2022.8.19
Introduction
[Figure: pipeline comparison. Geometric Method: Image Sequence, Feature Detection (SIFT/SURF/ORB…), Feature Matching, Outlier Rejection, Motion Estimate, Calibration, Bundle Adjustment, Pose. Contrasted with the Deep Learning Method.]
Absolute Pose Generation
In VO tasks, every image in the dataset has a corresponding 3×4 pose matrix [R | t]. R is the 3×3 rotation matrix of the left camera coordinate system relative to the first frame of the sequence. t is the 3×1 translation vector, which gives the position of the left camera coordinate system relative to the first frame.
1. Transformation matrix:
Pose of first frame
Rotation matrix in 3D space:
As a result, the absolute pose can be written as a 6-D vector:
Translation Vector
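The conversion from a 3×4 [R | t] pose to a 6-D vector can be sketched as follows. This is a minimal illustration, not the slides' own code: the Z-Y-X (yaw, pitch, roll) Euler convention, the ordering of the 6-D vector (angles first, then translation), and the function names are all assumptions, since the slides do not state them.

```python
import numpy as np

def euler_zyx_to_matrix(roll, pitch, yaw):
    """Build R = Rz(yaw) @ Ry(pitch) @ Rx(roll) (assumed convention)."""
    cx, sx = np.cos(roll), np.sin(roll)
    cy, sy = np.cos(pitch), np.sin(pitch)
    cz, sz = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    return Rz @ Ry @ Rx

def matrix_to_euler_zyx(R):
    """Recover (roll, pitch, yaw) from R, valid away from pitch = +/-90 degrees."""
    pitch = -np.arcsin(np.clip(R[2, 0], -1.0, 1.0))
    roll = np.arctan2(R[2, 1], R[2, 2])
    yaw = np.arctan2(R[1, 0], R[0, 0])
    return np.array([roll, pitch, yaw])

def pose_to_6d(pose_3x4):
    """Convert a 3x4 [R | t] pose (or its 12 row-major values) to a 6-D vector.

    Output order (an assumption): [roll, pitch, yaw, tx, ty, tz].
    """
    pose = np.asarray(pose_3x4, dtype=float).reshape(3, 4)
    R, t = pose[:, :3], pose[:, 3]
    return np.hstack([matrix_to_euler_zyx(R), t])
```

For the first frame, whose pose is the identity rotation with zero translation, `pose_to_6d` returns a vector of six zeros.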
Relative Pose Generation
[Figure: Relative Pose Schema, showing the world coordinate and camera coordinate frames]
2. Euler angles:
The relative pose of two adjacent frames can be written as a 6-D vector:
displacement vector
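One way to obtain relative poses from absolute ones, and to chain them back into a trajectory, is sketched below. It assumes each absolute pose maps camera coordinates of frame i into the first-frame (world) coordinates, so the relative pose is expressed in the camera frame of the earlier image. The function names are hypothetical, and the slides do not confirm this exact formulation.

```python
import numpy as np

def to_homogeneous(pose_3x4):
    """Pad a 3x4 [R | t] pose to a 4x4 homogeneous transform."""
    T = np.eye(4)
    T[:3, :] = np.asarray(pose_3x4, dtype=float).reshape(3, 4)
    return T

def relative_poses(absolute):
    """Relative transform between adjacent frames: inv(T_i) @ T_{i+1}."""
    return [np.linalg.inv(a) @ b for a, b in zip(absolute[:-1], absolute[1:])]

def accumulate(relative, start=None):
    """Chain relative transforms back into absolute poses."""
    T = np.eye(4) if start is None else np.array(start, dtype=float)
    trajectory = [T]
    for step in relative:
        T = T @ step
        trajectory.append(T)
    return trajectory
```

Accumulating the relative poses of a sequence reproduces its absolute poses, which is how per-step predictions could be turned into a full trajectory. The same chaining also accumulates any per-step error.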
Network Architectures
1. CNN Layers:
Use a pretrained FlowNetSimple model to learn geometric features between two adjacent frames.
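FlowNetSimple-style networks take a pair of images stacked along the channel axis. The sketch below shows how adjacent frames might be paired into 6-channel inputs. The (N, H, W, 3) array layout and the function name are assumptions for illustration, and the actual preprocessing used in these experiments is not given in the slides.

```python
import numpy as np

def stack_adjacent_frames(frames):
    """Pair each frame with its successor along the channel axis.

    frames: array of shape (N, H, W, 3), an assumed layout.
    Returns an array of shape (N - 1, H, W, 6), where channels 0-2 hold
    frame i and channels 3-5 hold frame i + 1.
    """
    frames = np.asarray(frames)
    return np.concatenate([frames[:-1], frames[1:]], axis=-1)
```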
2. LSTM Layers:
Let the network learn the relationship between multiple successive poses, because the difference between two adjacent poses alone is sometimes small.
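To let a recurrent layer see several successive poses at once, a sequence is typically cut into fixed-length windows. The helper below is a minimal sketch of that idea. The `window` and `stride` parameters are hypothetical names, and the slides do not state how the training subsequences were actually formed.

```python
def sliding_windows(items, window, stride=1):
    """Split a sequence into overlapping windows of fixed length.

    Windows that would run past the end of the sequence are dropped.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    last_start = len(items) - window
    return [items[i:i + window] for i in range(0, last_start + 1, stride)]
```

For example, six frames with `window=4` and `stride=1` yield three windows, each sharing three frames with its neighbour.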
Cost Function
KITTI VO Benchmark
Autonomous Driving Platform
KITTI Odometry sequences 00-10
Trajectories
Ground Truth of sequences 00-10
nuScenes
nuScenes car setup
nuScenes schema
Use the inverse of the pose matrix to transform data from the global coordinate system to the camera coordinate system.
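The inverse of a rigid pose [R | t] has the closed form [Rᵀ | −Rᵀt], so no general matrix inversion is needed. The sketch below applies it to 3-D points. It assumes the pose is given as a 4×4 homogeneous matrix that maps camera coordinates to global coordinates, and the function names are illustrative rather than part of any devkit.

```python
import numpy as np

def invert_pose(T):
    """Invert a 4x4 rigid transform using R^T and -R^T t."""
    T = np.asarray(T, dtype=float)
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv

def global_to_camera(points_global, T_cam_to_global):
    """Map (N, 3) global points into the camera frame."""
    pts = np.asarray(points_global, dtype=float)
    T_inv = invert_pose(T_cam_to_global)
    # Row-vector form of p_cam = R_inv @ p_global + t_inv.
    return pts @ T_inv[:3, :3].T + T_inv[:3, 3]
```

A quick sanity check is that the camera's own position in global coordinates maps to the origin of the camera frame.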
Trajectories
Ground Truth of scenes 0-849
Experiments
KITTI Loss
nuScenes Loss
Experiments
Sequence (Epoch=10) | Translation RMSE | Rotation RMSE |
01 | 506.189331 | 2.378539 |
03 | 72.12484 | 0.510368 |
05 | 79.494957 | 2.086924 |
07 | 17.149811 | 2.947729 |
09 | 40.579418 | 1.762159 |
mean | 143.107671 | 1.9371438 |
Test on KITTI Sequences
Sequence (Epoch=120) | Translation RMSE | Rotation RMSE |
01 | 528.226257 | 0.981712 |
03 | 53.838741 | 0.453822 |
05 | 86.06134 | 1.83037 |
07 | 13.937366 | 2.656275 |
09 | 57.504028 | 2.550753 |
mean | 147.913546 | 1.694586 |
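The slides do not state how the translation and rotation RMSE values are computed. One common definition, shown here purely as an assumption, takes the root of the mean squared Euclidean error over all frames of a sequence. The function name and the (N, 3) input layout are illustrative.

```python
import numpy as np

def rmse(pred, target):
    """Root-mean-square of the per-frame Euclidean error.

    pred, target: arrays of shape (N, 3), e.g. translations or Euler angles.
    """
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=-1))))
```

Under this definition a single badly estimated stretch of a trajectory dominates the score, which is worth keeping in mind when comparing sequences of very different lengths.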
Experiments
Sequence (K=50) | Translation RMSE | Rotation RMSE |
01 | 526.587524 | 3.802394 |
03 | 22.536444 | 0.869126 |
05 | 42.410206 | 2.308919 |
07 | 37.448059 | 0.890943 |
09 | 31.28471 | 2.076861 |
mean | 132.053389 | 1.9896486 |
Test on KITTI Sequences (Epoch=10)
Sequence (K=100) | Translation RMSE | Rotation RMSE |
01 | 506.189331 | 2.378539 |
03 | 72.12484 | 0.510368 |
05 | 79.494957 | 2.086924 |
07 | 17.149811 | 2.947729 |
09 | 40.579418 | 1.762159 |
mean | 143.107671 | 1.9371438 |
Experiments
Comparison between K=50 and K=100 (Epoch=10)
Experiments
Trajectories on test sequences 00-10 for KITTI
Experiments
Trajectories on test scenes 0-849 for nuScenes
Conclusion
Drawbacks & Future