Javed Ahmad¹,², Matteo Toso², Matteo Taiana², Stuart James², Alessio Del Bue²
Multi-view 3D Objects Localization
From Street-Level Scenes
Published at ICIAP 2022
¹Università degli Studi di Genova
²Istituto Italiano di Tecnologia
3D Scene Perception
Automate Mapping [1]
Augmented Reality [2]
Autonomous Driving [3]
Robot Vision System [4]
[1] Mapillary.com
[2] MEMEXProject.eu
[3] Motional.com
[4] Universal-Robots.com
3D Scene Perception – Multi-View Scenario
Challenges:
Multi-view 3D Object Localization – The Problem
“Localize street-level static objects in a 3D scene, as perceived by multiple camera images” [1]
Objects:
Benches, street signs, street lights, traffic lights and signs, trash cans, lampposts, manholes, etc.
Source of Data:
Crowdsourced street-level imagery, a service by Mapillary [2]
Challenges:
[1] Ahmad, J., Toso, M., Taiana, M., James, S., & Del Bue, A. (2022, May). Multi-view 3d objects localization from street-level scenes. In International Conference on Image Analysis and Processing (pp. 89-101). Cham: Springer International Publishing.
[2] https://www.mapillary.com/
Multi-view 3D Object Localization – The Dataset
Dataset: Mapillary Street-level Scenes
API: Mapillary Python SDK [1]
Access Data: Define Radius, Number of Images
[1] Mapillary Python SDK, mapillary api v4. https://www.mapillary.com/developer/apidocumentation, accessed: 2021-12-15
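Mapillary's v4 API serves images by bounding box, so a radius query can be approximated by converting the radius into a box around a center coordinate. A minimal sketch (the helper name and the Porto Antico coordinates are illustrative, not part of the SDK):

```python
import math

def bbox_from_radius(lat, lon, radius_m):
    """Approximate a (west, south, east, north) bounding box around
    (lat, lon) with the given radius in meters, using a local
    equirectangular approximation."""
    dlat = radius_m / 111_320.0  # meters per degree of latitude
    dlon = radius_m / (111_320.0 * math.cos(math.radians(lat)))
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)

# Example: roughly 500 m around Genova's Porto Antico
bbox = bbox_from_radius(44.4095, 8.9279, 500)
```

The resulting box can be passed to the API's `bbox` query parameter to fetch the image list and metadata for a scene.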
Barcelona Center
London Center
Paris Center
Genova Porto Antico
Mapillary Metadata:
Multi-view 3D Object Localization – The Preprocessing
How to use these crowd-sourced multiple images for 3D objects localization?
We need to know the exact positions and orientations of the cameras in the scene!
We reconstructed the scene using structure from motion (SfM) [1]
Porto Antico, Genova
Piazza Corvetto, Genova
Vienna Central Area
[1] Schönberger, Johannes L., and Jan-Michael Frahm. "Structure-from-motion revisited." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
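SfM recovers, for each image, the camera intrinsics K and pose (R, t), which is what lets reconstructed 3D points be related back to the 2D detections. A minimal pinhole-projection sketch (NumPy; the toy intrinsics and pose are illustrative values, assuming the common world-to-camera convention):

```python
import numpy as np

def project_points(points_w, R, t, K):
    """Project Nx3 world points into an image using a camera pose
    (R, t map world -> camera coordinates) and intrinsics K.
    Returns Nx2 pixel coordinates."""
    p_cam = points_w @ R.T + t           # world -> camera coordinates
    p_img = p_cam @ K.T                  # apply intrinsics
    return p_img[:, :2] / p_img[:, 2:3]  # perspective division

# Toy example: identity pose, focal length 1000, principal point (640, 360)
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
uv = project_points(np.array([[0.0, 0.0, 5.0]]), R, t, K)
# A point on the optical axis projects to the principal point (640, 360)
```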
Genova
Dataset: Mapillary
Metainfo: Instance Segmentation
Step 1: Preprocessing (filter static objects from the 2D detections; convert 2D polygons to 2D bounding boxes)
Step 2: Estimating 3D Objects (access objects’ 3D points and metainfo; refine the 3D points with statistical outlier removal, SOR)
Step 3: Matching 2D Detections
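The 3D points associated with each object are refined with statistical outlier removal (SOR). A minimal brute-force sketch of SOR (the `k` and `std_ratio` values are illustrative; point-cloud libraries such as Open3D ship an optimized version of this filter):

```python
import numpy as np

def statistical_outlier_removal(points, k=8, std_ratio=1.0):
    """Drop points whose mean distance to their k nearest neighbours
    exceeds (global mean + std_ratio * global std). Brute-force O(N^2),
    fine for the small per-object point sets considered here."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    d.sort(axis=1)                         # row-wise ascending distances
    mean_knn = d[:, 1:k + 1].mean(axis=1)  # skip column 0 (distance to self)
    thresh = mean_knn.mean() + std_ratio * mean_knn.std()
    return points[mean_knn < thresh]
```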
Multi-view 3D Object Localization – Proposed Approach
Sparse Reconstruction
Metadata
Scene Data
Camera Parameters
Porto Antico, Genova
Multi-view 3D Object Localization – Visualizations
A Scene From Vienna
Traffic Sign
Street light
Traffic Sign
Store sign
Multi-view 3D Object Localization – Visualizations
Performance under occlusion
Performance when the object’s texture changes
Multi-view 3D Object Localization – Failure vs Best
Failure Case
Best Case
Metrics
3D Intersection Over Union (3D IOU)
2D Intersection Over Union (2D IOU)
Measures
3D: TP, FP, and FN to compute precision and recall
2D: TP, FP, and FN to compute mean average precision (mAP)
Comparison
Mapillary’s method, which derives objects from multiple detections across multiple images [1]
[1] Mapillary Python SDK, mapillary api v4. https://www.mapillary.com/developer/apidocumentation, accessed: 2021-12-15
TP: True Positive, FP: False Positive, FN: False Negative
Projected G.T. 3D Box vs. 2D Detection
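For the 2D evaluation, a projected ground-truth 3D box is typically counted as a TP when its 2D IoU with a detection exceeds a threshold (the matching threshold itself is an evaluation choice, not fixed here). A minimal sketch of 2D IoU and the derived precision/recall:

```python
def iou_2d(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def precision_recall(tp, fp, fn):
    """Precision and recall from TP/FP/FN counts."""
    return tp / (tp + fp), tp / (tp + fn)

# Two unit-offset 2x2 boxes overlap in a 1x1 square: IoU = 1/7
iou = iou_2d((0, 0, 2, 2), (1, 1, 3, 3))
```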
Multi-view 3D Object Localization – The Evaluation
Conclusions
We addressed 3D object localization by integrating sparse SfM reconstruction with 2D detection information, localizing static street-level objects from crowdsourced imagery, with direct applications in augmented reality and other 3D scene perception tasks.