1 of 13

Javed Ahmad¹,², Matteo Toso², Matteo Taiana², Stuart James², Alessio Del Bue²

Multi-view 3D Objects Localization from Street-Level Scenes

Published at ICIAP 2022

¹Università degli Studi di Genova

²Istituto Italiano di Tecnologia

2 of 13

3D Scene Perception

Automated Mapping [1]

Augmented Reality [2]

Autonomous Driving [3]

Robot Vision System [4]

[1] Mapillary.com

[2] MEMEXProject.eu

[3] Motional.com

[4] Universal-Robots.com

3 of 13

3D Scene Perception – Multi-View Scenario

Challenges:

  • Crowdsourced imagery
  • Varying camera positions
  • Illumination changes
  • Object scale


4 of 13

Multi-view 3D Object Localization – The Problem

“Localize street-level static objects in a 3D scene as perceived by multiple camera images” [1]

Objects:

Benches, street signs, street lights, traffic lights and signs, trash cans, lampposts, manholes, etc.

Source of Data:

Crowdsourced imagery, a service by Mapillary [2]

Challenges:

    • Unknown cameras
    • Illumination changes
    • Varying camera positions
    • Different scale of objects

[1] Ahmad, J., Toso, M., Taiana, M., James, S., & Del Bue, A. (2022). Multi-view 3D objects localization from street-level scenes. In International Conference on Image Analysis and Processing (pp. 89-101). Cham: Springer International Publishing.

[2] https://www.mapillary.com/

5 of 13

Multi-view 3D Object Localization – The Dataset

Dataset: Mapillary Street-level Scenes

API: Mapillary Python SDK [1]

Access Data: Define Radius, Number of Images

[1] Mapillary Python SDK, Mapillary API v4. https://www.mapillary.com/developer/apidocumentation, accessed: 2021-12-15

Barcelona Center

London Center

Paris Center

Genova Porto Antico

Mapillary Metadata:

  • Instance Segmentation
  • GPS Position
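The radius-based access above can be sketched by converting the query radius into a latitude/longitude bounding box; this is an illustrative sketch (the `bbox_from_radius` helper and the example coordinates are ours, not part of the Mapillary SDK):

```python
import math

def bbox_from_radius(lat, lon, radius_m):
    """Approximate a lat/lon bounding box around a centre point.

    Small-radius approximation: one degree of latitude spans about
    111,320 m, and one degree of longitude shrinks by cos(latitude).
    """
    dlat = radius_m / 111_320.0
    dlon = radius_m / (111_320.0 * math.cos(math.radians(lat)))
    return {"south": lat - dlat, "north": lat + dlat,
            "west": lon - dlon, "east": lon + dlon}

# Example: a 200 m query box around Genova Porto Antico (approximate coordinates)
box = bbox_from_radius(44.408, 8.927, 200)
```

The resulting box can then be handed to whichever bounding-box query the SDK exposes, together with a cap on the number of images.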

6 of 13

Multi-view 3D Object Localization – The Preprocessing

How can these crowdsourced images be used for 3D object localization?

We need to know the exact positions and orientations of the cameras in the scene!

We reconstructed the scene using structure from motion (SfM) [1]

Porto Antico, Genova

Piazza Corvetto, Genova

Vienna Central Area

[1] Schönberger, Johannes L., and Jan-Michael Frahm. "Structure-from-Motion Revisited." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
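SfM gives, for each image, the intrinsics K and the world-to-camera pose (R, t); with those, any reconstructed 3D point can be projected into pixels via the pinhole model. A minimal sketch (the camera values are illustrative, not from the paper):

```python
import numpy as np

def project(K, R, t, X):
    """Pinhole projection of a 3D world point X into pixel coordinates."""
    Xc = R @ X + t          # world frame -> camera frame
    u, v, w = K @ Xc        # camera frame -> homogeneous pixel coordinates
    return np.array([u / w, v / w])

# Toy camera: focal length 1000 px, principal point (640, 360), identity pose
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)

# A point 2 m straight ahead projects onto the principal point
px = project(K, R, t, np.array([0.0, 0.0, 2.0]))
```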

7 of 13

Multi-view 3D Object Localization – Proposed Approach

Scene: Genova. Dataset: Mapillary. Metainfo: Instance Segmentation.

Step 1: Preprocessing. The scene data go through sparse reconstruction, yielding camera parameters; the metadata yield 2D detections (2D polygons and 2D bounding boxes), filtered to keep only static objects.

Step 2: Estimating 3D Objects. Access each object’s 3D points and metainfo; refine the 3D points with statistical outlier removal (SOR).

Step 3: Matching 2D Detections.
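The SOR step in the pipeline stands for statistical outlier removal. A minimal sketch of the standard recipe (mean k-nearest-neighbour distance thresholded at mean + std_ratio · std, as in common point-cloud libraries; the values of k and std_ratio here are illustrative, not the paper's settings):

```python
import numpy as np

def sor_filter(points, k=8, std_ratio=1.0):
    """Statistical outlier removal on an (N, 3) point array."""
    # Pairwise distances (fine for small clouds; use a k-d tree for large ones)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # exclude self-distance
    knn = np.sort(d, axis=1)[:, :k]           # k nearest neighbours per point
    mean_d = knn.mean(axis=1)
    thresh = mean_d.mean() + std_ratio * mean_d.std()
    return points[mean_d <= thresh]

# A tight cluster of 50 points plus one far-away outlier
rng = np.random.default_rng(0)
cloud = np.vstack([rng.normal(size=(50, 3)) * 0.1, [[10.0, 10.0, 10.0]]])
clean = sor_filter(cloud)                      # the outlier is discarded
```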
8 of 13

Multi-view 3D Object Localization – Visualizations

Porto Antico, Genova

9 of 13

Multi-view 3D Object Localization – Visualizations

A Scene From Vienna

Detected objects: Traffic Sign, Street Light, Traffic Sign, Store Sign

10 of 13

Multi-view 3D Object Localization – Visualizations

Performance under occlusion

Performance when the object’s texture changes

11 of 13

Multi-view 3D Object Localization – Failure vs Best

Failure Case

Best Case

12 of 13

Multi-view 3D Object Localization – The Evaluation

Metrics

3D Intersection over Union (3D IoU)

2D Intersection over Union (2D IoU)

Measures

3D: TP, FP and FN to compute precision and recall

2D: TP, FP and FN to compute mean average precision (mAP)

Comparison

Mapillary’s method, which derives objects from multiple detections across multiple images [1]

TP: True Positive, FP: False Positive, FN: False Negative

Figure: projected G.T. 3D box vs. the 2D detection.

[1] Mapillary Python SDK, Mapillary API v4. https://www.mapillary.com/developer/apidocumentation, accessed: 2021-12-15
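The TP/FP/FN counting can be sketched with axis-aligned boxes and greedy one-to-one matching; this is a minimal illustration (the threshold, matching rule, and box parameterization are assumptions, not necessarily the paper's exact protocol):

```python
import numpy as np

def iou_aabb(a, b):
    """IoU of two axis-aligned boxes, each given as (min_corner, max_corner).

    Corners are length-2 arrays for 2D IoU or length-3 arrays for 3D IoU.
    """
    lo = np.maximum(a[0], b[0])
    hi = np.minimum(a[1], b[1])
    inter = np.prod(np.clip(hi - lo, 0.0, None))

    def vol(box):
        return np.prod(box[1] - box[0])

    return inter / (vol(a) + vol(b) - inter)

def precision_recall(preds, gts, thr=0.5):
    """Greedy matching: each prediction may claim at most one unmatched G.T. box."""
    matched, tp = set(), 0
    for p in preds:
        best, best_iou = None, thr
        for i, g in enumerate(gts):
            iou = iou_aabb(p, g)
            if i not in matched and iou >= best_iou:
                best, best_iou = i, iou
        if best is not None:
            matched.add(best)
            tp += 1
    fp, fn = len(preds) - tp, len(gts) - tp
    return tp / (tp + fp), tp / (tp + fn)

# Unit cube vs. the same cube shifted by 0.5 along x: IoU = 0.5 / 1.5 = 1/3
cube = (np.zeros(3), np.ones(3))
shifted = (np.array([0.5, 0.0, 0.0]), np.array([1.5, 1.0, 1.0]))
iou = iou_aabb(cube, shifted)
prec, rec = precision_recall([cube], [cube])  # perfect match: both equal 1.0
```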

13 of 13

Conclusions

We addressed the 3D localization of static street-level objects by integrating sparse SfM reconstruction with 2D detection information from crowdsourced imagery, and evaluated the approach against Mapillary’s objects derived from multiple detections, with applications in mapping and augmented reality.