1 of 26

What You See is What You Get: Exploiting Visibility for 3D Object Detection

Peiyun Hu, Jason Ziglar, David Held, Deva Ramanan (CVPR 2020)

Nicholas Vadivelu

2020/07/07

Motivation

  • LiDAR sweeps are inherently “2.5D”
    • We can never see behind occluded objects (as with many sensor types)
    • This is why sweeps can be stored in 2D data structures (e.g. 2D range images)
  • Representing sweeps as 3D point clouds (sets of (x, y, z)) hides this information
  • Occupancy maps, though standard in robotic mapping, have not been used for 3D object detection

Contributions

  1. (Re)Introduce raycasting algorithms to efficiently compute visibility for a voxel grid (on the fly)
  2. Show that visibility can be combined with synthetic data augmentation and temporal aggregation of LiDAR sweeps
  3. Propose an approach to augment voxel-based networks with visibility features

Ray Casting Overview (2D case)

# V is visibility: a multichannel 2D feature map
V[:] <- UNKNOWN

for each LiDAR point (a, b, c):
    x, y, z <- source
    while (x, y, z) != (a, b, c):
        V[x, y, z] <- FREE
        x, y, z <- next voxel on ray
    V[a, b, c] <- BLOCKED

  • For N LiDAR points and map dimensions (l, w, h), time complexity is O(N · max(l, w, h))
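The traversal above can be sketched in NumPy as follows. This is a minimal 2D illustration, not the paper's implementation: it samples densely along each ray as a stand-in for exact Amanatides–Woo voxel stepping, and the grid layout, unit voxels, and function name are assumptions.

```python
import numpy as np

UNKNOWN, FREE, BLOCKED = 0, 1, 2

def compute_visibility(points, sensor, grid_shape, voxel_size=1.0):
    """Mark voxels between the sensor and each LiDAR return as FREE,
    the return's voxel as BLOCKED, and everything else UNKNOWN.
    Assumes a 2D grid anchored at the origin with non-negative coords."""
    V = np.full(grid_shape, UNKNOWN, dtype=np.uint8)
    for p in points:
        end = tuple((p / voxel_size).astype(int))
        # Dense sampling along the ray stands in for exact voxel stepping.
        n_steps = 2 * int(np.ceil(np.abs(p - sensor).max() / voxel_size)) + 1
        for t in np.linspace(0.0, 1.0, n_steps, endpoint=False):
            i, j = ((sensor + t * (p - sensor)) / voxel_size).astype(int)
            if (i, j) == end:
                break  # reached the endpoint voxel: stop marking FREE
            if 0 <= i < grid_shape[0] and 0 <= j < grid_shape[1]:
                V[i, j] = FREE
        if 0 <= end[0] < grid_shape[0] and 0 <= end[1] < grid_shape[1]:
            V[end] = BLOCKED
    return V
```

Each ray costs a number of steps proportional to the largest grid dimension, which matches the O(N · max(l, w, h)) bound on the slide.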

Object Augmentation

  • Copy-paste rarely seen objects into LiDAR scenes
  • Use visibility to avoid placing objects in occluded areas
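One way the visibility map could gate such placements (a sketch; the helper name and the acceptance threshold are assumptions, not from the paper): accept a candidate paste location only if most of the object's footprint lies in observed FREE space.

```python
import numpy as np

UNKNOWN, FREE, BLOCKED = 0, 1, 2

def placement_is_visible(V, footprint, min_free=0.8):
    """Accept a candidate paste location only if at least `min_free`
    of the object's footprint voxels were observed as FREE.
    `footprint` is a list of (i, j) voxel indices the object would cover."""
    states = np.array([V[i, j] for i, j in footprint])
    return bool(np.mean(states == FREE) >= min_free)
```

Footprints falling in UNKNOWN (occluded) space are rejected, so augmented objects only appear where the sensor could plausibly have seen them.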

Temporal Aggregation

  • Aggregate sweeps from different time points (compensating for motion)
  • Use Bayesian filtering to turn the 4D spatio-temporal visibility into a 3D posterior probability of occupancy
    • Follow OctoMap’s (Hornung et al. 2013) formulation
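The OctoMap-style update referenced above fits in a few lines: each voxel keeps a log-odds occupancy value that is incremented on a hit, decremented on a miss, and clamped. The sensor-model probabilities (0.7 hit, 0.4 miss) and clamping bounds (0.12, 0.97) below are OctoMap's defaults, not values from this paper.

```python
import math

def logodds(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

def prob(l):
    """Inverse of logodds: recover the probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(l))

L_HIT, L_MISS = logodds(0.7), logodds(0.4)    # inverse sensor model
L_MIN, L_MAX = logodds(0.12), logodds(0.97)   # clamping bounds

def update(l, hit):
    """One recursive Bayesian update for a voxel, following OctoMap:
    L(n | z_1..t) = L(n | z_1..t-1) + L(n | z_t), then clamp."""
    l += L_HIT if hit else L_MISS
    return min(max(l, L_MIN), L_MAX)

# Aggregating sweeps: a voxel observed occupied in three sweeps, free in one.
l = 0.0  # prior p = 0.5
for hit in [True, True, False, True]:
    l = update(l, hit)
occupancy_posterior = prob(l)
```

Working in log-odds makes each sweep's update a single addition, and clamping keeps the map responsive to change, which is what makes the temporal aggregation cheap.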

Approach: A Two-stream Network

  • Augment the PointPillars architecture with a second stream that processes visibility

Experiments

  • Dataset: nuScenes 3D detection dataset
    • 1,000 scenes captured in 2 cities
    • Training set: 700 scenes (28,130 annotated frames)
    • Validation set: 150 scenes (6,019 annotated frames)

Ablation: Late vs Early Fusion

Ablation: Types of Object Augmentation

Ablation

Ablation: Object Augmentation

Ablation: Temporal Aggregation

Ablation: Visibility Stream

Related Work: Visibility

  • The Mobile Robot RHINO
    • Joachim Buhmann, Wolfram Burgard, Armin B. Cremers, Dieter Fox, Thomas Hofmann, Frank E. Schneider, Jiannis Strikos, and Sebastian Thrun, 1995
    • Uses a 2D probabilistic occupancy map built from sonar readings for navigation
  • OctoMap: An Efficient Probabilistic 3D Mapping Framework Based on Octrees
    • Armin Hornung, Kai M. Wurm, Maren Bennewitz, Cyrill Stachniss, and Wolfram Burgard, 2013
    • General-purpose 3D occupancy mapping
  • A Probabilistic Representation of LiDAR Range Data for Efficient 3D Object Detection
    • Theodore C. Yapo, Charles V. Stewart, and Richard J. Radke, 2008
    • Formulates object detection as a hypothesis-testing problem

Thoughts

  • Runtime? (they mention 24.4 ± 3.5 ms on an Intel i9)
  • Intelligent object augmentation works great
  • Not clear how the probabilistic aggregation over time differs from naively combining multiple time steps
  • Application to V2VNet:
    • V2VNet gets a lot of benefit from nearby vehicles having visibility in occluded areas
    • The GNN aggregates incoming messages via a mean
    • Including this visibility information with a learned aggregation (e.g. weights) could be interesting

Thanks for Listening!