
Mismatch between Classification and Localization Scores
Previous methods do not address this in an end-to-end manner.
Our network gets an additional signal when the best localized box does not have the highest score.

GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection

Abhinav Kumar, Garrick Brazil, Xiaoming Liu

Computer Vision Lab, Department of CSE, Michigan State University (MSU)

Why Differentiable NMS?

Previous methods do not use NMS in training.
Mismatch between training and inference pipelines.

6. Experiments and Results on KITTI

Experiment Setup:

Addresses two mismatches:
between training and inference pipelines in object detection.
between classification and localization scores in 3D object detection.

2. Issues with Previous Methods

Figure: Conventional NMS Pipeline.

3. Proposed GrooMeD-NMS

Figure: GrooMeD-NMS Layer.

Non-Differentiable Argmax of Scores → Sort Scores in Descending Order
Greedy Updating → Generalized to All Boxes
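
The two changes above can be illustrated with a minimal sketch. It is not the poster's implementation: it assumes the pairwise overlap matrix is already available, and the names `soft_rescore` and `threshold_prune`, the threshold value 0.4, and the clipping to [0, 1] are illustrative assumptions. Scores are sorted once instead of repeatedly taking an argmax, and every box is rescored from all higher-ranked boxes instead of being dropped greedily.

```python
def threshold_prune(overlap, nms_threshold=0.4):
    """Classical hard pruning: 1 if the overlap reaches the threshold, else 0."""
    return 1.0 if overlap >= nms_threshold else 0.0


def soft_rescore(scores, overlaps, prune=threshold_prune):
    """Rescore all boxes after a single descending sort (illustrative sketch).

    scores   : list of box confidences
    overlaps : overlaps[i][j] is the overlap between box i and box j
    Returns the rescores in the original box order, clipped to [0, 1].
    """
    # One sort replaces the repeated argmax of the classical loop.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    rescores = [0.0] * len(scores)
    for rank, i in enumerate(order):
        # Every higher-ranked box contributes, weighted by its own rescore.
        penalty = sum(prune(overlaps[i][j]) * rescores[j] for j in order[:rank])
        rescores[i] = min(1.0, max(0.0, scores[i] - penalty))
    return rescores


scores = [0.9, 0.8, 0.7]
overlaps = [
    [1.0, 0.6, 0.0],
    [0.6, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
rescores = soft_rescore(scores, overlaps)
```

In this toy input, box 1 overlaps the higher-scored box 0 and is rescored to zero, while the isolated box 2 keeps its score, which matches what classical NMS would keep.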

Grouping Boxes
Cluster boxes belonging to same object
Masking of Overlap Matrix
Mask the overlap matrix as in Classical NMS
Threshold Pruning → Linear Pruning
Non-zero gradients.
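
The grouping, masking, and pruning steps above can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' code: the function names, the threshold 0.4, and the reading of "masking" as keeping only each box's overlap with the top-ranked box of its group are all assumptions. A finite-difference slope shows why linear pruning yields non-zero gradients where threshold pruning does not.

```python
def group_boxes(order, overlaps, nms_threshold=0.4):
    """Greedily cluster score-sorted box indices into groups.

    The top-ranked unassigned box starts a group; every remaining box whose
    overlap with it exceeds the threshold joins that group.
    """
    remaining = list(order)
    groups = []
    while remaining:
        top = remaining[0]
        group = [i for i in remaining
                 if i == top or overlaps[top][i] > nms_threshold]
        groups.append(group)
        remaining = [i for i in remaining if i not in group]
    return groups


def threshold_prune(overlap, nms_threshold=0.4):
    """Hard pruning: a step function of the overlap."""
    return 1.0 if overlap >= nms_threshold else 0.0


def linear_prune(overlap):
    """Linear pruning: the penalty grows in proportion to the overlap."""
    return overlap


def rescore_group(group, scores, overlaps, prune=linear_prune):
    """Rescore one group using only overlaps with its top box (assumed masking)."""
    top = group[0]
    rescores = {top: scores[top]}
    for i in group[1:]:
        penalty = prune(overlaps[top][i]) * scores[top]
        rescores[i] = min(1.0, max(0.0, scores[i] - penalty))
    return rescores


def numeric_slope(fn, x, eps=1e-4):
    """Central finite difference, used here as a stand-in for the gradient."""
    return (fn(x + eps) - fn(x - eps)) / (2 * eps)


scores = [0.9, 0.8, 0.7]
overlaps = [
    [1.0, 0.6, 0.0],
    [0.6, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
groups = group_boxes(order, overlaps)
first_group_rescores = rescore_group(groups[0], scores, overlaps)
slope_threshold = numeric_slope(threshold_prune, 0.6)
slope_linear = numeric_slope(linear_prune, 0.6)
```

Away from the threshold itself, the step function has zero slope, so no gradient reaches the overlaps; the linear function has slope one everywhere.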

References:

[1] Simonelli et al., Disentangling monocular 3D object detection, ICCV 2019
[2] Simonelli et al., Towards generalization across depth for monocular 3D object detection, ECCV 2020
[3] Ding et al., Learning depth-guided convolutions for monocular 3D object detection, CVPR Workshops 2020
[4] Shi et al., Distance-normalized unified representation for monocular 3D object detection, ECCV 2020
[5] Prokudin et al., Learning to filter object detections, GCPR 2017
[6] Bodla et al., Soft-NMS: Improving object detection with one line of code, ICCV 2017
[7] Brazil et al., Kinematic 3D object detection in monocular video, ECCV 2020
[8] Brazil et al., M3D-RPN: Monocular 3D region proposal network for object detection, ICCV 2019

Model	Easy	Med	Hard
MonoDIS [1]	10.37	7.94	6.40
MoVi-3D [2]	15.19	10.90	9.26
D4LCN [3]	16.65	11.72	9.51
Kinematic (Video) [7]	19.07	12.72	9.17
GrooMeD-NMS (Ours)	18.10	12.32	9.65

7. Conclusion and Future Work

Present and integrate a differentiable NMS for monocular 3D object detection.
Future work: apply to LiDAR-based 3D object detection and to pedestrian detection.

4. Changes from Classical NMS → GrooMeD-NMS

Best Box After NMS
Extend definition of best 2D box [5] to best 3D box.
Loss Function After NMS
Use ranking loss instead of classification loss.
Final Loss
Sum of losses before and after NMS.
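
A minimal sketch of how these three pieces could fit together is given below. The poster states only that a ranking loss is used after NMS; the pairwise hinge loss here is a swapped-in stand-in, not the authors' loss. The names `best_box_index`, `pairwise_ranking_loss`, `final_loss`, the `margin`, the `weight_after` factor, and the per-box `localization_quality` values are all hypothetical.

```python
def best_box_index(localization_quality):
    """Index of the best-localized box, given one quality value per box.

    The quality measure (for example an overlap with the ground truth)
    is assumed to be computed elsewhere.
    """
    return max(range(len(localization_quality)),
               key=lambda i: localization_quality[i])


def pairwise_ranking_loss(rescores, best_index, margin=0.1):
    """Hinge penalty for every box not ranked `margin` below the best box."""
    best = rescores[best_index]
    others = [r for i, r in enumerate(rescores) if i != best_index]
    if not others:
        return 0.0
    return sum(max(0.0, margin - (best - r)) for r in others) / len(others)


def final_loss(loss_before_nms, loss_after_nms, weight_after=1.0):
    """Sum of the losses before and after NMS (weight is an assumption)."""
    return loss_before_nms + weight_after * loss_after_nms


# Box 1 is the best localized but does not have the highest score.
rescores = [0.9, 0.3, 0.7]
localization_quality = [0.5, 0.8, 0.1]
best = best_box_index(localization_quality)
loss_after = pairwise_ranking_loss(rescores, best)
total = final_loss(loss_before_nms=1.0, loss_after_nms=loss_after)
```

The after-NMS term is non-zero exactly when some other box is scored close to or above the best-localized box, which is the additional training signal described in the mismatch section.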

5. Loss Functions

Score-IOU_3D Plot:

AP_3D-Threshold Plot:

KITTI Val 1 Results:

Model	Easy	Med	Hard
MonoDIS [1]	11.06	7.60	6.37
MoVi-3D [2]	14.28	11.13	9.68
Kinematic (Image) [7]	18.28	13.55	10.13
Kinematic (Video) [7]	19.76	14.10	10.47
GrooMeD-NMS (Ours)	19.67	14.32	11.27

KITTI Val 2 Results:

Model	Easy	Med	Hard
M3D-RPN [8]	14.57	10.07	7.51
Kinematic (Image) [7]	13.54	10.21	7.24
GrooMeD-NMS (Ours)	14.72	10.87	7.67

Split Full: Train/Test = 7481/7518 images
Split Val1: Train/Test = 3712/3769 images
Split Val2: Train/Test = 3682/3799 images
Metrics: AP_3D (↑) at IOU_3D = 0.7
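
To make the IOU_3D = 0.7 criterion concrete, the sketch below computes a 3D overlap for axis-aligned boxes. This is a simplification and an assumption: KITTI boxes are rotated about the vertical axis, so the official evaluation is more involved. The function name and the box layout `(x_min, y_min, z_min, x_max, y_max, z_max)` are illustrative.

```python
def box_volume(box):
    """Volume of an axis-aligned box (x_min, y_min, z_min, x_max, y_max, z_max)."""
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])


def iou_3d_axis_aligned(box_a, box_b):
    """Intersection over union of two axis-aligned 3D boxes."""
    intersection = 1.0
    for axis in range(3):
        low = max(box_a[axis], box_b[axis])
        high = min(box_a[axis + 3], box_b[axis + 3])
        if high <= low:
            return 0.0  # no overlap along this axis
        intersection *= high - low
    union = box_volume(box_a) + box_volume(box_b) - intersection
    return intersection / union


ground_truth = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
prediction = (0.0, 0.0, 0.0, 2.0, 2.0, 1.0)
iou = iou_3d_axis_aligned(ground_truth, prediction)
is_match = iou >= 0.7
```

Here the prediction covers half of the ground-truth volume, so it falls below the 0.7 threshold and would not count as a match under this simplified criterion.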

Comparison with other NMS:

Model	Inference NMS	Easy	Med	Hard
GrooMeD-NMS	Classical	19.07	14.31	11.27
GrooMeD-NMS	Soft [6]	19.67	14.31	11.27
GrooMeD-NMS	Distance [4]	19.67	14.31	11.27
GrooMeD-NMS	GrooMeD	19.67	14.32	11.27

KITTI Full Results:


Figure: GrooMeD-NMS Pipeline.

Qualitative Results:

Figure: Pruning Functions.