
  1. Mismatch between classification and localization scores.
  2. Previous methods do not address this in an end-to-end manner.
  3. Our network gets an additional signal when the best localized box does not have the highest score.

GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection

Abhinav Kumar, Garrick Brazil, Xiaoming Liu

Computer Vision Lab, Department of CSE, Michigan State University (MSU)

  1. Why Differentiable NMS?
  • Detectors do not use NMS during training, only at inference.
  • This creates a mismatch between the training and inference pipelines.

6. Experiments and Results on KITTI

Experiment Setup:

GrooMeD-NMS addresses the mismatch

  • between the training and inference pipelines in object detection.
  • between classification and localization scores in 3D object detection.

2. Issues with Previous Methods

Figure: Conventional NMS Pipeline.

3. Proposed GrooMeD-NMS

Figure: GrooMeD-NMS Layer.

  1. Non-Differentiable Argmax of Scores → Sort Scores in Descending Order
  2. Greedy Updating → Generalized to All Boxes
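
For reference, the classical procedure that these two changes start from can be sketched as follows. This is a minimal pure-Python sketch, not the poster's code: the function name `classical_nms`, the precomputed `overlaps` matrix, and the threshold value 0.4 are illustrative assumptions.

```python
def classical_nms(scores, overlaps, nms_threshold=0.4):
    """Greedy classical NMS; returns indices of the kept boxes, best first.

    scores:   list of floats, one per box
    overlaps: square list of lists, overlaps[i][j] = overlap (IoU) of boxes i and j
    nms_threshold: illustrative value, not taken from the poster
    """
    # Visiting boxes in descending score order is equivalent to a repeated argmax.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = []
    suppressed = set()
    for i in order:
        if i in suppressed:
            continue
        kept.append(i)
        # Greedy update: only the box just kept suppresses the remaining boxes.
        for j in order:
            if j != i and j not in kept and overlaps[i][j] > nms_threshold:
                suppressed.add(j)
    return kept


scores = [0.9, 0.8, 0.3]
overlaps = [
    [1.0, 0.7, 0.0],
    [0.7, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
kept = classical_nms(scores, overlaps)
```

The hard keep-or-drop decision in the inner loop is the step that blocks gradients from the final detections back to the scores.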

  • Grouping Boxes: cluster boxes belonging to the same object.
  • Masking of Overlap Matrix: mask the overlap matrix as in classical NMS.
  • Threshold Pruning → Linear Pruning: gives non-zero gradients.
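
One way to read the changes listed above is sketched below in pure Python. Everything here is an assumption made for illustration rather than the authors' implementation: the function names, the threshold 0.4, the grouping rule (the top remaining box seeds a group), the reading of "masking" as letting only higher-scored boxes of the same group suppress a box, and the rescoring rule r_i = clamp(s_i - sum_j p(o_ij) * r_j, 0, 1).

```python
def threshold_pruning(overlap, nms_threshold=0.4):
    """Hard pruning: a step function, so its gradient is zero almost everywhere."""
    return 1.0 if overlap > nms_threshold else 0.0


def linear_pruning(overlap):
    """Linear pruning p(o) = o: the gradient with respect to the overlap is 1."""
    return overlap


def group_boxes(overlaps, nms_threshold=0.4):
    """Cluster boxes (assumed sorted by descending score) into groups.

    Assumed rule: the top remaining box seeds a group, and every remaining
    box overlapping it above the threshold joins that group.
    """
    remaining = list(range(len(overlaps)))
    groups = []
    while remaining:
        seed = remaining[0]
        group = [j for j in remaining
                 if j == seed or overlaps[seed][j] > nms_threshold]
        groups.append(group)
        remaining = [j for j in remaining if j not in group]
    return groups


def rescore(scores, overlaps, prune=linear_pruning, nms_threshold=0.4):
    """Return soft rescores; inputs are assumed sorted by descending score."""
    rescores = [0.0] * len(scores)
    for group in group_boxes(overlaps, nms_threshold):
        for pos, i in enumerate(group):
            # Mask: only higher-scored boxes of the same group contribute.
            penalty = sum(prune(overlaps[i][j]) * rescores[j]
                          for j in group[:pos])
            rescores[i] = min(1.0, max(0.0, scores[i] - penalty))
    return rescores


scores = [0.9, 0.8, 0.3]
overlaps = [
    [1.0, 0.7, 0.0],
    [0.7, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
soft = rescore(scores, overlaps)                           # linear pruning
hard = rescore(scores, overlaps, prune=threshold_pruning)  # step pruning
```

With linear pruning the second box keeps a small score that still depends smoothly on its overlap with the first box, whereas threshold pruning sets it to exactly zero.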

References:

[1] Simonelli et al., Disentangling monocular 3D object detection, ICCV 2019

[2] Simonelli et al., Towards generalization across depth for monocular 3D object detection, ECCV 2020

[3] Ding et al., Learning depth-guided convolutions for monocular 3D object detection, CVPR Workshops 2020

[4] Shi et al., Distance-normalized unified representation for monocular 3D object detection, ECCV 2020

[5] Prokudin et al., Learning to filter object detections, GCPR 2017

[6] Bodla et al., Soft-NMS: Improving object detection with one line of code, ICCV 2017

[7] Brazil et al., Kinematic 3D object detection in monocular video, ECCV 2020

[8] Brazil et al., M3D-RPN: Monocular 3D region proposal network for object detection, ICCV 2019

Model                   Easy    Med     Hard
MonoDIS [1]             10.37    7.94    6.40
MoVi-3D [2]             15.19   10.90    9.26
D4LCN [3]               16.65   11.72    9.51
Kinematic (Video) [7]   19.07   12.72    9.17
GrooMeD-NMS (Ours)      18.10   12.32    9.65

7. Conclusion and Future Work

  • Present and integrate a differentiable NMS for monocular 3D object detection.
  • Future work: apply to LiDAR-based 3D object detection and pedestrian detection.

4. Changes from Classical NMS → GrooMeD-NMS

  • Best Box After NMS: extend the definition of the best 2D box [5] to the best 3D box.
  • Loss Function After NMS: use a ranking loss instead of a classification loss.
  • Final Loss: sum of the losses before and after NMS.
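
The loss structure above can be illustrated with a small sketch. The poster does not say which ranking loss is used, so a pairwise hinge loss stands in here; the function names, the `quality` input (a per-box localization quality, for example overlap with the ground truth), the margin, and the weight are all hypothetical.

```python
def ranking_loss_after_nms(rescores, quality, margin=0.1):
    """Pairwise hinge ranking loss (a stand-in, not the poster's exact loss).

    rescores: scores after the differentiable NMS layer, one per box
    quality:  hypothetical localization quality per box; its argmax is
              treated as the best box
    """
    best = max(range(len(quality)), key=lambda i: quality[i])
    others = [i for i in range(len(rescores)) if i != best]
    if not others:
        return 0.0
    # Penalize every box that is not ranked at least `margin` below the best box.
    terms = [max(0.0, margin - (rescores[best] - rescores[i])) for i in others]
    return sum(terms) / len(terms)


def final_loss(loss_before_nms, loss_after_nms, weight_after=1.0):
    """Sum of the losses before and after NMS; the weight is an assumption."""
    return loss_before_nms + weight_after * loss_after_nms


rescores = [0.9, 0.17, 0.3]
# Best-localized box (index 1) does not have the highest score: loss is positive.
loss_mismatch = ranking_loss_after_nms(rescores, quality=[0.5, 0.8, 0.1])
# Best-localized box (index 0) already has the highest score: loss is zero.
loss_agree = ranking_loss_after_nms(rescores, quality=[0.8, 0.5, 0.1])
```

The positive loss in the first case is the kind of extra training signal the network receives when the best-localized box does not have the highest score.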

5. Loss Functions

Score-IOU3D Plot:

AP3D-Threshold Plot:

Model                   Easy    Med     Hard
MonoDIS [1]             11.06    7.60    6.37
MoVi-3D [2]             14.28   11.13    9.68
Kinematic (Image) [7]   18.28   13.55   10.13
Kinematic (Video) [7]   19.76   14.10   10.47
GrooMeD-NMS (Ours)      19.67   14.32   11.27

Model                   Easy    Med     Hard
M3D-RPN [8]             14.57   10.07    7.51
Kinematic (Image) [7]   13.54   10.21    7.24
GrooMeD-NMS (Ours)      14.72   10.87    7.67

KITTI Val 1 Results:

KITTI Val 2 Results:

  • Split Full: Train/Test = 7481/7518 images
  • Split Val1: Train/Test = 3712/3769 images
  • Split Val2: Train/Test = 3682/3799 images
  • Metrics: AP3D (↑) at IOU3D = 0.7
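
To make the IOU3D = 0.7 criterion concrete, the sketch below checks whether a predicted box would count as a match. It is a simplification under stated assumptions: boxes are axis-aligned and given as (x_min, y_min, z_min, x_max, y_max, z_max), whereas KITTI 3D boxes are rotated about the vertical axis; the function names are illustrative.

```python
def box_volume(box):
    """Volume of an axis-aligned box (x_min, y_min, z_min, x_max, y_max, z_max)."""
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])


def iou3d_axis_aligned(box_a, box_b):
    """3D intersection over union for axis-aligned boxes (no rotation handled)."""
    intersection = 1.0
    for axis in range(3):
        low = max(box_a[axis], box_b[axis])
        high = min(box_a[axis + 3], box_b[axis + 3])
        if high <= low:
            return 0.0  # no overlap along this axis
        intersection *= high - low
    union = box_volume(box_a) + box_volume(box_b) - intersection
    return intersection / union


def is_match(predicted, ground_truth, iou_threshold=0.7):
    """A prediction counts as a match when its 3D IoU reaches the threshold."""
    return iou3d_axis_aligned(predicted, ground_truth) >= iou_threshold


ground_truth = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
too_short = (0.0, 0.0, 0.0, 2.0, 2.0, 1.0)    # IoU 0.5, rejected at 0.7
close_enough = (0.0, 0.0, 0.0, 2.0, 2.0, 1.6)  # IoU about 0.8, accepted
```

AP3D then summarizes precision over recall levels using only detections that pass this test, so a high threshold such as 0.7 rewards accurate localization.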

Model         Inference NMS   Easy    Med     Hard
GrooMeD-NMS   Classical       19.07   14.31   11.27
GrooMeD-NMS   Soft [6]        19.67   14.31   11.27
GrooMeD-NMS   Distance [4]    19.67   14.31   11.27
GrooMeD-NMS   GrooMeD         19.67   14.32   11.27

KITTI Full Results:

Comparison with other NMS:


Figure: GrooMeD-NMS Pipeline.

Qualitative Results:

Figure: Pruning Functions.