
SeaBird: Segmentation in Bird’s View with Dice Loss Improves 3D Detection of Large Objects

Abhinav Kumar1, Yuliang Guo2, Xinyu Huang2, Liu Ren2, Xiaoming Liu1

1Michigan State University (MSU), 2Bosch Center for AI, Bosch Research North America

Large Object Detection is Harder


Commonly cited reasons:

  • Training data scarcity [1].
  • Receptive field [2].

Is Data Scarcity the Real Reason?

  • No. Frontal detectors such as MonoDETR [3] and DEVIANT [4] fail even on the balanced KITTI-360 dataset.
  • KITTI-360 is a nearly balanced (1:2) dataset.

Noise Sensitivity and Dice Loss

  • Performance = function(Representation, Loss, Noise).
  • Dice loss is more robust to noise than regression (L1/L2) losses, which benefits large objects and high-noise settings.

SeaBird Pipeline

Results

nuScenes

KITTI-360

Conclusion

  • Large object detection is a representation (frontal vs. BEV) plus loss problem.
  • Frontal detectors do NOT work, even with transformers.
  • BEV detectors are sub-optimal, but the noise-robust Dice loss improves them.

References:

[1] Zhu et al., Class-balanced grouping and sampling, CVPRW 2019.

[2] Wu et al., Waymo keynote talk, CVPRW 2023.

[3] Zhang et al., MonoDETR: Depth-guided transformer for Mono3D, ICCV 2023.

[4] Kumar et al., DEVIANT: Depth Equivariant Network, ECCV 2022.
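The claim that Dice loss is more noise robust than L1/L2 regression can be made concrete with a toy 1D sketch. Everything here is an illustrative assumption, not SeaBird's implementation or the paper's analysis. The BEV plane is reduced to a 1D grid along depth. Each object is a run of occupied cells. Noise is modeled as a pure shift of the prediction by the same number of cells regardless of object size. The helper names (`interval_mask`, `dice_loss`, `l1_center_loss`, `compare`) are hypothetical.

```python
"""Toy 1D illustration: Dice loss vs. L1 loss under the same localization noise.

Assumptions (not from the poster): a 1D BEV grid along the depth axis,
objects as contiguous occupied cells, and noise as a pure shift of the
prediction. This is NOT SeaBird's loss implementation.
"""


def interval_mask(start: int, length: int, grid_size: int) -> list[int]:
    """Binary occupancy mask with cells [start, start + length) set to 1."""
    return [1 if start <= i < start + length else 0 for i in range(grid_size)]


def dice_loss(pred: list[int], target: list[int]) -> float:
    """Dice loss = 1 - 2|P & T| / (|P| + |T|) for binary masks."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        return 0.0  # both masks empty: treat as a perfect match
    return 1.0 - 2.0 * intersection / total


def l1_center_loss(pred_center: float, target_center: float) -> float:
    """L1 regression loss on the object's center coordinate."""
    return abs(pred_center - target_center)


def compare(length: int, shift: int, grid_size: int = 200,
            start: int = 50) -> tuple[float, float]:
    """Shift the prediction by `shift` cells; return (dice, l1)."""
    target = interval_mask(start, length, grid_size)
    pred = interval_mask(start + shift, length, grid_size)
    target_center = start + length / 2
    pred_center = start + shift + length / 2
    return dice_loss(pred, target), l1_center_loss(pred_center, target_center)


if __name__ == "__main__":
    noise = 2  # same shift (in cells) for every object size
    for length in (4, 10, 40):  # small -> large object extent
        dice, l1 = compare(length, noise)
        print(f"length={length:3d}  dice={dice:.3f}  l1={l1:.1f}")
```

In this toy, a shift of ε cells on an object ℓ cells long (with |ε| < ℓ) gives a Dice loss of |ε|/ℓ. The same noise therefore costs less as objects grow, while the L1 loss on the center stays at |ε| for every size. The script prints Dice losses of 0.500, 0.200 and 0.050 for lengths 4, 10 and 40, with L1 = 2.0 throughout.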
