SeaBird: Segmentation in Bird’s View with Dice Loss Improves 3D Detection of Large Objects

Abhinav Kumar¹, Yuliang Guo², Xinyu Huang², Liu Ren², Xiaoming Liu¹

¹Michigan State University (MSU), ²Bosch Center for AI, Bosch Research North America

Large Object Detection is Harder


Performance = function (Representation, Loss, Noise)

Noise Sensitivity and Dice Loss

Is Data Scarcity the Real Reason?

No. Frontal detectors such as MonoDETR [3] and DEVIANT [4] fail even on the nearly balanced KITTI-360 dataset.

Results

Conclusion

Training data scarcity [1].
Receptive field [2].

SeaBird Pipeline

nuScenes

KITTI-360

Large Object Detection = Representation (Frontal / BEV) + Loss problem.
Frontal detectors, even with transformers, do NOT work.
BEV detectors are sub-optimal and are improved by the noise-robust Dice loss.

References:

[1] Zhu et al., Class-balanced grouping and sampling, CVPRW 2019

[2] Wu et al., Waymo keynote talk, CVPRW 2023

[3] Zhang et al., MonoDETR: Depth guided transformer for Mono3D, ICCV 2023

[4] Kumar et al., DEVIANT: Depth Equivariant Network, ECCV 2022

KITTI-360 is a nearly balanced (1:2) dataset.

Dice loss is more noise-robust than regression (L₁/L₂) losses. (The benefit shows at large noise and for large objects.)
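To make the comparison concrete, here is a minimal sketch of a soft Dice loss next to an L₁ loss on a toy one-dimensional BEV occupancy row. It assumes the common overlap-based formulation of Dice loss, which may differ in detail from the exact form used in SeaBird. The function names, the `eps` smoothing term, and the toy masks are illustrative assumptions, not taken from the poster.

```python
def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss between two flat sequences of values in [0, 1].

    Assumed formulation: 1 - 2 * |P . G| / (|P| + |G|), with a small
    eps (hypothetical choice) to avoid division by zero.
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


def l1_loss(pred, target):
    """Mean absolute error over the same cells, for comparison."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


# Toy BEV row: a large object occupies cells 2..5 (hypothetical data).
target = [0, 0, 1, 1, 1, 1, 0, 0]
perfect = list(target)               # exact prediction
shifted = [0, 0, 0, 1, 1, 1, 1, 0]   # prediction shifted by one cell
empty = [0] * 8                      # object missed entirely

for name, pred in [("perfect", perfect), ("shifted", shifted), ("empty", empty)]:
    print(name, dice_loss(pred, target), l1_loss(pred, target))
```

Dice loss depends only on the overlap between the predicted and ground-truth masks relative to their total size, whereas the L₁ loss sums per-cell errors. This sketch only shows how the two quantities are computed; it does not reproduce the poster's noise analysis.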