1 of 25

Open-World Panoptic LiDAR Segmentation

Students: Meghana Ganesina, Anirudh Chakravarthy

Advisors: Aljosa Osep, Deva Ramanan, Shu Kong

MSCV capstone project overview, January 2022

2 of 25

Mobile robot perception

3 of 25

LiDAR-based mobile robot perception

4 of 25

Motivation

  • Prior work
    • Static models that do not adapt their behaviour over time
    • New data: we would need to manually label it and restart the training process
  • Would this setting generalize to the real world?

5 of 25

Our work

Continual learning for LiDAR panoptic segmentation via object discovery

  • Robot drives around and collects streams of LiDAR sensory data
  • At the end of the day: re-consolidate this data (offline).
    • Did we observe any “novel” object classes that we do not recognize?
    • If so: ask humans about this class (human-in-the-loop) or generate pseudo-labels (clustering)
    • Incrementally update the system

6 of 25

Open Set vs. Closed Set

7 of 25

Datasets

  • SemanticKITTI, Panoptic nuScenes

SemanticKITTI (Behley et al., ICCV’19)

Panoptic nuScenes (Fong et al., arXiv:2109.03805, ’21)

8 of 25

4D Panoptic LiDAR segmentation

[Architecture figure: an encoder-decoder network consumes a 4D point cloud (scans t, t+1, t+2, fused via point sampling) and predicts, per point, a semantic head (S), an objectness head (O), a point variance head (Σ), and point embeddings (ε), yielding 4D semantic + instance predictions.]

Aygün et al., 4D Panoptic LiDAR Segmentation, CVPR 2021.
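As a rough illustration, here is a minimal PyTorch-style sketch of the four prediction heads; layer shapes, names, and activations are our assumptions, not the authors’ implementation:

```python
# Hypothetical sketch of the four per-point prediction heads; only the head
# structure follows the figure above, everything else is assumed.
import torch
import torch.nn as nn

class PanopticHeads(nn.Module):
    def __init__(self, feat_dim: int, num_classes: int, embed_dim: int = 32):
        super().__init__()
        self.semantic = nn.Linear(feat_dim, num_classes)  # S: semantic logits
        self.objectness = nn.Linear(feat_dim, 1)          # O: instance-center score
        self.variance = nn.Linear(feat_dim, 1)            # Sigma: point variance
        self.embedding = nn.Linear(feat_dim, embed_dim)   # eps: point embeddings

    def forward(self, point_feats: torch.Tensor) -> dict:
        # point_feats: (N, feat_dim) features from the encoder-decoder backbone
        return {
            "semantics": self.semantic(point_feats),                     # (N, C)
            "objectness": torch.sigmoid(self.objectness(point_feats)),   # (N, 1)
            "variance": torch.exp(self.variance(point_feats)),           # (N, 1), positive
            "embeddings": self.embedding(point_feats),                   # (N, E)
        }
```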

9 of 25

LiDAR Panoptic Segmentation (single-scan)

Single-scan LiDAR Panoptic Segmentation (Behley et al., ICCV’19, ICRA’21)

Semantic Segmentation

RangeNet++ - Milioto et al., IROS’19

KPConv - Thomas et al., CVPR’19

Object Detection

PointPillars - Lang et al., CVPR’19

10 of 25

Towards Open World Object Detection

K J Joseph, Salman Khan, Fahad Shahbaz Khan, Vineeth N Balasubramanian

CVPR’21

11 of 25

Open world recognition

  • Bendale et al., Towards open world recognition, CVPR’16
  • Key idea:
    • Recognize “novel” classes
    • Label and re-train

12 of 25

Open world detection

The premise:

  • The detector detects and classifies “known” object classes
  • The remaining anchor boxes are either classified as “unknown” objects or rejected as “stuff”/background classes
  • Detected “unknown” objects are then labeled by human annotators and used to (incrementally) re-train the detector

13 of 25

Method

  • Maximize discrimination between latent representations in the feature space
  • How? Contrastive learning!
  • Intuition: by “pushing apart” features for (distinct) “known” classes and “unknown” classes, it will become easier to identify unknown classes as novel

[Figure: class prototypes and feature vectors in the embedding space]

14 of 25

Experiments

[Results figures: catastrophic forgetting, known-unknown confusion, precision on known classes, precision on known + unknown classes]

15 of 25

Method

  • Base detector: Faster R-CNN
  • Add an additional (contrastive) loss that “pulls apart” features in the embedding space (N known classes + 1 “unknown” class); see the sketch below

[Figure: class prototypes and feature vectors in the embedding space]
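A minimal sketch of such a prototype-based contrastive loss, assuming per-RoI feature vectors and one running prototype per class (including “unknown”); the hinge margin and all names are illustrative, not the paper’s exact formulation:

```python
# Hypothetical prototype-based contrastive loss: pull each feature toward its
# own class prototype, push it at least `margin` away from all other prototypes.
import torch
import torch.nn.functional as F

def contrastive_clustering_loss(feats: torch.Tensor,      # (B, D) RoI features
                                labels: torch.Tensor,     # (B,) class ids, long
                                prototypes: torch.Tensor, # (C, D) class prototypes
                                margin: float = 10.0) -> torch.Tensor:
    dists = torch.cdist(feats, prototypes)                  # (B, C) distances
    own = F.one_hot(labels, num_classes=prototypes.size(0)).bool()
    pos = dists[own]                                        # (B,) distance to own prototype
    neg = F.relu(margin - dists).masked_fill(own, 0.0)      # hinge on all other prototypes
    return (pos + neg.sum(dim=1)).mean()
```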

16 of 25

Exemplar-based Open-Set Panoptic Segmentation Network

Jaedong Hwang, Seoung Wug Oh, Joon-Young Lee, Bohyung Han

CVPR’21

17 of 25

Open-world panoptic segmentation

  • Panoptic segmentation: instance + semantic segmentation
  • Open-world panoptic segmentation: remove the labels of a subset of classes and treat them as ‘unknown’

18 of 25

Method

  • Based on the Panoptic FPN two-stage top-down approach [Kirillov et al., CVPR’19]
  • Sample proposals
    • Criterion: > 50% of the proposal area lies in the ‘void’ region
    • Based on the ‘objectness’ score, extract features (1024-dim)
  • Clustering
    • Every 200 iterations, run k-means
    • Pick ‘good’ clusters (high average objectness score, small average cosine distance between centroid and elements) and store them as exemplars
  • Mining
    • Find exemplars in the incoming mini-batch: compute the cosine distance between proposals and exemplars (clusters) -> mined unknowns (used for supervision)
  • Standard cross-entropy loss + “negative supervision” term
    • Simply treat the mined unknowns as GT for the “unknown” class (see the sketch below)
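A rough sketch of this exemplar pipeline, using scikit-learn k-means; all thresholds and helper names are our assumptions, not the paper’s exact values:

```python
# Hypothetical exemplar mining: cluster void-proposal features, keep tight
# high-objectness clusters, then mark proposals near any exemplar as 'unknown'.
import numpy as np
from sklearn.cluster import KMeans

def pick_exemplars(feats, obj_scores, k=10, min_obj=0.5, max_cos_dist=0.3):
    """feats: (N, 1024) features of proposals lying >50% in the 'void' area."""
    km = KMeans(n_clusters=k, n_init=10).fit(feats)
    exemplars = []
    for c in range(k):
        members = feats[km.labels_ == c]
        centroid = km.cluster_centers_[c]
        cos_sim = members @ centroid / (
            np.linalg.norm(members, axis=1) * np.linalg.norm(centroid) + 1e-8)
        # keep 'good' clusters: high mean objectness, members close to centroid
        if obj_scores[km.labels_ == c].mean() > min_obj and (1 - cos_sim).mean() < max_cos_dist:
            exemplars.append(centroid)
    return np.stack(exemplars) if exemplars else np.empty((0, feats.shape[1]))

def mine_unknowns(batch_feats, exemplars, max_cos_dist=0.3):
    """Boolean mask: proposals close to any stored exemplar become 'unknown' pseudo-GT."""
    sims = batch_feats @ exemplars.T / (
        np.linalg.norm(batch_feats, axis=1, keepdims=True)
        * np.linalg.norm(exemplars, axis=1) + 1e-8)
    return (1 - sims).min(axis=1) < max_cos_dist
```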

19 of 25

Experiments

  • COCO dataset (COCO-Stuff)
  • Declare a fraction of classes as ‘unknown’
    • 5%: car, cow, pizza, toilet
    • 10%: boat, tie, zebra, stop sign
    • 20%: table, banana, bicycle, cake, sink, cat, keyboard
  • Observation: thanks to the “dense” GT, objects are always reasonably well delineated

20 of 25

Results

  • Utilizing the “void” boxes

  • Baselines

21 of 25

Roadmap

  • Instance mining - Jan 26
    • Define task sets, train models (supervised)
    • Instance segmentation (we are here)
    • Instance tracking
  • Instance grouping / discovery - Mar 15
    • Implement evaluation set-up (assume clusters are given)
    • Simple baseline (embeddings from a pre-trained network + clustering)
    • Self-supervised embedding learning + clustering
    • End-to-end model
  • Model re-training - Apr 15
    • Transfer labels from clusters to point clouds
    • Training from scratch, incremental update with replay


22 of 25

  1. Supervised Learning
  • Datasets: SemanticKITTI (Behley et al., ICCV’19); KITTI Raw (1.5 h of “raw” recordings; Geiger et al., CVPR’12)

• Task set 0
  • Labeled: road, building, vegetation, car, fence, human
  • “Not labeled” (held-out): sidewalk, truck, terrain, pole, parking, bicycle, traffic sign, motorcycle
  • Held-out instances to “discover”: truck, pole, bicycle, traffic sign, motorcycle
• Task set 1
  • Labeled: road, building, vegetation, car, fence, human, sidewalk, truck, terrain, pole
  • “Not labeled” (held-out): parking, bicycle, traffic sign, motorcycle
  • Held-out instances to “discover”: bicycle, traffic sign, motorcycle
• Task set 2
  • Labeled: road, building, vegetation, car, fence, human, sidewalk, truck, terrain, pole, parking, bicycle, traffic sign, motorcycle
  • “Not labeled” (held-out): -
  • Held-out instances to “discover”: -
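One way to realize these splits for training (a minimal sketch under our own assumptions about label encoding; UNKNOWN_ID is an arbitrary sentinel, not project code) is to collapse every held-out class into a single ‘unknown’ label:

```python
# Hypothetical label remapping: collapse every held-out class of a task set
# into a single 'unknown' id before supervised training.
import numpy as np

UNKNOWN_ID = 255  # assumed sentinel value, not a SemanticKITTI convention

def remap_labels(point_labels: np.ndarray, known_ids: list) -> np.ndarray:
    """Map every per-point label that is not in `known_ids` to UNKNOWN_ID."""
    out = point_labels.copy()
    out[~np.isin(point_labels, known_ids)] = UNKNOWN_ID
    return out
```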

23 of 25

  • Supervised Learning
  • We can do K-way classification reasonably well! (SemanticKITTI validation split)

4D-Panoptic (single scan):

• Task set 0: mIoU 0.7383, mIoU_kn 0.7215, IoU_unk 0.8388
• Task set 1: mIoU 0.7377, mIoU_kn 0.7594, IoU_unk 0.5214
• Task set 2: mIoU 0.6260, mIoU_kn 0.6260, IoU_unk N/A
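For reference, a sketch of how these metrics can be computed, with mIoU_kn averaging over known classes and IoU_unk scoring the single collapsed “unknown” class (assuming integer per-point label arrays; not the benchmark’s evaluation code):

```python
# Hypothetical metric computation for K known classes + one 'unknown' class.
import numpy as np

def per_class_iou(pred, gt, class_ids):
    ious = []
    for c in class_ids:
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        ious.append(inter / union if union > 0 else np.nan)  # NaN if class absent
    return np.array(ious)

def report(pred, gt, known_ids, unknown_id):
    iou_known = per_class_iou(pred, gt, known_ids)
    iou_unk = per_class_iou(pred, gt, [unknown_id])[0]
    return {
        "mIoU": np.nanmean(np.append(iou_known, iou_unk)),  # all classes
        "mIoU_kn": np.nanmean(iou_known),                   # known classes only
        "IoU_unk": iou_unk,                                 # collapsed unknown class
    }
```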

24 of 25

2. Instance mining

  • Goal: turn a stream of raw sensory data into a set of potential objects
  • Per-scan proposal generation
    • Hu et al., Learning to Optimally Segment Point Clouds, RAL’19
    • Instead of the learned regressor, average the “objectness” score (estimated by the network) across each segment (sketched below)
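A minimal sketch of that scoring rule (function names are placeholders; the candidate segments themselves would come from the Hu et al. pipeline):

```python
# Hypothetical segment scoring: rank candidate segments of one scan by the
# mean per-point objectness predicted by the network.
import numpy as np

def rank_proposals(point_objectness: np.ndarray, segment_ids: np.ndarray) -> list:
    """point_objectness: (N,) network scores; segment_ids: (N,) segment id per point."""
    segs = np.unique(segment_ids)
    scores = {int(s): float(point_objectness[segment_ids == s].mean()) for s in segs}
    return sorted(scores, key=scores.get, reverse=True)  # best-scoring segments first
```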

25 of 25

Thank you!
