1 of 28

Semantic Understanding of Road Scenes

Yu-Hsuan & Qibang

Advisors: Prof. Dong Huang, Ryan Lingo

2 of 28

Outline

  • Part 1: Capstone Project
    • Motivation
    • Background Knowledge
    • Dataset
    • Backbone
    • Possible Directions
  • Part 2: Paper Survey - SOTA Models
    • KITTI-360 dataset SOTA model
    • nuScenes dataset SOTA model
    • Waymo dataset SOTA model

3 of 28

Motivation

4 of 28

Preliminary

Three different types of segmentation

5 of 28

Dataset

    • SemanticKITTI

    • KITTI-360

6 of 28

  • nuScenes
    • Two diverse cities: Boston and Singapore
  • Waymo
    • Panoramic images

7 of 28

Backbone

Input

Output

Fusion

8 of 28

Advantages of using 2D image and 3D LiDAR as input

9 of 28

Possible directions to improve model robustness

  • Data Augmentation
    • Add more data generated by a Stable Diffusion model, keeping domain consistency (lighting and shadows)

10 of 28

Possible Solution - Data Augmentation

Repopulating Street Scenes

Ref: Repopulating Street Scenes, Wang et al. (CVPR 2021)

11 of 28

12 of 28

Possible directions to improve model robustness

  • Data Augmentation
    • Add more data generated by a Stable Diffusion model, keeping domain consistency (lighting and shadows)
  • Utilize different input data
    • KITTI also provides 360° images / panoramas, which contain both global and local information

13 of 28

Possible directions to improve model robustness

  • Data Augmentation
    • Add more data generated by a Stable Diffusion model, keeping domain consistency (lighting and shadows)
  • Utilize different input data
    • KITTI also provides 360° images / panoramas, which contain both global and local information
  • Ease the gap between 2D and 3D data
    • How to fuse features from different data types (images / point clouds / panoramas)

14 of 28

How to fuse features from 2D/3D - SemanticKITTI (MSFSKD)
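
Fusing 2D and 3D features usually starts from a point-to-pixel correspondence. The following is a minimal sketch of that step only, not the MSFSKD module itself: it assumes the LiDAR points are already in the camera frame and uses a hypothetical pinhole intrinsic matrix `K`. The function name `fuse_point_pixel_features` and the toy feature maps are illustrative assumptions.

```python
import numpy as np

def fuse_point_pixel_features(points_cam, point_feats, image_feats, K):
    """Concatenate each visible point's 3D feature with the 2D feature of its pixel.

    points_cam:  (N, 3) points in the camera frame, z pointing forward (assumption).
    point_feats: (N, C3) per-point features.
    image_feats: (H, W, C2) per-pixel features.
    K:           (3, 3) pinhole intrinsics (hypothetical values in this sketch).
    Returns the indices of the points that land inside the image and their fused features.
    """
    h, w, _ = image_feats.shape
    in_front = points_cam[:, 2] > 1e-6            # drop points behind the camera
    pts = points_cam[in_front]
    uvw = pts @ K.T                               # homogeneous pixel coordinates
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)   # image column
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)   # image row
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    keep = np.flatnonzero(in_front)[inside]       # indices into the original point list
    pixel_feats = image_feats[v[inside], u[inside]]
    fused = np.concatenate([point_feats[keep], pixel_feats], axis=1)
    return keep, fused

# Toy data: the 2D "feature" of each pixel is simply its (row, column).
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
rows, cols = np.mgrid[0:48, 0:64]
image_feats = np.stack([rows, cols], axis=-1).astype(float)
points_cam = np.array([[0.0, 0.0, 10.0],    # projects to the image centre
                       [1.0, 0.5, 5.0],     # projects inside the image
                       [0.0, 0.0, -5.0],    # behind the camera
                       [10.0, 0.0, 5.0]])   # projects outside the image
point_feats = np.arange(12, dtype=float).reshape(4, 3)

keep, fused = fuse_point_pixel_features(points_cam, point_feats, image_feats, K)
print(keep, fused.shape)
```

Only points that project inside the image receive a 2D feature; how the remaining points are handled is a design choice this sketch leaves open.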

15 of 28

Possible directions to improve model robustness

  • Data Augmentation
    • Add more data generated by a Stable Diffusion model, keeping domain consistency (lighting and shadows)
  • Utilize different input data
    • KITTI also provides 360° images / panoramas, which contain both global and local information
  • Ease the gap between 2D and 3D data
    • How to fuse features from different data types (images / point clouds / panoramas)
  • Cope with dangerous scenes: fast-moving objects
    • Optical flow
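
As a rough illustration of how optical flow could flag a fast-moving object, the sketch below estimates a single translation between two frames with a global Lucas-Kanade least-squares fit and compares its magnitude to a threshold. This is an assumption about how the idea might be prototyped, not a method from the surveyed papers; the function name, the synthetic blob frames, and the 0.5 pixel/frame threshold are all hypothetical.

```python
import numpy as np

def global_flow_lucas_kanade(frame0, frame1):
    """Estimate one (u, v) displacement in pixels/frame for the whole patch.

    Solves the brightness-constancy equation Ix*u + Iy*v + It = 0
    in the least-squares sense over all pixels.
    """
    f0 = frame0.astype(float)
    f1 = frame1.astype(float)
    iy, ix = np.gradient(0.5 * (f0 + f1))   # spatial gradients (rows = y, columns = x)
    it = f1 - f0                            # temporal gradient
    A = np.stack([ix.ravel(), iy.ravel()], axis=1)
    b = -it.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# Synthetic example: a smooth blob that moves one pixel to the right.
ys, xs = np.mgrid[0:64, 0:64]

def blob(cx, cy, sigma=6.0):
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

frame0 = blob(30, 32)
frame1 = blob(31, 32)

u, v = global_flow_lucas_kanade(frame0, frame1)
speed = np.hypot(u, v)
is_fast = speed > 0.5      # hypothetical threshold in pixels per frame
print(u, v, is_fast)       # u should be close to 1 and v close to 0
```

A real pipeline would compute dense or per-object flow and compensate for ego-motion before thresholding; the single global estimate here only shows the principle.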

16 of 28

Paper Survey - Current SOTA Model

17 of 28

Evaluation Metric

  • Intersection over Union (IoU)
  • Dice Coefficient, Pixel Accuracy, and Mean Accuracy Metrics (not object level)
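
The metrics above can be computed from a confusion matrix. The sketch below is a minimal NumPy version on a toy label array; the helper names `confusion_matrix` and `iou_and_dice` are illustrative, and benchmark toolkits may differ in details such as ignored labels.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes):
    """Count label pairs; rows are ground-truth classes, columns are predicted classes."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    idx = target * num_classes + pred
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def iou_and_dice(pred, target, num_classes):
    """Per-class IoU = TP / (TP + FP + FN) and Dice = 2TP / (2TP + FP + FN)."""
    cm = confusion_matrix(pred, target, num_classes)
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    # Classes absent from both prediction and ground truth give 0/0 = NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / (tp + fp + fn)
        dice = 2 * tp / (2 * tp + fp + fn)
    return iou, dice

# Toy example with three classes and six labelled points/pixels.
pred = np.array([0, 0, 1, 1, 2, 2])
target = np.array([0, 1, 1, 1, 2, 0])

iou, dice = iou_and_dice(pred, target, num_classes=3)
miou = np.nanmean(iou)                  # mean IoU, skipping absent classes
pixel_acc = np.mean(pred == target)     # pixel (or point) accuracy
print(iou, dice, miou, pixel_acc)
```

Per class, Dice and IoU are related by Dice = 2·IoU / (1 + IoU), so they rank single predictions identically but average differently across classes.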

Reference: https://learnopencv.com/intersection-over-union-iou-in-object-detection-and-segmentation/

Reference: https://pycad.co/the-difference-between-dice-and-dice-loss/

18 of 28

Reference: 2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds

19 of 28

Reference: 2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds

20 of 28

Reference: 2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds

21 of 28

Reference: Cross-modal Learning for Domain Adaptation in 3D Semantic Segmentation

22 of 28

Reference: Cross-modal Learning for Domain Adaptation in 3D Semantic Segmentation

23 of 28

Reference: Cross-modal Learning for Domain Adaptation in 3D Semantic Segmentation

24 of 28

Reference: Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation

25 of 28

Reference: Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation

26 of 28

Compare & Contrast

2DPASS / Cross-Modal Learning / Multi-View Aggregation

  • All three papers aim to combine 2D and 3D data to gain more information
  • 2DPASS was inspired by Cross-Modal Learning
  • Pros / Cons

Unidirectional vs. Bidirectional

27 of 28

Conclusion

Improving the robustness of self-driving cars in unexpected situations

  • Identify rare conditions
  • Data Augmentation
  • Increasing / changing input data
  • Methods for easing the gap between 2D / 3D data
  • Coping with fast-moving objects

28 of 28

Questions?
