1 of 49

Efficient Spatio-Temporal Processing of Event Data

Kexin Shi, Yifei Liu

Mathias Gehrig, Nico Messikommer, Davide Scaramuzza

Master Project – Robotics and Perception Group

Department of Informatics, University of Zurich

Institute of Informatics – Institute of Neuroinformatics

Kexin Shi & Yifei Liu – Robotics and Perception Group


2 of 49

Motivation

Picture source: https://github.com/TimoStoff/event_utils

Ways of processing events:

(a) Point-based

(b) Voxel-based

(c) Others: spiking neural networks, recurrent networks, etc.

[Figure panels: (a) point-based, (b) voxel-based, (c) spiking neural network]


3 of 49

Related Work

a) Point-based:

EventNet [1]

b) Voxel-based:

Video to Events [2]: classification, segmentation

c) Other methods:

E2VID [3]: video reconstruction

E-RAFT [4]: optical flow

[1] Y. Sekikawa, IEEE/CVF 2019
[2] D. Gehrig, CVPR 2020
[3] H. Rebecq, IEEE PAMI 2019
[4] M. Gehrig, 3DV 2021


4 of 49

Overview of Models

  • Voxel-based 2D models, e.g. 2D CNNs (2D ResNet, 2D Unet)
  • Voxel-based 3D models, e.g. volumetric CNNs (3D ResNet, 3D Unet)
  • Point-based 3D models, e.g. PointNet, PointNet++
  • Models mixing voxel features and point features, e.g. Point-Voxel CNN (PVCNN) [5]

[5] Liu, Zhijian, Haotian Tang, Yujun Lin and Song Han. “Point-Voxel CNN for Efficient 3D Deep Learning.” NeurIPS (2019).
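As a minimal sketch of the voxel-based input that the 2D and 3D models above consume, the snippet below accumulates events into a grid with `num_bins` temporal bins. The event layout `(x, y, timestamp, polarity)`, the function name `events_to_voxel_grid`, and the choice to store plain event counts are assumptions made for illustration; the slides do not specify them.

```python
from typing import List, Tuple

Event = Tuple[int, int, float, int]  # (x, y, timestamp, polarity), assumed layout


def events_to_voxel_grid(events: List[Event], width: int, height: int,
                         num_bins: int) -> List[List[List[int]]]:
    """Count events per (temporal bin, y, x) cell."""
    grid = [[[0] * width for _ in range(height)] for _ in range(num_bins)]
    if not events:
        return grid
    t_min = min(e[2] for e in events)
    t_max = max(e[2] for e in events)
    span = (t_max - t_min) or 1.0  # avoid division by zero for a single timestamp
    for x, y, t, _polarity in events:
        # Map the timestamp to a bin index in [0, num_bins - 1].
        b = min(int((t - t_min) / span * num_bins), num_bins - 1)
        grid[b][y][x] += 1
    return grid


events = [(0, 0, 0.00, 1), (0, 0, 0.01, -1), (1, 0, 0.50, 1), (1, 1, 1.00, 1)]
grid = events_to_voxel_grid(events, width=2, height=2, num_bins=2)
```

A 2D model can treat the temporal-bin axis as input channels, while a 3D model convolves over it as a third axis.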


5 of 49

Point-Voxel CNN


6 of 49

Task 1: Object Classification

Dataset: N-Caltech101.

Label: barrel

Label: anchor

https://www.garrickorchard.com/datasets/n-caltech101


7 of 49

Task 2: Optical Flow Regression

Dataset: DSEC

Input: events (visualized as an image)

Optical Flow

https://dsec.ifi.uzh.ch/


8 of 49

Research Questions


9 of 49

RQ1: How to downsample effectively?


10 of 49

Challenge: number of points

An anchor in N-Caltech101: 200,000 events

A chair in ModelNet40: 1,024 points


11 of 49

RQ1: How to downsample effectively?


12 of 49

RQ1: How to downsample effectively?

[Figure panels: raw; random; 10 temporal bins, threshold p>1; 50 temporal bins, threshold p>1; 10 temporal bins, threshold p>2; 50 temporal bins, threshold p>2]


13 of 49

RQ1: How to downsample effectively?

Classification:

Regression:

Observation:

1) Fewer points give higher speed but lower performance.

2) With a similar number of points, data-dependent downsampling performs better than random sampling.
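The two downsampling strategies compared here can be sketched as follows. The slides only name the settings ("N temporal bins, threshold p"), so the interpretation below is an assumption: events are counted per (x, y, temporal bin) voxel, and a voxel is kept as one point when its count p exceeds the threshold. The function names and the event layout `(x, y, timestamp)` are hypothetical.

```python
import random
from collections import Counter
from typing import List, Tuple

Event = Tuple[int, int, float]  # (x, y, timestamp), assumed layout


def threshold_downsample(events: List[Event], num_bins: int,
                         threshold: int) -> List[Tuple[int, int, int]]:
    """Keep one point per (x, y, bin) voxel whose event count p exceeds threshold."""
    if not events:
        return []
    t_min = min(t for _, _, t in events)
    t_max = max(t for _, _, t in events)
    span = (t_max - t_min) or 1.0
    counts: Counter = Counter()
    for x, y, t in events:
        b = min(int((t - t_min) / span * num_bins), num_bins - 1)
        counts[(x, y, b)] += 1
    return sorted(v for v, p in counts.items() if p > threshold)


def random_downsample(events: List[Event], num_points: int,
                      seed: int = 0) -> List[Event]:
    """Baseline: keep a uniformly random subset of the events."""
    rng = random.Random(seed)
    return rng.sample(events, min(num_points, len(events)))


# Pixel (3, 4) fires three times early on; the other two events are isolated.
events = [(3, 4, 0.00), (3, 4, 0.02), (3, 4, 0.04), (7, 1, 0.90), (3, 4, 1.00)]
kept = threshold_downsample(events, num_bins=10, threshold=1)
sampled = random_downsample(events, num_points=2)
```

Under this reading, the thresholded variant keeps only voxels with repeated activity, whereas random sampling is blind to event density.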


14 of 49

RQ2: How do the different methods perform?

15 of 49

RQ2: How do the different methods perform?

Classification:

Regression:


16 of 49

RQ2: How do the different methods perform?

Classification:

Regression:


17 of 49

RQ3: Is the MLP part really useful?

18 of 49

RQ3: Is the MLP part really useful?

Classification:

Regression:

19 of 49

RQ3: Is the MLP part useful?

Classification:

Regression:

Observation: Point features generated by the MLP are not useful and can even have a negative effect. The MLP structure cannot extract useful information from event data.

20 of 49

RQ3: Is the MLP part useful?

21 of 49

RQ4: Is devoxelization necessary?


22 of 49

RQ4: Is devoxelization necessary?

Original

Remove Voxelization

Remove Voxelization and Devoxelization

Classification:


23 of 49

RQ4: Is devoxelization necessary?

Regression:

W/o devoxelization

With devoxelization


24 of 49

RQ4: Is devoxelization necessary?

Classification:

Regression:

Observation: Passing point-based features between different resolutions performs better than passing voxel-based features.
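To make the two operations in this comparison concrete, here is a minimal sketch of voxelization and devoxelization. It assumes normalized point coordinates, scalar features, and a nearest-voxel lookup for devoxelization; PVCNN itself interpolates trilinearly, so this is a simplification, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # normalized (x, y, t) coordinates in [0, 1)
Voxel = Tuple[int, int, int]


def voxel_index(p: Point, resolution: int) -> Voxel:
    """Map a normalized coordinate to its integer voxel index."""
    return tuple(min(int(c * resolution), resolution - 1) for c in p)


def voxelize(points: List[Point], feats: List[float],
             resolution: int) -> Dict[Voxel, float]:
    """Average the features of all points that fall into the same voxel."""
    buckets: Dict[Voxel, List[float]] = defaultdict(list)
    for p, f in zip(points, feats):
        buckets[voxel_index(p, resolution)].append(f)
    return {v: sum(fs) / len(fs) for v, fs in buckets.items()}


def devoxelize(points: List[Point], grid: Dict[Voxel, float],
               resolution: int) -> List[float]:
    """Give each point the feature of its voxel (nearest-voxel lookup).

    Simplification: PVCNN interpolates trilinearly between neighboring voxels.
    """
    return [grid[voxel_index(p, resolution)] for p in points]


points = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (0.9, 0.9, 0.9)]
feats = [1.0, 3.0, 5.0]
grid = voxelize(points, feats, resolution=2)
point_feats = devoxelize(points, grid, resolution=2)
```

After devoxelization every point carries a feature again, so a later stage can voxelize the points at a different resolution instead of resampling the voxel grid. This appears to be what "passing point-based features between different resolutions" refers to.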


25 of 49

RQ5: Dense vs Sparse?


26 of 49

RQ5: Dense vs Sparse?


27 of 49

RQ5: Dense vs Sparse?

Classification:

Regression:

Observation: Replacing the dense 3D convolution with sparse convolution increases speed but lowers performance.
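The speed side of this trade-off can be illustrated with a toy comparison. The sketch assumes a submanifold-style sparse convolution (outputs computed only at active voxels) and uses a 3x3x3 all-ones kernel for brevity; the slides do not state which sparse convolution variant was used, and the function names are hypothetical.

```python
from itertools import product
from typing import Dict, Tuple

Voxel = Tuple[int, int, int]
OFFSETS = list(product((-1, 0, 1), repeat=3))  # 3x3x3 neighborhood


def dense_box_filter(active: Dict[Voxel, float], size: int):
    """Evaluate a 3x3x3 all-ones kernel at every voxel of a size^3 grid."""
    out, visited = {}, 0
    for v in product(range(size), repeat=3):
        visited += 1
        total = sum(active.get((v[0] + dx, v[1] + dy, v[2] + dz), 0.0)
                    for dx, dy, dz in OFFSETS)
        if total:
            out[v] = total
    return out, visited


def sparse_box_filter(active: Dict[Voxel, float]):
    """Evaluate the same kernel only at active voxels (submanifold style)."""
    out, visited = {}, 0
    for v in active:
        visited += 1
        out[v] = sum(active.get((v[0] + dx, v[1] + dy, v[2] + dz), 0.0)
                     for dx, dy, dz in OFFSETS)
    return out, visited


active = {(1, 1, 1): 1.0, (1, 1, 2): 2.0, (6, 6, 6): 4.0}
dense_out, dense_visited = dense_box_filter(active, size=8)
sparse_out, sparse_visited = sparse_box_filter(active)
```

The sparse variant visits 3 sites instead of 512 and agrees with the dense result at those sites, but it produces no output at the inactive neighbors that the dense filter fills in. Whether that contributes to the performance drop reported here is not established by the slides.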


28 of 49

Research Questions


29 of 49

Conclusion

1. Point-based models are not suitable for events.
2. Point-voxel models: good performance, but slow.
3. 3D voxel-based models: either slower or worse performance.
4. 2D voxel-based models: a reasonable trade-off between speed and performance.

Outlook

1. A better downsampling method is needed.
2. Use sparse convolution to increase speed.


30 of 49

Take-Away Message

  • RQ1: How to downsample effectively?
    • Fewer points give higher speed but lower performance.
    • With a similar number of points, data-dependent downsampling performs better than random sampling.
  • RQ2: How do the different methods perform?
    • The point-based method is inaccurate and slow.
    • The point-voxel method performs best, at the cost of some speed.
    • The voxel-based (3D) method falls short in either performance or speed. The most time-consuming part is the 3D convolution.
    • The frame-based (2D) method offers a good trade-off between performance and speed.


31 of 49

Take-Away Message

  • RQ3: Is the MLP part useful?
    • No. The MLP only adds noise, even when we increase the number of layers.
    • Point features generated by the MLP are not useful for event data.
  • RQ4: Is devoxelization necessary?
    • Yes. Compared with passing voxel-based features between different resolutions, passing point-based features clearly improves performance.
  • RQ5: Dense convolution or sparse convolution?
    • Sparse convolution is faster but performs worse (within the PVCNN structure).


32 of 49

Thanks!


33 of 49

Backup Slide


34 of 49

RQ1: How to downsample effectively?

Classification:

| Method | Accuracy (%) | Mean #points | Speed (instances/sec) |
|---|---|---|---|
| Raw Data | - | 115,297 | - |
| Sum 10, P>1 | 78.506 | 20,552 | 25.36 |
| Random | 76.897 | 23,059 | 20.76 |
| Sum 50, P>1 | 74.253 | 13,597 | 27.72 |
| Sum 10, P>3 | 67.125 | 4,167 | 47.12 |

35 of 49

RQ1: How to downsample effectively?

Regression:

| Method | EPE | AE | 1PE | 2PE | 3PE | Mean #points | Speed (instances/sec) |
|---|---|---|---|---|---|---|---|
| Raw Data | - | - | - | - | - | 1,984,865 | - |
| Sum 10, P>1 | 1.296 | 4.570 | 30.270 | 12.996 | 7.530 | 157,941 | 6.36 |
| Random | 1.562 | 4.973 | 40.546 | 18.404 | 10.442 | 198,486 | 5.36 |
| Sum 10, P>3 | 1.895 | 7.049 | 50.891 | 24.02 | 13.788 | 23,545 | 7.44 |

Observation:

1) With a similar number of points, data-dependent downsampling performs better than random sampling.

2) Fewer points give higher speed but lower performance.

36 of 49

RQ2: Point vs. Voxel vs. Fuse Models

Classification: PVCNN


37 of 49

RQ2: Point vs. Voxel vs. Fuse Models

Classification:

| Model | Accuracy (%) | #Parameters | Speed (instances/sec) |
|---|---|---|---|
| PointNet++ | 51.954 | 1.8M | 9.4 |
| 3D ResNet18 | 74.943 | 33.2M | 58.16 |
| 2D ResNet34 | 76.092 | 21.3M | 58.2 |
| PVCNN | 78.506 | 10.4M | 25.36 |

Observation:

  1. The pure point-based method is slow and inaccurate.
  2. The point-voxel method performs best among all models trained without pretraining.

38 of 49

RQ2: Point vs. Voxel vs. Fuse Models

Regression: PV-Unet


39 of 49

RQ2: Point-based vs Voxel-based vs Point-Voxel

2D Unet:


40 of 49

RQ2: Point-based vs Voxel-based vs Point-Voxel

2D Unet Half:


41 of 49

RQ2: Point-based vs Voxel-based vs Point-Voxel

3D Unet:


42 of 49

RQ2: How do the different methods perform?

Regression:

| Model | EPE | AE | 1PE | 2PE | 3PE | #Params | Speed (instances/sec) |
|---|---|---|---|---|---|---|---|
| 2D Unet | 1.334 | 4.513 | 33.043 | 14.143 | 7.935 | 19.9M | 23.4 |
| 3D Unet | 1.326 | 4.676 | 31.888 | 13.556 | 7.648 | 10.5M | 7.36 |
| PV-Unet | 1.296 | 4.570 | 30.270 | 12.996 | 7.530 | 10.5M | 6.36 |

  • NPE: N-pixel error, the percentage of ground-truth pixels whose optical flow error magnitude exceeds N pixels. N is 1, 2, or 3.

  • EPE: Endpoint error, the average L2 norm of the optical flow error.

  • AE: Angular error.
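The three metrics can be sketched in a few lines. EPE and NPE follow the definitions on this slide, with the "error magnitude" taken to be the per-pixel endpoint error. The slide does not define AE further, so the code assumes a common convention: the angle in degrees between the space-time vectors (u, v, 1) of prediction and ground truth. The function name `flow_metrics` and the list-of-tuples layout are hypothetical.

```python
import math
from typing import List, Tuple

Flow = Tuple[float, float]  # (u, v) displacement in pixels


def flow_metrics(pred: List[Flow], gt: List[Flow]) -> dict:
    """Compute EPE, NPE (N = 1, 2, 3) and AE over pixels with ground truth."""
    errors = [math.hypot(pu - gu, pv - gv) for (pu, pv), (gu, gv) in zip(pred, gt)]
    n = len(errors)
    metrics = {"EPE": sum(errors) / n}
    for thresh in (1, 2, 3):
        # Percentage of pixels whose endpoint error exceeds `thresh` pixels.
        metrics[f"{thresh}PE"] = 100.0 * sum(e > thresh for e in errors) / n
    # Assumed AE definition: angle in degrees between (u, v, 1) vectors.
    angles = []
    for (pu, pv), (gu, gv) in zip(pred, gt):
        dot = pu * gu + pv * gv + 1.0
        norm = math.sqrt(pu * pu + pv * pv + 1.0) * math.sqrt(gu * gu + gv * gv + 1.0)
        angles.append(math.degrees(math.acos(max(-1.0, min(1.0, dot / norm)))))
    metrics["AE"] = sum(angles) / n
    return metrics


gt = [(0.0, 0.0), (3.0, 4.0), (1.0, 0.0), (0.0, 2.0)]
pred = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.5), (0.0, 2.0)]
m = flow_metrics(pred, gt)
```

In this toy example the endpoint errors are 0, 5, 1.5 and 0 pixels, so two of the four pixels exceed the 1-pixel threshold and one exceeds the 2- and 3-pixel thresholds.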


43 of 49

RQ3: Is the MLP part really useful?

Classification:

| Variant | Accuracy (%) | #Params | Speed (instances/sec) |
|---|---|---|---|
| No MLP | 78.506 | 10.4M | 25.36 |
| Single Layer | 77.816 | 10.5M | 24.32 |
| Two Layers | 76.322 | 10.7M | 22.12 |

44 of 49

RQ3: Is the MLP part useful?

Regression:

| Variant | EPE | AE | 1PE | 2PE | 3PE | #Params | Speed (instances/sec) |
|---|---|---|---|---|---|---|---|
| No MLP | 1.296 | 4.570 | 30.270 | 12.996 | 7.530 | 10.5M | 6.36 |
| Single Layer | 1.366 | 4.687 | 32.473 | 14.712 | 8.405 | 10.6M | 5.73 |

45 of 49

RQ4: Is devoxelization necessary?

Original

Remove Voxelization

Remove Voxelization and Devoxelization

Classification:


46 of 49

RQ4: Is devoxelization necessary?

With devoxelization:

Without devoxelization:

Regression:


47 of 49

RQ4: Is devoxelization necessary?

Classification:

| Variant | Accuracy (%) | #Params | Speed |
|---|---|---|---|
| Original | 76.897 | 9.1M | 36.12 |
| Remove Voxelization | 69.885 | 9.1M | 42.16 |
| Remove both Voxelization and Devoxelization | 70.115 | 9.1M | 67 |

Regression:

| Variant | EPE | AE | 1PE | 2PE | 3PE | #Params | Speed |
|---|---|---|---|---|---|---|---|
| w/ devoxelization | 1.296 | 4.570 | 30.270 | 12.996 | 7.530 | 10.5M | 6.36 |
| w/o devoxelization | 1.326 | 4.676 | 31.888 | 13.556 | 7.648 | 10.5M | 7.36 |

Observation: Passing point-based features between different resolutions performs better than passing voxel-based features.

48 of 49

RQ5: Dense vs Sparse

Classification:

| Variant | Accuracy (%) | #Params | Speed (instances/sec) |
|---|---|---|---|
| Dense conv | 78.506 | 10.4M | 25.36 |
| Sparse conv | 56.122 | 9.7M | 36.72 |

Regression:

| Variant | EPE | AE | 1PE | 2PE | 3PE | #Params | Speed (instances/sec) |
|---|---|---|---|---|---|---|---|
| Dense conv | 1.296 | 4.570 | 30.270 | 12.996 | 7.530 | 10.5M | 6.36 |
| Sparse conv | 1.393 | 4.716 | 33.799 | 14.417 | 8.187 | 11.2M | 15.6 |

Observation: Replacing the dense 3D convolution with sparse convolution increases speed but lowers performance.

49 of 49

Questions that might be asked

  1. How does PointNet++ work? (How does it group points, and how does it combine features between groups?)
  2. Why is PointNet++ so slow when faced with so many points? (k-nearest-neighbor search)
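As a rough aid for answering both questions, the sketch below groups points around centroids with a brute-force k-nearest-neighbor search and combines each group's features with a max-pool. It is a simplification of PointNet++, which also samples the centroids by farthest point sampling and applies a shared MLP before pooling; the function names and scalar features are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def knn_group(points: List[Point], centroids: List[Point],
              k: int) -> Tuple[List[List[int]], int]:
    """Brute-force k-nearest-neighbor grouping around each centroid.

    Returns the neighbor indices per centroid and the number of distance
    evaluations, which grows as len(points) * len(centroids).
    """
    groups, num_distances = [], 0
    for c in centroids:
        dists = []
        for i, p in enumerate(points):
            d2 = sum((a - b) ** 2 for a, b in zip(p, c))
            dists.append((d2, i))
            num_distances += 1
        groups.append([i for _, i in sorted(dists)[:k]])
    return groups, num_distances


def max_pool_group(feats: List[float], group: List[int]) -> float:
    """Combine the features inside one group with a symmetric max-pool."""
    return max(feats[i] for i in group)


points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 1.0, 1.0), (0.9, 1.0, 1.0)]
feats = [0.2, 0.7, 0.5, 0.1]
groups, num_distances = knn_group(points, centroids=[points[0], points[2]], k=2)
pooled = [max_pool_group(feats, g) for g in groups]
```

Because the number of distance evaluations is the product of the point count and the centroid count, the neighbor search becomes expensive for event data with tens of thousands of points per sample.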
