1 of 49

Efficient Spatio-Temporal Processing of Event Data

Kexin Shi, Yifei Liu

Mathias Gehrig, Nico Messikommer, Davide Scaramuzza

Master Project, Robotics and Perception Group

Department of Informatics, University of Zurich

Institute of Informatics – Institute of Neuroinformatics

Kexin Shi & Yifei Liu – Robotics and Perception Group


2 of 49

Motivation

Picture source: https://github.com/TimoStoff/event_utils

Ways of processing events:

(a) Point-based

(b) Voxel-based

(c) Others: spiking neural networks, recurrent networks, etc.

Figure panels: (a) point-based, (b) voxel-based, (c) spiking neural network
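
As a minimal sketch of options (a) and (b), the snippet below turns the same toy event list into a normalized point set and into a voxel grid of per-cell polarity sums. The event layout (x, y, timestamp, polarity), the function names, and the grid sizes are illustrative assumptions and are not taken from the project code.

```python
from typing import List, Tuple

# Assumed event layout: (x, y, timestamp, polarity in {-1, +1}).
Event = Tuple[int, int, float, int]


def to_points(events: List[Event], width: int, height: int):
    """Point-based view: one normalized (x, y, t, p) point per event."""
    t0 = min(e[2] for e in events)
    t1 = max(e[2] for e in events)
    span = (t1 - t0) or 1.0  # avoid division by zero if all timestamps are equal
    return [(x / width, y / height, (t - t0) / span, p) for x, y, t, p in events]


def to_voxel_grid(events: List[Event], width: int, height: int, num_bins: int):
    """Voxel-based view: sum polarities per (time bin, y, x) cell."""
    t0 = min(e[2] for e in events)
    t1 = max(e[2] for e in events)
    span = (t1 - t0) or 1.0
    grid = [[[0 for _ in range(width)] for _ in range(height)] for _ in range(num_bins)]
    for x, y, t, p in events:
        # The last timestamp is clamped into the final bin.
        b = min(int((t - t0) / span * num_bins), num_bins - 1)
        grid[b][y][x] += p
    return grid


events = [(0, 0, 0.0, 1), (1, 0, 0.4, -1), (1, 1, 0.5, 1), (1, 1, 1.0, 1)]
points = to_points(events, width=2, height=2)
grid = to_voxel_grid(events, width=2, height=2, num_bins=2)
```

The point set keeps every event and its exact timestamp, while the voxel grid has a fixed size regardless of how many events arrive.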


3 of 49

Related Work

a) Point-based:

EventNet[1]

b) Voxel-based:

Video to Events[2] → classification, segmentation

c) Other methods:

E2VID[3] → video reconstruction

E-RAFT[4] → optical flow

[1] Y. Sekikawa, IEEE/CVF 2019
[2] D. Gehrig, CVPR 2020
[3] H. Rebecq, IEEE PAMI 2019
[4] M. Gehrig, 3DV 2021


4 of 49

Overview of Models

Voxel-based 2D models.

e.g. 2D CNN (2D ResNet, 2D Unet)

Voxel-based 3D models.

e.g. Volumetric CNN (3D ResNet, 3D Unet)

Point-based 3D models.

e.g. PointNet, PointNet++.

Mixing voxel features and point features.

e.g. Point-Voxel CNN (PVCNN) [5]

[5] Z. Liu, H. Tang, Y. Lin, and S. Han. "Point-Voxel CNN for Efficient 3D Deep Learning." NeurIPS 2019.


5 of 49

Point-Voxel CNN


6 of 49

Task 1: Object Classification

Dataset: N-Caltech101.

Label: barrel

Label: anchor

https://www.garrickorchard.com/datasets/n-caltech101


7 of 49

Task 2: Optical Flow Regression

Dataset: DSEC

Input

(events visualized as an image)

Optical Flow

https://dsec.ifi.uzh.ch/


8 of 49

Research Questions


9 of 49

RQ1: How to downsample effectively?


10 of 49

Challenge: number of points

An anchor in N-Caltech101

200,000 events

A chair in ModelNet40

1,024 points


11 of 49

RQ1: How to downsample effectively?


12 of 49

RQ1: How to downsample effectively?

Figure panels: raw; random; 10 temporal bins, threshold p>1; 50 temporal bins, threshold p>1; 10 temporal bins, threshold p>2; 50 temporal bins, threshold p>2


13 of 49

RQ1: How to downsample effectively?

Classification:

Regression:

Observation:

1) Fewer points give higher speed but lower performance.

2) With a similar number of points, data-dependent downsampling performs better than random sampling.
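
The two downsampling families compared here can be sketched as follows. The slides do not spell out the data-dependent rule, so this is one plausible reading of "N temporal bins, threshold p": events are counted per (x, y, time bin) cell and a cell is kept as a single point only if its count exceeds p. The function names and the event layout (x, y, timestamp, polarity) are assumptions for illustration.

```python
import random
from collections import Counter


def threshold_downsample(events, num_bins, p):
    """Keep one point per (x, y, time-bin) cell whose event count exceeds p."""
    t0 = min(e[2] for e in events)
    t1 = max(e[2] for e in events)
    span = (t1 - t0) or 1.0  # avoid division by zero
    counts = Counter()
    for x, y, t, _polarity in events:
        b = min(int((t - t0) / span * num_bins), num_bins - 1)
        counts[(x, y, b)] += 1
    # Each surviving point carries its cell index and its event count.
    return [(x, y, b, n) for (x, y, b), n in sorted(counts.items()) if n > p]


def random_downsample(events, k, seed=0):
    """Baseline: keep k events chosen uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(events, min(k, len(events)))


events = [
    (5, 5, 0.00, 1), (5, 5, 0.01, 1), (5, 5, 0.02, -1),  # dense activity at one pixel
    (9, 2, 0.50, 1),                                      # isolated event
    (3, 7, 1.00, -1),                                     # isolated event
]
kept = threshold_downsample(events, num_bins=10, p=1)
baseline = random_downsample(events, k=2)
```

Under this reading, isolated events are dropped and busy cells survive, whereas random sampling keeps isolated and clustered events with equal probability.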


14 of 49

RQ2: How do the different methods perform?


15 of 49

RQ2: How do the different methods perform?

Classification:

Regression:


16 of 49

RQ2: How do the different methods perform?

Classification:

Regression:


17 of 49

RQ3: Is the MLP part really useful?


18 of 49

RQ3: Is the MLP part really useful?

Classification:

Regression:

19 of 49

RQ3: Is the MLP part useful?

Classification:

Regression:

Observation: Point features generated by the MLP do not help and even have a slight negative effect. The MLP structure cannot extract useful information from event data.


20 of 49

RQ3: Is the MLP part useful?


21 of 49

RQ4: Is devoxelization necessary?


22 of 49

RQ4: Is devoxelization necessary?

Original

Remove Voxelization

Remove Voxelization and Devoxelization

Classification:


23 of 49

RQ4: Is devoxelization necessary?

Regression:

W/o devoxelization

With devoxelization


24 of 49

RQ4: Is devoxelization necessary?

Classification:

Regression:

Observation: Passing point-based features between different resolutions performs better than passing voxel-based features.
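
To make the two operations concrete, here is a minimal sketch of voxelization and devoxelization on scalar point features. It assumes point coordinates normalized to [0, 1) and uses a nearest-cell lookup for devoxelization; PVCNN [5] interpolates trilinearly instead, so this is a simplification, and the function names are hypothetical.

```python
from collections import defaultdict


def voxel_index(point, resolution):
    """Map a point with coordinates in [0, 1) to integer cell indices."""
    return tuple(min(int(c * resolution), resolution - 1) for c in point)


def voxelize(points, features, resolution):
    """Average the features of all points that fall into the same cell."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pt, f in zip(points, features):
        idx = voxel_index(pt, resolution)
        sums[idx] += f
        counts[idx] += 1
    return {idx: sums[idx] / counts[idx] for idx in sums}


def devoxelize(points, voxels, resolution):
    """Give each point the feature of the cell it lies in (nearest-cell lookup)."""
    return [voxels[voxel_index(pt, resolution)] for pt in points]


points = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (0.9, 0.9, 0.9)]
features = [1.0, 3.0, 5.0]
voxels = voxelize(points, features, resolution=2)
per_point = devoxelize(points, voxels, resolution=2)
```

After devoxelization the features are attached to the original points again, so they can be passed to a stage with a different grid resolution without being tied to the previous grid.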


25 of 49

RQ5: Dense vs Sparse?


26 of 49

RQ5: Dense vs Sparse?


27 of 49

RQ5: Dense vs Sparse?

Classification:

Regression:

Observation: Replacing the dense 3D convolution with sparse convolution increases speed but lowers performance.
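
The speed side of this trade-off can be illustrated with a toy count of multiply-accumulate operations. The sketch below is not the sparse convolution library used in the project; it is a plain-Python, submanifold-style convolution over a dictionary of active cells, with hypothetical function names, written only to show where the savings come from.

```python
from itertools import product


def dense_conv3d_macs(shape, kernel=3):
    """Multiply-accumulates of a dense convolution: every cell times every kernel tap."""
    d, h, w = shape
    return d * h * w * kernel ** 3


def sparse_conv3d(active, weight_fn, kernel=3):
    """Submanifold-style sparse convolution over {(z, y, x): value}.

    Outputs are computed only at active cells, and only active neighbors contribute.
    """
    r = kernel // 2
    offsets = list(product(range(-r, r + 1), repeat=3))
    out, macs = {}, 0
    for (z, y, x) in active:
        acc = 0.0
        for dz, dy, dx in offsets:
            neighbor = (z + dz, y + dy, x + dx)
            if neighbor in active:
                acc += weight_fn(dz, dy, dx) * active[neighbor]
                macs += 1
        out[(z, y, x)] = acc
    return out, macs


# Three active cells inside an 8 x 8 x 8 grid; all kernel weights set to 1 for simplicity.
active = {(0, 0, 0): 1.0, (0, 0, 1): 2.0, (5, 5, 5): 4.0}
out, sparse_macs = sparse_conv3d(active, weight_fn=lambda dz, dy, dx: 1.0)
dense_macs = dense_conv3d_macs((8, 8, 8))
```

The work of the sparse variant scales with the number of occupied cells rather than with the grid volume, which is why it pays off on event data where most cells are empty.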


28 of 49

Research Questions


29 of 49

Conclusion

1. Point-based models are not suitable for events.

2. Point-Voxel: good performance, but slow.

3. 3D voxel-based: either slower or worse performance.

4. 2D voxel-based: a reasonable trade-off between speed and performance.

Outlook

1. Finding a better way to downsample is necessary.

2. Use sparse convolution to increase speed.


30 of 49

Take-Away Message

RQ1: How to downsample effectively?

Fewer points give higher speed but lower performance.
With a similar number of points, data-dependent downsampling performs better than random sampling.

RQ2: How do the different methods perform?

Point-based methods are inaccurate and slow.
Point-Voxel methods have the best performance, at the cost of some speed.
Voxel-based (3D) methods are worse in either performance or speed. The most time-consuming part is the 3D convolution.
Frame-based (2D) methods offer a good trade-off between performance and speed.


31 of 49

Take-Away Message

RQ3: Is the MLP part useful?

No. The MLP only adds noise, even when we increase the number of layers.
Point features generated by the MLP are not useful for event data.

RQ4: Is devoxelization necessary?

Yes. Compared with passing voxel-based features between different resolutions, passing point-based features gives a distinct improvement in performance.

RQ5: Dense convolution or sparse convolution?

Sparse convolution is faster but performs worse (within the PVCNN structure).


32 of 49

Thanks!

Code:

https://github.com/uzh-rpg/master_project_kexin_shi_yifei_liu


33 of 49

Backup Slide


34 of 49

RQ1: How to downsample effectively?

Classification:

Downsampling	Accuracy (%)	Mean #points	Speed (instances/sec)
Raw Data	-	115,297	-
Sum 10, P>1	78.506	20,552	25.36
Random	76.897	23,059	20.76
Sum 50, P>1	74.253	13,597	27.72
Sum 10, P>3	67.125	4,167	47.12


35 of 49

RQ1: How to downsample effectively?

Regression:

Downsampling	EPE	AE	1PE	2PE	3PE	Mean #points	Speed (instances/sec)
Raw Data	-	-	-	-	-	1,984,865	-
Sum 10, P>1	1.296	4.570	30.270	12.996	7.530	157,941	6.36
Random	1.562	4.973	40.546	18.404	10.442	198,486	5.36
Sum 10, P>3	1.895	7.049	50.891	24.02	13.788	23,545	7.44

Observation:

1) With a similar number of points, data-dependent downsampling performs better than random sampling.

2) Fewer points give higher speed but lower performance.


36 of 49

RQ2: Point vs. Voxel vs. Fuse Models

Classification: PVCNN


37 of 49

RQ2: Point vs. Voxel vs. Fuse Models

Model	Accuracy (%)	#Parameters	Speed (instances/sec)
PointNet++	51.954	1.8M	9.4
3D ResNet18	74.943	33.2M	58.16
2D ResNet34	76.092	21.3M	58.2
PVCNN	78.506	10.4M	25.36

Observation:

The pure point-based method is slow and inaccurate.
The Point-Voxel method achieves the best performance among all models trained without pretraining.

Classification:


38 of 49

RQ2: Point vs. Voxel vs. Fuse Models

Regression: PV-Unet


39 of 49

RQ2: Point-based vs Voxel-based vs Point-Voxel

2D Unet:


40 of 49

RQ2: Point-based vs Voxel-based vs Point-Voxel

2D Unet Half:


41 of 49

RQ2: Point-based vs Voxel-based vs Point-Voxel

3D Unet:


42 of 49

RQ2: How do the different methods perform?

Model	EPE	AE	1PE	2PE	3PE	#Params	Speed (instances/sec)
2D Unet	1.334	4.513	33.043	14.143	7.935	19.9M	23.4
3D Unet	1.326	4.676	31.888	13.556	7.648	10.5M	7.36
PV-Unet	1.296	4.570	30.270	12.996	7.530	10.5M	6.36

Regression:

NPE: N-pixel error, the percentage of ground-truth pixels whose optical flow error magnitude is greater than N pixels, for N = 1, 2, or 3.

EPE: Endpoint error, the average L2 norm of the optical flow error.

AE: Angular error.
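
The three metrics can be sketched in a few lines. The function name and input format are hypothetical, and the angular error convention is an assumption: the slides only name the metric, so the code uses the common definition as the angle between the space-time vectors (u, v, 1) of prediction and ground truth, reported in degrees.

```python
import math


def flow_metrics(pred, gt):
    """pred, gt: lists of (u, v) flow vectors at pixels with valid ground truth."""
    n = len(gt)
    # Per-pixel endpoint error: L2 norm of the flow difference.
    errors = [math.hypot(pu - gu, pv - gv) for (pu, pv), (gu, gv) in zip(pred, gt)]
    epe = sum(errors) / n
    # NPE: percentage of pixels whose endpoint error exceeds N pixels.
    npe = {k: 100.0 * sum(e > k for e in errors) / n for k in (1, 2, 3)}
    # AE (assumed convention): angle between (u, v, 1) vectors, in degrees.
    angles = []
    for (pu, pv), (gu, gv) in zip(pred, gt):
        dot = pu * gu + pv * gv + 1.0
        norm = math.sqrt(pu * pu + pv * pv + 1.0) * math.sqrt(gu * gu + gv * gv + 1.0)
        cosine = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
        angles.append(math.degrees(math.acos(cosine)))
    ae = sum(angles) / n
    return epe, ae, npe


pred = [(1.0, 0.0), (0.0, 0.0), (3.0, 4.0), (0.0, 2.5)]
gt = [(1.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
epe, ae, npe = flow_metrics(pred, gt)
```

In this toy case the per-pixel errors are 0, 0, 5, and 2.5 pixels, so the EPE is 1.875 and half of the pixels count toward 1PE and 2PE while a quarter count toward 3PE.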


43 of 49

RQ3: Is the MLP part really useful?

Variant	Accuracy (%)	#Params	Speed (instances/sec)
No MLP	78.506	10.4M	25.36
Single Layer	77.816	10.5M	24.32
Two Layers	76.322	10.7M	22.12

Classification:


44 of 49

RQ3: Is the MLP part useful?

Regression:

Variant	EPE	AE	1PE	2PE	3PE	#Params	Speed (instances/sec)
No MLP	1.296	4.570	30.270	12.996	7.530	10.5M	6.36
Single Layer	1.366	4.687	32.473	14.712	8.405	10.6M	5.73


45 of 49

RQ4: Is devoxelization necessary?

Original

Remove Voxelization

Remove Voxelization and Devoxelization

Classification:


46 of 49

RQ4: Is devoxelization necessary?

With devoxelization:

Without devoxelization:

Regression:


47 of 49

RQ4: Is devoxelization necessary?

Variant	Accuracy (%)	#Params	Speed (instances/sec)
Original	76.897	9.1M	36.12
Remove Voxelization	69.885	9.1M	42.16
Remove both Voxelization and Devoxelization	70.115	9.1M	67

Classification:

Regression:

Variant	EPE	AE	1PE	2PE	3PE	#Params	Speed (instances/sec)
w/ devoxelization	1.296	4.570	30.270	12.996	7.530	10.5M	6.36
w/o devoxelization	1.326	4.676	31.888	13.556	7.648	10.5M	7.36

Observation: Passing point-based features between different resolutions performs better than passing voxel-based features.


48 of 49

RQ5: Dense vs Sparse

Classification:

Variant	Accuracy (%)	#Params	Speed (instances/sec)
Dense conv	78.506	10.4M	25.36
Sparse conv	56.122	9.7M	36.72

Regression:

Variant	EPE	AE	1PE	2PE	3PE	#Params	Speed (instances/sec)
Dense conv	1.296	4.570	30.270	12.996	7.530	10.5M	6.36
Sparse conv	1.393	4.716	33.799	14.417	8.187	11.2M	15.6

Observation: Replacing the dense 3D convolution with sparse convolution increases speed but lowers performance.


49 of 49

Questions that could potentially be asked

How does PointNet++ work? (How does it group points, and how does it combine features between groups?)
Why is PointNet++ so slow when faced with so many points? (k-nearest-neighbor search)
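
As a rough illustration for both questions, the sketch below picks group centers with farthest point sampling and then gathers each center's neighbors with a brute-force k-nearest-neighbor search. It is a simplified stand-in rather than the PointNet++ implementation (which also offers ball-query grouping and applies a shared MLP with max pooling inside each group), and the function names are hypothetical.

```python
import math


def farthest_point_sampling(points, m):
    """Pick m center indices, each as far as possible from the centers chosen so far."""
    chosen = [0]  # start from the first point (arbitrary choice)
    dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < m:
        nxt = max(range(len(points)), key=lambda i: dist[i])
        chosen.append(nxt)
        # Track each point's distance to its nearest chosen center.
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return chosen


def knn_group(points, center_ids, k):
    """Brute-force kNN grouping: one full pass over all points per center."""
    groups = []
    for c in center_ids:
        order = sorted(range(len(points)), key=lambda i: math.dist(points[i], points[c]))
        groups.append(order[:k])
    return groups


points = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.0),
    (1.0, 0.0, 0.0), (1.0, 0.1, 0.0),
    (0.0, 1.0, 0.0),
]
centers = farthest_point_sampling(points, 3)
groups = knn_group(points, centers, k=2)
```

With N points and M centers, both steps in this naive form compute on the order of N x M distances, which hints at why event clouds with hundreds of thousands of points are so much more expensive than the roughly one thousand points of a ModelNet40 shape.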
