Efficient Spatio-Temporal Processing of Event Data
Kexin Shi, Yifei Liu
Mathias Gehrig, Nico Messikommer, Davide Scaramuzza
Master Project – Robotics and Perception Group
Department of Informatics, University of Zurich
Institute of Informatics – Institute of Neuroinformatics
Kexin Shi & Yifei Liu – Robotics and Perception Group
Motivation
picture: https://github.com/TimoStoff/event_utils
Ways of processing events:
(a) Point-based
(b) Voxel-based
(c) Others: spiking neural networks, recurrent networks, etc.
Figure panels: (a) point-based, (b) voxel-based, (c) spiking neural network
Related Work
a) Point-based:
EventNet[1]
b) Voxel-based:
Video to Events[2] → classification, segmentation
c) Other methods:
E2VID[3] → video reconstruction
E-RAFT[4] → optical flow
[1] Y. Sekikawa, IEEE/CVF 2019
[2] D. Gehrig, CVPR 2020
[3] H. Rebecq, IEEE PAMI 2019
[4] M. Gehrig, 3DV 2021
Overview of Models
2D voxel-based: e.g. 2D CNN (2D ResNet, 2D Unet)
3D voxel-based: e.g. Volumetric CNN (3D ResNet, 3D Unet)
Point-based: e.g. PointNet, PointNet++
Point-voxel: e.g. Point-Voxel CNN (PVCNN) [5]
[5] Z. Liu, H. Tang, Y. Lin, S. Han, "Point-Voxel CNN for Efficient 3D Deep Learning," NeurIPS 2019
Point-Voxel CNN
Task 1: Object Classification
Dataset: N-Caltech101.
Label: barrel
Label: anchor
https://www.garrickorchard.com/datasets/n-caltech101
Task 2: Optical Flow Regression
Dataset: DSEC
Input: events (visualized as an image)
Output: optical flow
https://dsec.ifi.uzh.ch/
Research Questions
RQ1: How to downsample effectively?
Challenge: number of points
An anchor in N-Caltech101: 200,000 events
A chair in ModelNet40: 1,024 points
RQ1: How to downsample effectively?
Figure panels:
raw
random
10 temporal bins, threshold p>1
50 temporal bins, threshold p>1
10 temporal bins, threshold p>2
50 temporal bins, threshold p>2
RQ1: How to downsample effectively?
Classification:
Regression:
Observation:
1) Fewer points give higher speed but lower performance.
2) With a similar number of points, data-dependent downsampling performs better than random sampling.
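The two sampling strategies compared here can be sketched as follows. The slides do not spell out the exact rule, so this sketch assumes that "N temporal bins, threshold p" means: split the time window into N equal bins, count events per (x, y, bin) cell, and keep one point for every cell whose count exceeds p. The function names and the (x, y, t, polarity) tuple layout are hypothetical, and polarity is ignored for brevity.

```python
import random
from collections import Counter


def downsample_by_count(events, num_bins, threshold):
    """Keep one point per (x, y, time-bin) cell with more than `threshold` events.

    `events` is a list of (x, y, t, polarity) tuples. ASSUMPTION: this is one
    plausible reading of "N temporal bins, threshold p", not the authors' code.
    """
    if not events:
        return []
    t_min = min(e[2] for e in events)
    t_max = max(e[2] for e in events)
    span = (t_max - t_min) or 1.0  # avoid division by zero for a single timestamp

    counts = Counter()
    for x, y, t, _ in events:
        # Map the timestamp to a bin index in [0, num_bins - 1].
        b = min(int((t - t_min) / span * num_bins), num_bins - 1)
        counts[(x, y, b)] += 1

    # One output point per sufficiently active cell; the count is kept as a feature.
    return sorted((x, y, b, c) for (x, y, b), c in counts.items() if c > threshold)


def downsample_random(events, num_points, seed=0):
    """Baseline: keep a uniformly random subset of the events."""
    rng = random.Random(seed)
    return rng.sample(events, min(num_points, len(events)))


events = [
    (3, 4, 0.00, 1), (3, 4, 0.01, 1), (3, 4, 0.02, -1),  # busy pixel, early
    (7, 1, 0.50, 1),                                      # isolated event
    (3, 4, 0.98, 1), (3, 4, 1.00, 1),                     # busy pixel, late
]
kept = downsample_by_count(events, num_bins=10, threshold=1)
print(kept)  # the isolated event is dropped, the two busy cells survive
print(downsample_random(events, num_points=3))
```

Under this reading, a higher threshold or fewer surviving cells directly reduces the number of points, which matches the speed/accuracy trade-off in the observation.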
RQ2: How do the different methods perform?
Classification:
Regression:
RQ3: Is the MLP part really useful?
Classification:
Regression:
Observation: The point features generated by the MLP do not help and even have a slightly negative effect. The MLP structure cannot extract useful information from event data.
RQ4: Is devoxelization necessary?
Classification (variants compared):
Original
Remove Voxelization
Remove Voxelization and Devoxelization
Regression:
W/o devoxelization
With devoxelization
Observation: Passing point-based features between different resolutions performs better than passing voxel-based features.
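To make the two operations concrete: in PVCNN [5], voxelization averages the features of all points that fall into the same voxel, and devoxelization maps voxel features back onto the points. The sketch below is a minimal illustration, assuming coordinates normalized to [0, 1) and nearest-neighbor devoxelization for brevity (PVCNN itself uses trilinear interpolation). The function names are hypothetical.

```python
from collections import defaultdict


def voxel_index(point, resolution):
    """Map a point with coordinates in [0, 1) to integer voxel indices."""
    return tuple(min(int(c * resolution), resolution - 1) for c in point)


def voxelize(points, features, resolution):
    """Average the features of all points that fall into the same voxel."""
    sums = {}
    counts = defaultdict(int)
    for p, f in zip(points, features):
        idx = voxel_index(p, resolution)
        if idx not in sums:
            sums[idx] = [0.0] * len(f)
        sums[idx] = [s + v for s, v in zip(sums[idx], f)]
        counts[idx] += 1
    return {idx: [s / counts[idx] for s in total] for idx, total in sums.items()}


def devoxelize(points, voxel_features, resolution):
    """Give every point the feature of the voxel that contains it (nearest neighbor)."""
    return [voxel_features[voxel_index(p, resolution)] for p in points]


# Three events as (x, y, t) points, each with a 2-channel feature.
points = [(0.10, 0.10, 0.10), (0.20, 0.15, 0.05), (0.90, 0.80, 0.70)]
features = [[1.0, 0.0], [3.0, 2.0], [5.0, 5.0]]

grid = voxelize(points, features, resolution=2)
per_point = devoxelize(points, grid, resolution=2)
print(grid)       # sparse dict: voxel index -> mean feature
print(per_point)  # one feature vector per input point
```

With devoxelization, each stage hands point-wise features to the next resolution; without it, the features stay on the voxel grid.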
RQ5: Dense vs Sparse?
Classification:
Regression:
Observation: Replacing the dense 3D convolution with sparse convolution increases speed but lowers performance.
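One reason sparse convolution can be faster is that event voxel grids are mostly empty: a dense 3D convolution visits every cell, whereas a sparse convolution only computes where there is active input. The sketch below simply counts both quantities for a toy grid. The grid size, the diagonal event pattern, and the helper name are assumptions for illustration, and no specific sparse-convolution library is implied.

```python
def occupancy(active_voxels, grid_shape):
    """Return (active, total, fraction) for a collection of occupied voxel indices."""
    total = 1
    for dim in grid_shape:
        total *= dim
    active = len(set(active_voxels))
    return active, total, active / total


# Hypothetical toy grid: 8 x 8 spatial cells, 4 temporal bins.
grid_shape = (8, 8, 4)
# Events along a diagonal edge occupy only a thin slice of the volume.
active_voxels = [(x, x, t) for x in range(8) for t in range(4)]

active, total, fraction = occupancy(active_voxels, grid_shape)
print(f"{active} of {total} voxels occupied ({fraction:.1%})")
```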
Research Questions
Conclusion
1. Point-based models are not suitable for event data.
2. Point-voxel: good performance, but slow.
3. 3D voxel-based: either slower or worse-performing.
4. 2D voxel-based: a reasonable trade-off between speed and performance.
Outlook
1. A better downsampling method is needed.
2. Use sparse convolution to increase speed.
Take-Away Message
Thanks!
Backup Slide
RQ1: How to downsample effectively?
Classification:
| Method | Accuracy | Mean #points | Speed (instances/sec) |
| Raw Data | - | 115,297 | - |
| Sum 10, P>1 | 78.506 | 20,552 | 25.36 |
| Random | 76.897 | 23,059 | 20.76 |
| Sum 50, P>1 | 74.253 | 13,597 | 27.72 |
| Sum 10, P>3 | 67.125 | 4,167 | 47.12 |
RQ1: How to downsample effectively?
Regression:
| Method | EPE | AE | 1PE | 2PE | 3PE | Mean #points | Speed (instances/sec) |
| Raw Data | - | - | - | - | - | 1,984,865 | - |
| Sum 10, P>1 | 1.296 | 4.570 | 30.270 | 12.996 | 7.530 | 157,941 | 6.36 |
| Random | 1.562 | 4.973 | 40.546 | 18.404 | 10.442 | 198,486 | 5.36 |
| Sum 10, P>3 | 1.895 | 7.049 | 50.891 | 24.02 | 13.788 | 23,545 | 7.44 |
Observation:
1) With a similar number of points, data-dependent downsampling performs better than random sampling.
2) Fewer points give higher speed but lower performance.
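For reference, two of the regression metrics in the table can be computed as sketched below. EPE is the mean Euclidean distance between predicted and ground-truth flow vectors, and NPE is the percentage of pixels whose error exceeds N pixels. The function names and the list-of-(u, v) layout are assumptions for illustration; AE (angular error) is left out of this sketch.

```python
import math


def flow_errors(pred, gt):
    """Per-pixel Euclidean error between predicted and ground-truth (u, v) flow."""
    return [math.hypot(pu - gu, pv - gv) for (pu, pv), (gu, gv) in zip(pred, gt)]


def epe(pred, gt):
    """End-point error: mean per-pixel Euclidean error, in pixels."""
    errors = flow_errors(pred, gt)
    return sum(errors) / len(errors)


def npe(pred, gt, n):
    """N-pixel error: percentage of pixels whose error is larger than n pixels."""
    errors = flow_errors(pred, gt)
    return 100.0 * sum(e > n for e in errors) / len(errors)


# Toy example with four valid pixels.
gt = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (0.0, 5.0)]
pred = [(0.0, 0.0), (1.0, 1.5), (5.0, 4.0), (0.0, 3.5)]

print(epe(pred, gt))       # mean of the per-pixel errors 0, 0.5, 5 and 1.5
print(npe(pred, gt, n=1))  # two of four pixels are off by more than 1 px
print(npe(pred, gt, n=3))  # one of four pixels is off by more than 3 px
```

Lower is better for all of these columns, which is why 1PE is always at least as large as 2PE and 3PE in the table.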
RQ2: Point vs. Voxel vs. Fuse Models
Classification: PVCNN
RQ2: Point vs. Voxel vs. Fuse Models
Classification:
| Model | Accuracy | #Parameters | Speed (instances/sec) |
| PointNet++ | 51.954 | 1.8M | 9.4 |
| 3D ResNet18 | 74.943 | 33.2M | 58.16 |
| 2D ResNet34 | 76.092 | 21.3M | 58.2 |
| PVCNN | 78.506 | 10.4M | 25.36 |
RQ2: Point vs. Voxel vs. Fuse Models
Regression: PV-Unet
RQ2: Point-based vs Voxel-based vs Point-Voxel
2D Unet:
RQ2: Point-based vs Voxel-based vs Point-Voxel
2D Unet Half:
RQ2: Point-based vs Voxel-based vs Point-Voxel
3D Unet:
RQ2: How do the different methods perform?
Regression:
| Model | EPE | AE | 1PE | 2PE | 3PE | #Params | Speed (instances/sec) |
| 2D Unet | 1.334 | 4.513 | 33.043 | 14.143 | 7.935 | 19.9M | 23.4 |
| 3D Unet | 1.326 | 4.676 | 31.888 | 13.556 | 7.648 | 10.5M | 7.36 |
| PV-Unet | 1.296 | 4.570 | 30.270 | 12.996 | 7.530 | 10.5M | 6.36 |
RQ3: Is the MLP part really useful?
Classification:
| Variant | Accuracy | #Params | Speed (instances/sec) |
| No MLP | 78.506 | 10.4M | 25.36 |
| Single layer | 77.816 | 10.5M | 24.32 |
| Two layers | 76.322 | 10.7M | 22.12 |
RQ3: Is the MLP part useful?
Regression:
| Variant | EPE | AE | 1PE | 2PE | 3PE | #Params | Speed (instances/sec) |
| No MLP | 1.296 | 4.570 | 30.270 | 12.996 | 7.530 | 10.5M | 6.36 |
| Single layer | 1.366 | 4.687 | 32.473 | 14.712 | 8.405 | 10.6M | 5.73 |
RQ4: Is devoxelization necessary?
With devoxelization:
Without devoxelization:
Regression:
RQ4: Is devoxelization necessary?
Classification:
| Variant | Accuracy | #Params | Speed |
| Original | 76.897 | 9.1M | 36.12 |
| Remove Voxelization | 69.885 | 9.1M | 42.16 |
| Remove both Voxelization and Devoxelization | 70.115 | 9.1M | 67 |
Regression:
| Variant | EPE | AE | 1PE | 2PE | 3PE | #Params | Speed |
| w/ devoxelization | 1.296 | 4.570 | 30.270 | 12.996 | 7.530 | 10.5M | 6.36 |
| w/o devoxelization | 1.326 | 4.676 | 31.888 | 13.556 | 7.648 | 10.5M | 7.36 |
Observation: Passing point-based features between different resolutions performs better than passing voxel-based features.
RQ5: Dense vs Sparse
Classification:
| Variant | Accuracy | #Params | Speed (instances/sec) |
| Dense conv | 78.506 | 10.4M | 25.36 |
| Sparse conv | 56.122 | 9.7M | 36.72 |
Regression:
| Variant | EPE | AE | 1PE | 2PE | 3PE | #Params | Speed (instances/sec) |
| Dense conv | 1.296 | 4.570 | 30.270 | 12.996 | 7.530 | 10.5M | 6.36 |
| Sparse conv | 1.393 | 4.716 | 33.799 | 14.417 | 8.187 | 11.2M | 15.6 |
Observation: Replacing the dense 3D convolution with sparse convolution increases speed but lowers performance.
Questions that could be potentially asked