1 of 84

How To See With An Event Camera

Cedric Scheerlinck

Supervisors: Prof. Robert Mahony, A/Prof. Nick Barnes, Prof. Davide Scaramuzza, Prof. Tom Drummond


2 of 84

Contents

  1. Introduction
    1. Frame-based Cameras
    2. Event Cameras
  2. Continuous-time Vision With Event Cameras
  3. Convolutional Neural Networks For Event Cameras
  4. Conclusion


3 of 84

History Of Photography

1827: First photo. View from the Window at Le Gras.

1861: First color photo. Tartan ribbon.

1887: One of the first videos. The Horse in Motion.

1957: First digital image. Russell Kirsch's son.

1964: Bullet through Apple. Harold E. Edgerton.

2011: One trillion frames per second. MIT Media Lab.

4 of 84

Frame-based Video Cameras

[Figure: image formation in a frame-based video camera. A smartphone camera focuses light through lenses onto a light sensor (pixel array); between shutter open and shutter close each pixel collects light, producing Image 1, Image 2, Image 3, ... over time.]

5 of 84

Frame-based Video Cameras: Drawbacks

  1. Redundant sampling: e.g., a static scene is captured repeatedly.
    1. High data rate (1-10Mb/s)
  2. Low intra-scene dynamic range, cannot see bright and dark at the same time.
    • Over/under-exposure
    • Auto-exposure artifacts (e.g., sun causes image to darken)
  3. Motion blur.


6 of 84

Event Cameras: An Overview

Inspired by biological eyes.

Key properties:

  1. Asynchronous
    1. Independent pixels
    2. Only report changes in brightness
    3. Low bandwidth (0-1Mb/s)
  2. Low latency (0.5ms)
  3. High dynamic range (120dB)
  4. Low power (0.1W)
  5. No motion blur.


7 of 84

An Event Camera Pixel

A pixel (top) from an event camera mimics biological cells (bottom).

  1. Incoming light hits a sensor, generating photocurrent signal.
  2. The signal is amplified and compared to a reference level.
  3. If the change is positive, an ON event is triggered; if negative, an OFF event is triggered.
  4. No change = no event.
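A minimal sketch of this trigger logic in Python (an idealised pixel with a single fixed contrast threshold; function and variable names are illustrative, not the DAVIS circuit):

    import numpy as np

    def pixel_events(timestamps, log_intensity, contrast_threshold=0.2):
        """Idealised event-pixel model: emit ON (+1) / OFF (-1) events whenever the
        log intensity moves more than one contrast threshold from the reference level."""
        events = []
        reference = log_intensity[0]                      # level memorised at the last event
        for t, L in zip(timestamps, log_intensity):
            while L - reference >= contrast_threshold:    # brightness increased enough
                reference += contrast_threshold
                events.append((t, +1))                    # ON event
            while reference - L >= contrast_threshold:    # brightness decreased enough
                reference -= contrast_threshold
                events.append((t, -1))                    # OFF event
        return events

    # Example: a step increase in brightness produces a burst of ON events.
    t = np.linspace(0.0, 1.0, 100)
    L = np.where(t < 0.5, 0.0, 1.0)
    print(pixel_events(t, L)[:3])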

[Figure: event camera pixel (top) and biological eye (bottom). Posch et al, PROC 2014]

8 of 84

An Event Camera Pixel

  • Each event contains: a timestamp (µs), a pixel location (x, y), and a polarity (ON or OFF).

[Figure: DAVIS USB camera, chip, pixel and lens; input signal vs. output events with the contrast threshold.]

9 of 84

DAVIS Event Camera Output

t (s)       x     y     p
0.003432    13    35    0
0.003464    4     24    0
0.005203    213   2     1
0.005242    5     75    0
0.006072    64    9     1
0.010764    36    126   1
0.010798    98    4     0
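Each row of the table above is one event. A sketch of how such a stream is typically loaded for processing (the file name and column order are assumptions matching the table):

    import numpy as np

    # Text file with one event per row: t [s], x, y, p (0 = OFF, 1 = ON).
    events = np.loadtxt("events.txt")
    t = events[:, 0]
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    polarity = np.where(events[:, 3] > 0, 1, -1)   # map {1, 0} -> {+1, -1}
    print(f"{len(events)} events spanning {t[-1] - t[0]:.6f} s")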

[Figure: the DAVIS outputs image frames at 30 fps alongside the asynchronous event stream.]

10 of 84

What Do Events Look Like?


Mueggler et al, IJRR 2017


Bardow et al, CVPR 2016

11 of 84

Comparison Of Image Sensors

[Images: Phantom v2640, Nikon D850, human eye, event camera.]

12 of 84

Comparison Of Image Sensors

                            Ultrahigh-speed camera   High-end DSLR camera   Human eye        Event camera
                            (Phantom v2640)          (Nikon D850)
Equivalent framerate (fps)  12,500                   120                    50               100,000
Dynamic range (dB)          64                       45                     30-40            120
Power consumption (W)       280                      8                      0.01             0.1
Data rate (MB/s)            800                      8                      -                0 - 1
Output                      Images                   Images                 Nerve impulses   Events

13 of 84

Part I: Continuous-time Vision With Event Cameras


14 of 84

Related Work

Brandli et al. propose adding events to a (log) DAVIS image frame to update the image.


Brandli et al, ISCAS 2014

[Figure: frame vs. frame + events.]

15 of 84

Related Work

Reinbacher et al: events are accumulated while the image is periodically regularised (smoothed).

Bardow et al: joint optimization of image and optic flow over a batch of events.


Reinbacher et al, BMVC 2016

Bardow et al, CVPR 2016

16 of 84

Motivation

The DAVIS event camera outputs:

  1. low frequency, low dynamic range, motion blurred image frames
  2. high frequency, high dynamic range events.

Aim: Reconstruct super high speed, high dynamic range video with low latency.


C. Scheerlinck, N. Barnes, R. Mahony, "Continuous-time Intensity Estimation Using Event Cameras", ACCV, 2018

17 of 84

Approach

Instead of a sequence of temporally sparse image frames, we propose to estimate a continuous-time image state.

C. Scheerlinck, N. Barnes, R. Mahony, "Continuous-time Intensity Estimation Using Event Cameras", ACCV, 2018

[Figure: a frame-based camera outputs discrete frames 1, 2, 3 over time; an event camera supports a continuous-time image state that is updated between frames.]

18 of 84

Approach

To be useful in practical applications, e.g., real-time robotics, we would like our method to be:

  1. Computationally efficient
  2. Low latency
  3. Update on a per-event basis (no batching)


C. Scheerlinck, N. Barnes, R. Mahony, "Continuous-time Intensity Estimation Using Event Cameras", ACCV, 2018

[Image: Crazyflie Nano quadrotor.]

19 of 84

Mathematical Notation

Per pixel, the event stream is modelled as a continuous-time signal.

Events: E(t) = Σ_i σ_i c δ(t − t_i), where σ_i = ±1 is the polarity, c the contrast threshold and t_i the event timestamps.

Integrator 1/s (equivalent to ∫ · dt): integrating E(t) gives the accumulated change in log intensity.

C. Scheerlinck, N. Barnes, R. Mahony, "Continuous-time Intensity Estimation Using Event Cameras", ACCV, 2018

20 of 84

Naive Approach: Direct Integration

Problem: low temporal frequency noise accumulates, degrading the estimate over time.

[Block diagram: events E(t) → integrator (∫ dt) → log-intensity estimate.]
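A sketch of this naive integration per pixel (assuming events as (t, x, y, p) tuples and a single global contrast threshold; names are illustrative):

    import numpy as np

    def integrate_events(events, height, width, contrast_threshold=0.2):
        """Naive direct integration: add +/- one contrast threshold per event.
        Sensor noise and per-pixel threshold mismatch accumulate, so the estimate drifts."""
        log_image = np.zeros((height, width), dtype=np.float32)
        for t, x, y, p in events:
            log_image[int(y), int(x)] += contrast_threshold if p > 0 else -contrast_threshold
        return log_image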

C. Scheerlinck, N. Barnes, R. Mahony, "Continuous-time Intensity Estimation Using Event Cameras", ACCV, 2018

21 of 84

Approach: High-pass Filter

High-pass filters attenuate (reduce) low frequency components of the signal while allowing high frequency components to pass.

[Block diagram: events E(t) → integrator (∫ dt) → high-pass filter → log-intensity estimate.]
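A per-event sketch of the high-pass filtered reconstruction (the cut-off gain alpha and event handling are assumptions consistent with the filter described above): between events each pixel decays exponentially towards zero; at an event it jumps by plus or minus the contrast threshold.

    import numpy as np

    def highpass_reconstruction(events, height, width, alpha=2 * np.pi, c=0.2):
        """High-pass filtered event integration (sketch).
        alpha [rad/s] sets the cut-off: low temporal frequencies decay away."""
        L = np.zeros((height, width), dtype=np.float32)        # log-intensity estimate
        last_t = np.zeros((height, width), dtype=np.float64)   # per-pixel time of last update
        for t, x, y, p in events:
            x, y = int(x), int(y)
            L[y, x] *= np.exp(-alpha * (t - last_t[y, x]))     # decay between events
            L[y, x] += c if p > 0 else -c                      # jump at the event
            last_t[y, x] = t
        return L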

C. Scheerlinck, N. Barnes, R. Mahony, "Continuous-time Intensity Estimation Using Event Cameras", ACCV, 2018

22 of 84

Approach: High-pass Filter

Problem: Low temporal frequency information is lost (static background).

[Figure: conventional camera image vs. high-pass filtered events; the static background is lost in the event-only reconstruction.]

C. Scheerlinck, N. Barnes, R. Mahony, "Continuous-time Intensity Estimation Using Event Cameras", ACCV, 2018

23 of 84

Approach: Sensor Fusion

Can we fuse low-frequency information from frames with high-frequency information from events?


C. Scheerlinck, N. Barnes, R. Mahony, "Continuous-time Intensity Estimation Using Event Cameras", ACCV, 2018


24 of 84

Approach: Complementary Filter

[Block diagram: conventional camera frames → low-pass filter; event camera events → integrator (∫ dt) → high-pass filter; the two branches are summed (+) to form the estimate.]

C. Scheerlinck, N. Barnes, R. Mahony, "Continuous-time Intensity Estimation Using Event Cameras", ACCV, 2018

25 of 84

Approach: Complementary Filter

[Figure (axes plotted in log scale): the low-pass response applied to conventional camera frames and the high-pass response applied to integrated events (∫ dt) sum to an approximately all-pass reconstruction.]

C. Scheerlinck, N. Barnes, R. Mahony, "Continuous-time Intensity Estimation Using Event Cameras", ACCV, 2018

26 of 84

Reminder: Mathematical Notation

Events: E(t) = Σ_i σ_i c δ(t − t_i).

Integrator 1/s (equivalent to ∫ · dt).

Contrast threshold: c.

C. Scheerlinck, N. Barnes, R. Mahony, "Continuous-time Intensity Estimation Using Event Cameras", ACCV, 2018

27 of 84

Approach: Complementary Filter

Our complementary filter combines (temporally) low-pass filtered frames with high-pass filtered events.

[Block diagram: log frames → low-pass filter; events → integrator → high-pass filter; the two are summed to give the log intensity estimate.]

C. Scheerlinck, N. Barnes, R. Mahony, "Continuous-time Intensity Estimation Using Event Cameras", ACCV, 2018

28 of 84

Approach: Complementary Filter

The continuous-time ODE and solution can be obtained analytically.

Frequency domain: L̂(s) = E(s) / (s + α) + α L_F(s) / (s + α), where α is the crossover frequency between events and frames.

Time domain (via the inverse Laplace transform): dL̂(t)/dt = E(t) − α ( L̂(t) − L_F(t) ).

C. Scheerlinck, N. Barnes, R. Mahony, "Continuous-time Intensity Estimation Using Event Cameras", ACCV, 2018

29 of 84

Approach: Complementary Filter

We solve the ODE in two regimes: between events, and at events, for every pixel.

If E = 0, i.e. between two events: the estimate decays exponentially towards the latest frame, L̂(t) = L_F + e^(−α(t − t_i)) ( L̂(t_i) − L_F ).

If E ≠ 0, i.e. an event occurs (Dirac delta): the estimate jumps by one contrast threshold, L̂(t_i⁺) = L̂(t_i⁻) ± c.

Update whenever an event is received.

C. Scheerlinck, N. Barnes, R. Mahony, "Continuous-time Intensity Estimation Using Event Cameras", ACCV, 2018

Computationally efficient: 20M events / second on an i7 CPU.
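A per-event sketch of this update (frame handling, variable names and the zero-order hold on frames are assumptions; the decay/jump structure follows the two regimes above):

    import numpy as np

    def complementary_filter(events, log_frames, frame_times, alpha=2 * np.pi, c=0.2):
        """Asynchronous complementary filter (sketch): between events each pixel decays
        towards the latest log frame; at an event it jumps by +/- the contrast threshold."""
        L_hat = log_frames[0].astype(np.float32).copy()     # log-intensity estimate
        L_frame = log_frames[0].astype(np.float32)          # most recent log frame
        last_t = np.full(L_hat.shape, frame_times[0], dtype=np.float64)
        frame_idx = 0
        for t, x, y, p in events:
            # Zero-order hold: latch the newest frame that is not in the future.
            while frame_idx + 1 < len(log_frames) and frame_times[frame_idx + 1] <= t:
                frame_idx += 1
                L_frame = log_frames[frame_idx].astype(np.float32)
            x, y = int(x), int(y)
            decay = np.exp(-alpha * (t - last_t[y, x]))
            L_hat[y, x] = L_frame[y, x] + decay * (L_hat[y, x] - L_frame[y, x])  # between events
            L_hat[y, x] += c if p > 0 else -c                                    # at the event
            last_t[y, x] = t
        return L_hat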

30 of 84

Approach: Complementary Filter


C. Scheerlinck, N. Barnes, R. Mahony, "Continuous-time Intensity Estimation Using Event Cameras", ACCV, 2018

31 of 84


Results

C. Scheerlinck, N. Barnes, R. Mahony, "Continuous-time Intensity Estimation Using Event Cameras", ACCV, 2018

32 of 84


Results

C. Scheerlinck, N. Barnes, R. Mahony, "Continuous-time Intensity Estimation Using Event Cameras", ACCV, 2018

33 of 84

Contributions

  1. Continuous-time formulation of complementary filtering for image reconstruction with event cameras.
  2. Asynchronous, per-event update scheme.
  3. Real-time implementation that outperformed state-of-the-art at the time.
  4. Fastest event camera image reconstruction algorithm to date.
  5. Open source code - github.com/cedric-scheerlinck/dvs_image_reconstruction


C. Scheerlinck, N. Barnes, R. Mahony, "Continuous-time Intensity Estimation Using Event Cameras", ACCV, 2018

34 of 84

Asynchronous Convolutions

Motivation: Spatial convolution is a fundamental image operator used for gradient computation, convolutional neural networks and much more, yet there is no natural spatial convolution operator for event cameras.

Idea: extend the continuous-time image framework to spatial image convolutions for event cameras.


C. Scheerlinck, N. Barnes, R. Mahony, “Asynchronous Spatial Image Convolutions for Event Cameras”, IEEE Robotics and Automation Letters (RAL), 2019.

35 of 84

Basics: Image Convolutions
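As a refresher, a sketch of a plain frame-based 2D convolution (assuming scipy is available; the image here is a placeholder and the Sobel kernel is the same one used in the asynchronous examples later):

    import numpy as np
    from scipy.ndimage import convolve

    # Sobel kernel for horizontal gradients.
    SOBEL_X = np.array([[-1, 0, 1],
                        [-2, 0, 2],
                        [-1, 0, 1]], dtype=np.float32)

    image = np.random.rand(180, 240).astype(np.float32)    # placeholder image
    gradient_x = convolve(image, SOBEL_X, mode="nearest")   # dense, frame-based convolution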


C. Scheerlinck, N. Barnes, R. Mahony, “Asynchronous Spatial Image Convolutions for Event Cameras”, IEEE Robotics and Automation Letters (RAL), 2019.

36 of 84

Approach: Asynchronous Convolutions


C. Scheerlinck, N. Barnes, R. Mahony, “Asynchronous Spatial Image Convolutions for Event Cameras”, IEEE Robotics and Automation Letters (RAL), 2019.

37 of 84

Approach: Asynchronous Convolutions


Consider one event

[timestamp, x, y, ±1]

C. Scheerlinck, N. Barnes, R. Mahony, “Asynchronous Spatial Image Convolutions for Event Cameras”, IEEE Robotics and Automation Letters (RAL), 2019.

38 of 84

Approach: Asynchronous Convolutions

Consider one event: [timestamp, x, y, ±1].

Event image: the corresponding image is zero everywhere except ±1 at the event pixel (here, a -1 for an OFF event).

C. Scheerlinck, N. Barnes, R. Mahony, “Asynchronous Spatial Image Convolutions for Event Cameras”, IEEE Robotics and Automation Letters (RAL), 2019.

39 of 84

Approach: Asynchronous Convolutions

Kernel * event image: the single-event image (zero everywhere except -1 at the event pixel) is convolved with a kernel, here a 3x3 Sobel kernel.

C. Scheerlinck, N. Barnes, R. Mahony, “Asynchronous Spatial Image Convolutions for Event Cameras”, IEEE Robotics and Automation Letters (RAL), 2019.

40 of 84

Approach: Asynchronous Convolutions

Kernel * event image = the kernel, scaled by the event polarity, stamped around the event pixel. For the 3x3 Sobel kernel and an OFF event this gives the patch

 1   0  -1
 2   0  -2
 1   0  -1

centred on the event pixel; all other pixels remain zero.

C. Scheerlinck, N. Barnes, R. Mahony, “Asynchronous Spatial Image Convolutions for Event Cameras”, IEEE Robotics and Automation Letters (RAL), 2019.

41 of 84

Approach: Asynchronous Convolutions

From the nonzero entries of this patch, six virtual events (ON and OFF, all sharing the original timestamp), or a single convolved event, can be generated.

C. Scheerlinck, N. Barnes, R. Mahony, “Asynchronous Spatial Image Convolutions for Event Cameras”, IEEE Robotics and Automation Letters (RAL), 2019.
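A sketch of this single-event stamping (the kernel and event format are assumptions consistent with the example above; convolving a one-event image with a kernel simply reproduces the kernel, scaled by the polarity, around the event pixel):

    import numpy as np

    SOBEL_X = np.array([[-1, 0, 1],
                        [-2, 0, 2],
                        [-1, 0, 1]], dtype=np.float32)

    def convolved_events(event, kernel=SOBEL_X):
        """Turn one input event into weighted 'virtual' (convolved) events."""
        t, x, y, polarity = event                       # polarity in {+1, -1}
        half_h, half_w = kernel.shape[0] // 2, kernel.shape[1] // 2
        virtual = []
        for dy in range(-half_h, half_h + 1):
            for dx in range(-half_w, half_w + 1):
                w = polarity * kernel[dy + half_h, dx + half_w]
                if w != 0:
                    virtual.append((t, x + dx, y + dy, w))   # virtual event of weight w
        return virtual

    # One OFF event produces six weighted virtual events for a 3x3 Sobel kernel.
    print(convolved_events((0.0052, 10, 12, -1)))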

42 of 84

Approach: Asynchronous Convolutions

Convolved events can be used as input to an event processing algorithm, e.g., complementary filter:

[Block diagram: events → convolution → complementary filter (frame input set to 0) → image estimate.]

C. Scheerlinck, N. Barnes, R. Mahony, “Asynchronous Spatial Image Convolutions for Event Cameras”, IEEE Robotics and Automation Letters (RAL), 2019.

43 of 84

Approach: Asynchronous Convolutions

[Block diagram: events → Sobel convolution → convolved events → filter (frame input set to 0) → gradient estimate.]

C. Scheerlinck, N. Barnes, R. Mahony, “Asynchronous Spatial Image Convolutions for Event Cameras”, IEEE Robotics and Automation Letters (RAL), 2019.

44 of 84

Results


C. Scheerlinck, N. Barnes, R. Mahony, “Asynchronous Spatial Image Convolutions for Event Cameras”, IEEE Robotics and Automation Letters (RAL), 2019.

45 of 84

Results: Gradient

[Figure panels: events; gradient estimate; Poisson integration of the gradient.]

C. Scheerlinck, N. Barnes, R. Mahony, “Asynchronous Spatial Image Convolutions for Event Cameras”, IEEE Robotics and Automation Letters (RAL), 2019.

46 of 84

Results: Corner Detection

Gradient can be used as input to a corner detection algorithm.

When an event arrives, the Harris response is only updated in a local neighbourhood.
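For reference, a dense sketch of the Harris response computed from gradient images (standard formulation; the per-event version only recomputes it in a window around the event, which is not shown here):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def harris_response(grad_x, grad_y, sigma=1.0, k=0.04):
        """Harris corner response from image gradients (dense sketch)."""
        Ixx = gaussian_filter(grad_x * grad_x, sigma)   # smoothed structure-tensor entries
        Ixy = gaussian_filter(grad_x * grad_y, sigma)
        Iyy = gaussian_filter(grad_y * grad_y, sigma)
        det = Ixx * Iyy - Ixy ** 2
        trace = Ixx + Iyy
        return det - k * trace ** 2                     # large positive response at corners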

[Figure panels: gradient; Harris response.]

C. Scheerlinck, N. Barnes, R. Mahony, “Asynchronous Spatial Image Convolutions for Event Cameras”, IEEE Robotics and Automation Letters (RAL), 2019.

47 of 84

Results: Corner Detection

[Figure panels: gradient; Harris response; corners; frame-based corners.]

C. Scheerlinck, N. Barnes, R. Mahony, “Asynchronous Spatial Image Convolutions for Event Cameras”, IEEE Robotics and Automation Letters (RAL), 2019.

48 of 84

Results: Corner Detection

[Figure panels: eHarris (Vasco '16); FAST (Mueggler '17); ARC (Alzugaray '18); ours; frame-based Harris.]

Local non-maximum suppression can be applied to our continuous-time Harris response state to get clean corners.

C. Scheerlinck, N. Barnes, R. Mahony, “Asynchronous Spatial Image Convolutions for Event Cameras”, IEEE Robotics and Automation Letters (RAL), 2019.

49 of 84

Part II: Convolutional Neural Networks For Event Cameras


50 of 84

Motivation

Convolutional neural networks (CNNs) are a powerful image processing tool that yields state-of-the-art results across a range of computer vision tasks, including optic flow, classification and segmentation.

Can CNNs be used with event cameras, e.g., to reconstruct high quality images?


51 of 84

Related Works

Events are not naturally suited to convolutional neural networks (CNNs).

Converting them to 3D space-time voxel grids yields state-of-the-art results in image reconstruction, optic flow and classification.


Gehrig et al, ICCV 2019

[Figure: events over time are binned into a voxel grid with temporal channels 1, 2, 3, 4.]
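A sketch of the voxel-grid conversion (the bin count and the bilinear weighting in time are the common choices; exact details vary between papers):

    import numpy as np

    def events_to_voxel_grid(events, num_bins, height, width):
        """Bin events into a (num_bins, H, W) space-time voxel grid (sketch).
        events: (N, 4) array of [t, x, y, polarity], polarity in {+1, -1}, sorted by t."""
        voxel = np.zeros((num_bins, height, width), dtype=np.float32)
        t = events[:, 0]
        x = events[:, 1].astype(int)
        y = events[:, 2].astype(int)
        p = events[:, 3].astype(np.float32)
        # Normalise timestamps to [0, num_bins - 1] and split each event between
        # its two neighbouring temporal bins (bilinear interpolation in time).
        t_norm = (num_bins - 1) * (t - t[0]) / max(t[-1] - t[0], 1e-9)
        lower = np.floor(t_norm).astype(int)
        frac = (t_norm - lower).astype(np.float32)
        upper = np.clip(lower + 1, 0, num_bins - 1)
        np.add.at(voxel, (lower, y, x), p * (1.0 - frac))
        np.add.at(voxel, (upper, y, x), p * frac)
        return voxel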

52 of 84

Related Works

Rebecq et al. achieve state-of-the-art video reconstruction using a recurrent variant of UNet, trained with simulated data.


Rebecq et al, CVPR 2019; TPAMI 2020

53 of 84

Related Works

Synthetic training data for E2VID.


ESIM simulator: Rebecq et al, CoRL 2018

54 of 84


55 of 84

Limitations Of E2VID

  1. Computational cost
  2. Doesn’t generalize to MVSEC dataset
  3. Fades rapidly when event rate drops


[Chart: compute time per image (ms), Titan Xp GPU.]


57 of 84

Fast Image Reconstruction With An Event Camera

Aim: achieve similar image quality as E2VID while drastically improving computational efficiency.

Idea: Starting from E2VID, remove components one-by-one while maintaining prediction accuracy.


C. Scheerlinck, H. Rebecq, D. Gehrig, N. Barnes, R. Mahony, D. Scaramuzza, “Fast Image Reconstruction with an Event Camera”, Winter Conference on Applications of Computer Vision (WACV), 2020.

58 of 84

Fast Image Reconstruction With An Event Camera

Result: our network runs 3-4x faster than E2VID, requires 10x fewer FLOPs, and is 99.6% smaller.


C. Scheerlinck, H. Rebecq, D. Gehrig, N. Barnes, R. Mahony, D. Scaramuzza, “Fast Image Reconstruction with an Event Camera”, Winter Conference on Applications of Computer Vision (WACV), 2020.

What about accuracy?

59 of 84

Fast Image Reconstruction With An Event Camera

Result: our network achieves similar accuracy to E2VID on the IJRR’17 dataset.


C. Scheerlinck, H. Rebecq, D. Gehrig, N. Barnes, R. Mahony, D. Scaramuzza, “Fast Image Reconstruction with an Event Camera”, Winter Conference on Applications of Computer Vision (WACV), 2020.

60 of 84

Recurrent Unit Ablation


C. Scheerlinck, H. Rebecq, D. Gehrig, N. Barnes, R. Mahony, D. Scaramuzza, “Fast Image Reconstruction with an Event Camera”, Winter Conference on Applications of Computer Vision (WACV), 2020.

61 of 84

Limitations

FireNet is slower to initialise (left) and has more smearing on fast motions (right).


C. Scheerlinck, H. Rebecq, D. Gehrig, N. Barnes, R. Mahony, D. Scaramuzza, “Fast Image Reconstruction with an Event Camera”, Winter Conference on Applications of Computer Vision (WACV), 2020.

62 of 84

Fast Image Reconstruction With An Event Camera


C. Scheerlinck, H. Rebecq, D. Gehrig, N. Barnes, R. Mahony, D. Scaramuzza, “Fast Image Reconstruction with an Event Camera”, Winter Conference on Applications of Computer Vision (WACV), 2020.

63 of 84

Work In Progress (Unpublished)

Aim: determine important factors for training image reconstruction and optic flow networks, such as:

  • simulation parameters
  • training parameters
  • data augmentation
  • loss functions
  • network architecture

in order to outperform the state of the art and guide future research in this direction.


64 of 84

Limitations Of E2VID

  • Computational cost
  • Doesn’t generalize to MVSEC dataset
  • Fades rapidly when event rate drops


65 of 84

Sequence Length

To train a recurrent neural network, a (temporal) sequence of data is forward-passed through the network and the loss at each step is computed.

A single backpropagation step is performed at the end, updating the network weights based on the gradient of the accumulated loss with respect to the weights.
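A sketch of one such update in PyTorch (the recurrent model interface, loss function and variable names are assumptions): the losses from every step are summed and a single backward pass updates the weights.

    import torch

    def train_on_sequence(model, optimizer, loss_fn, sequence):
        """One training update over a temporal sequence (sketch).
        sequence: list of (input_voxel, target_image) pairs;
        model(input, state) is assumed to return (prediction, state)."""
        optimizer.zero_grad()
        state = None
        total_loss = 0.0
        for voxel, target in sequence:                               # Step 1 ... Step N
            prediction, state = model(voxel, state)
            total_loss = total_loss + loss_fn(prediction, target)    # Loss 1 + ... + Loss N
        total_loss.backward()                                        # single backprop for the whole sequence
        optimizer.step()
        return float(total_loss.detach())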

[Diagram: events over time are processed in steps 1-4; each step produces a loss; Loss = Loss 1 + Loss 2 + Loss 3 + Loss 4; a single network update (backprop) follows.]

66 of 84

Sequence Length

A shorter sequence length requires fewer computational steps (forward passes) per update (backprop), and thus may train faster.

A longer sequence may endow the network with a longer temporal ‘memory’.


67 of 84

E2VID training data:
- sequence length = 40 steps
- medium-fast camera motions
- clean (no noise)
- planar, static scenes

Ours training data:
- sequence length = 135 steps
- slow-fast camera motions
- noise added
- planar, static scenes

68 of 84

E2VID Vs. Ours: Fading

E2VID fades rapidly while ours maintains temporal persistence.

Conclusion: long temporal sequences are key to improving temporal memory.


69 of 84

Limitations Of E2VID

  • Computational cost
  • Doesn’t generalize to MVSEC dataset
  • Fades rapidly when event rate drops


70 of 84

Contrast Thresholds

How do you measure the contrast thresholds (CT) of an event camera?

Heuristic: measure the rate of events/(pixel*second).

A high CT will produce fewer events; a low CT will produce more.
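A sketch of this heuristic (array layout as in the earlier loading example; names are illustrative):

    import numpy as np

    def events_per_pixel_per_second(events, height, width):
        """Event-rate statistic used to compare simulated and real data.
        events: (N, 4) array of [t, x, y, p], sorted by time."""
        duration = events[-1, 0] - events[0, 0]
        return len(events) / (height * width * max(duration, 1e-9))

Matching this statistic between simulation and a real camera is one way to choose the simulated contrast thresholds.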


[Figure: synthetic data vs. real data.]

71 of 84

E2VID training data: contrast thresholds drawn from a Gaussian distribution with mean = 0.18, std = 0.03.

Ours training data: contrast thresholds range from 0.2 to 1.0.

72 of 84

E2VID Vs. Ours: MVSEC Dataset

E2VID simply breaks on event data from the MVSEC dataset, while ours produces a reasonable video.

Conclusion: a wide range of contrast thresholds in the training data improves generalizability to other datasets.


73 of 84

Image + Flow Network

We trained a combined network to output image and flow simultaneously.

The combined network is the same size, so it is computationally efficient.

But so far the combined network performs worse than an image-only or flow-only network.


74 of 84

Our Event Convolutional Neural Network

Outperforms state-of-the-art (E2VID) by 15-30% on major event camera datasets.

                     E2VID                                Ours
Contrast thresholds  Gaussian; mean=0.18, std=0.03        Range from 0.2-1.0
Motion               Medium-fast, planar, static scenes   Slow-fast, multiple 2D objects flying across a moving background
Noise                Clean                                Noise added dynamically at train time
Loss                 LPIPS: VGG pretrained weights        LPIPS: AlexNet pretrained weights
Sequence length      40 images                            120 images
Optic flow           No                                   Yes

75 of 84

Conclusion

Event cameras:

Pros: fast, high dynamic range, low-power sensors.

Cons: noisy, and difficult to process using conventional computer vision (no images).

Part I: complementary filtering can be used for computationally efficient real-time image reconstruction and convolution.

Part II: CNNs are currently state-of-the-art in image reconstruction, optic flow (and more) for event cameras. Training data with a range of contrast thresholds, motion types and noise, together with long temporal sequences, is key to improving results.


76 of 84

An Event Camera Pixel

The key difference between a frame-based camera and an event camera is the pixel circuitry.


Gallego et al, arXiv 2019

77 of 84

Results


C. Scheerlinck, N. Barnes, R. Mahony, “Asynchronous Spatial Image Convolutions for Event Cameras”, IEEE Robotics and Automation Letters (RAL), 2019.

78 of 84

Related Works

Zhu et al. show depth, ego-motion and state-of-the-art optic flow using a convolutional UNet architecture, trained on 11 minutes of driving data from their MVSEC dataset.

They were the first to propose the voxel-grid representation for events.


Voxel grid

Zhu et al, RSS 2018; CVPR 2019

79 of 84

[Figure panels: optic flow (events, prediction, ground truth) and depth (events, prediction, ground truth).]

Zhu et al, CVPR 2019

80 of 84


81 of 84

LPIPS: Learned Perceptual Image Patch Similarity (Zhang et al, CVPR 2018).

82 of 84

LPIPS distance


"VGG" network: Simonyan & Zisserman, ICLR 2015 (30k+ citations).

83 of 84

TC: temporal consistency loss (ECCV 2018).

84 of 84

Training E2VID

[Diagram: ESIM simulates events and ground-truth images; the events over time (steps 1, 2, 3, 4) are converted to voxel grids and passed through the recurrent UNet; predictions are compared to the ground-truth images with LPIPS and TC losses; the pipeline is differentiable.]