1 of 14

A Unified Framework for Compressive Video Recovery from Coded Exposure Techniques

Prasan Shedligeri¹, Anupama S², Kaushik Mitra¹

¹Department of EE, IIT Madras, Chennai, India
²Qualcomm India, Bangalore, India

2 of 14

Coded exposure techniques for high speed imaging

[Figure] Optical encoding: the spatio-temporal signal X is modulated by an exposure sequence to produce a compressed measurement Y. Computational decoding: a reconstruction algorithm recovers the spatio-temporal signal X'.

3 of 14

Coded exposure schemes

Flutter Shutter [Raskar et al. '06]: a global exposure coding scheme; a single temporal code modulates the exposure of all pixels, producing one coded image.

Pixel-wise coded exposure [Reddy et al. '11]: each pixel is modulated by its own temporal code, producing one coded image.

Coded-2-Bucket (C2B) exposure [Sarhangnejad et al. '19]: each pixel integrates light into two buckets, one modulated by the code and the other by its complement (1 − code), producing a coded image and a complementary coded image.
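A minimal NumPy sketch of the three measurement models above (our illustration; the shapes and the random binary codes are assumed for demonstration):

```python
import numpy as np

# X: video of T sub-frames, shape (T, H, W); values in [0, 1].
rng = np.random.default_rng(0)
T, H, W = 8, 4, 4
X = rng.random((T, H, W))

# Flutter Shutter: one global binary code shared by all pixels.
code_global = rng.integers(0, 2, size=T)          # shape (T,)
Y_flutter = np.tensordot(code_global, X, axes=1)  # shape (H, W)

# Pixel-wise coded exposure: an independent binary code per pixel.
code_pixel = rng.integers(0, 2, size=(T, H, W))
Y_pixelwise = (code_pixel * X).sum(axis=0)        # shape (H, W)

# Coded-2-Bucket: each pixel records the coded image AND its complement.
Y_bucket1 = (code_pixel * X).sum(axis=0)
Y_bucket2 = ((1 - code_pixel) * X).sum(axis=0)
# Together the two buckets capture all of the incident light:
assert np.allclose(Y_bucket1 + Y_bucket2, X.sum(axis=0))
```

In each case a T-frame video is compressed into one (or, for C2B, two) single-frame measurements.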

4 of 14

A Unified Framework for Compressive Video Recovery

5 of 14

Unified framework for video reconstruction

[Figure] Pipeline: the input, either a single coded-exposure image or a coded-blurred image pair (from Flutter Shutter, pixel-wise coded, or C2B capture), is fed to an exposure-aware feature-extraction stage built on an SVC layer, followed by a UNet refinement stage that outputs the high-resolution video.
6 of 14

Exposure aware feature extraction for video reconstruction

Coded-exposure compression is a spatially local operation: each coded pixel mixes only the temporal samples of that pixel.

The pseudo-inverse solution shows that the signal can likewise be recovered using spatially local operations.

This motivates using convolutional layers instead of fully connected networks.
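This locality can be verified directly for pixel-wise coded exposure: the minimum-norm pseudo-inverse estimate decouples into an independent per-pixel operation, using no neighbouring pixels at all. A small sketch (our illustration, with assumed shapes):

```python
import numpy as np

rng = np.random.default_rng(1)
T, H, W = 8, 4, 4
X = rng.random((T, H, W))                 # ground-truth video
C = rng.integers(0, 2, size=(T, H, W))    # per-pixel binary codes
Y = (C * X).sum(axis=0)                   # coded measurement, (H, W)

# Per-pixel pseudo-inverse of the 1xT code c at each pixel:
# x_hat = c * y / (c^T c).  No spatial neighbours are involved.
X_hat = C * Y / np.maximum((C ** 2).sum(axis=0), 1)

# Re-encoding the estimate reproduces the measurement exactly.
assert np.allclose((C * X_hat).sum(axis=0), Y)
```

Since the exact least-squares inverse is already spatially local, a (small-receptive-field) convolutional network is a natural fit for learning a better, data-driven inverse.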

7 of 14

Exposure aware feature extraction for video reconstruction

The convolutional weights should adapt to the different exposure pattern at each pixel.

Hence, we adopt the Shift-Variant Convolutional (SVC) layer [Okawara et al. '20], which learns a separate kernel for each spatial location.
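A minimal sketch of what a shift-variant convolution computes (our reading of the idea; the function and shapes are illustrative, not the paper's implementation):

```python
import numpy as np

def shift_variant_conv(img, kernels):
    """img: (H, W); kernels: (H, W, k, k) -- one kernel per pixel.

    Unlike a standard convolution, the kernel is allowed to differ
    at every spatial location, so the layer can adapt to the
    different exposure pattern at each pixel.
    """
    H, W = img.shape
    k = kernels.shape[-1]
    pad = k // 2
    padded = np.pad(img, pad)
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + k, j:j + k]
            out[i, j] = (patch * kernels[i, j]).sum()
    return out

rng = np.random.default_rng(2)
img = rng.random((5, 5))
# Sanity check: if every pixel uses the same identity kernel, the
# layer reduces to an ordinary (shift-invariant) convolution.
k = np.zeros((3, 3)); k[1, 1] = 1.0
same = np.broadcast_to(k, (5, 5, 3, 3))
assert np.allclose(shift_variant_conv(img, same), img)
```

In practice the per-pixel kernels are learned parameters conditioned on the exposure code at each location.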

8 of 14

Comparison across various coding techniques

9 of 14

Global exposure coding (Flutter Shutter)

Input: a single coded image. Reconstructions, compared against the ground-truth video:

GMM [Yang et al. '14]: PSNR 17.46 dB, SSIM 0.586
Our method: PSNR 21.82 dB, SSIM 0.773

Average performance:

          GMM        Ours
PSNR      21.45 dB   21.61 dB
SSIM      0.697      0.710

10 of 14

Single Per-Pixel coding

Input: a single pixel-wise coded image. Reconstructions, compared against the ground-truth video:

GMM [Yang et al. '14]: PSNR 30.15 dB, SSIM 0.930
DNN [Yoshida et al. '18]: PSNR 30.91 dB, SSIM 0.942
Our method: PSNR 32.21 dB, SSIM 0.954

Average performance:

          GMM        DNN        Ours
PSNR      29.94 dB   30.27 dB   31.76 dB
SSIM      0.887      0.890      0.914

11 of 14

Coded-2-Bucket (C2B)

Input: the coded image. Reconstructions, compared against the ground-truth video:

GMM [Yang et al. '14]: PSNR 30.23 dB, SSIM 0.939
Our method: PSNR 32.27 dB, SSIM 0.961

Average performance:

          GMM        Ours
PSNR      30.84 dB   32.34 dB
SSIM      0.898      0.920

12 of 14

Comparison of single per-pixel coding vs. C2B

13 of 14

One vs two measurements

Reconstructed videos from a single measurement vs. two C2B measurements, for a purely dynamic scene and a largely static scene (PSNR / SSIM per cell):

                         Single measurement   Two measurements   PSNR gain
Purely dynamic scene     29.95 dB / 0.904     30.38 dB / 0.945   0.43 dB
Largely static scene     30.02 dB / 0.907     35.45 dB / 0.970   5.43 dB

14 of 14

Summary

A unified learning-based framework for video recovery from coded-exposure measurements.

We propose a non-iterative, fully convolutional model to recover the video signal from its compressed measurements.

We show that a fully convolutional architecture (with spatially varying weights) is better suited than fully connected networks.

Our approach matches or exceeds state-of-the-art video reconstruction algorithms for each of the sensing techniques.

C2B has a significant advantage over single coded capture for largely static scenes.