A Unified Framework for Compressive Video Recovery from Coded Exposure Techniques
Prasan Shedligeri1, Anupama S2, Kaushik Mitra1
1Department of EE, IIT Madras, Chennai, India
2Qualcomm India, Bangalore, India
Coded exposure techniques for high-speed imaging
Figure: A spatio-temporal signal X is optically encoded by an exposure sequence into a compressed measurement Y; computational decoding then reconstructs the spatio-temporal signal X′.
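As a toy illustration of the optical-encoding step (all shapes, codes, and intensities below are invented for illustration and are not taken from the paper), each frame X_t is masked by a binary exposure pattern S_t, and the masked frames are summed into a single measurement Y:

```python
# Toy simulation of coded-exposure compression (illustrative sketch):
# T video frames X_t are modulated by a binary exposure sequence S_t
# and summed into one compressed measurement Y.

T, H, W = 4, 2, 3  # frames, height, width (toy sizes)

# Toy video: frame t has constant intensity t + 1 at every pixel.
X = [[[t + 1 for _ in range(W)] for _ in range(H)] for t in range(T)]

# Per-pixel binary exposure code: pixel (i, j) is open in frame t
# when (t + i + j) is even (an arbitrary illustrative pattern).
S = [[[(t + i + j) % 2 == 0 for j in range(W)] for i in range(H)]
     for t in range(T)]

# Optical encoding: Y[i][j] = sum_t S[t][i][j] * X[t][i][j]
Y = [[sum(X[t][i][j] for t in range(T) if S[t][i][j]) for j in range(W)]
     for i in range(H)]

print(Y)
```

The T temporal samples at each pixel collapse into one value, which is the compression the decoder must invert.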
Coded exposure schemes
- Global exposure coding (Flutter Shutter) [Raskar et al. '06]: a single temporal code modulates the exposure of all pixels simultaneously, producing one coded image.
- Pixel-wise coded exposure [Reddy et al. '11]: each pixel is modulated by its own temporal code, producing one coded image.
- Coded-2-Bucket (C2B) exposure [Sarhangnejad et al. '19]: each pixel's light is routed by a code C and its complement 1 − C into two buckets, producing a coded image and a complementary coded image.
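A minimal sketch contrasting the three schemes on a 1-D toy signal (all codes and intensities are invented for illustration). It also checks the defining C2B property: the two buckets together capture all the light, so they sum to a fully exposed (blurred) image:

```python
# Toy comparison of the three coding schemes on a 1-D "video"
# of T frames with W pixels. Values are illustrative only.

T, W = 4, 3
X = [[10 * (t + 1) + j for j in range(W)] for t in range(T)]  # X[t][j]

# 1) Flutter shutter: one global per-frame code, shared by every pixel.
global_code = [1, 0, 1, 1]
Y_global = [sum(global_code[t] * X[t][j] for t in range(T)) for j in range(W)]

# 2) Pixel-wise coded exposure: each pixel has its own per-frame code.
S = [[(t + j) % 2 for j in range(W)] for t in range(T)]  # S[t][j]
Y_pixel = [sum(S[t][j] * X[t][j] for t in range(T)) for j in range(W)]

# 3) Coded-2-Bucket: both buckets are read out, one with code S and the
# other with the complement 1 - S, so no light is discarded.
Y_b0 = Y_pixel
Y_b1 = [sum((1 - S[t][j]) * X[t][j] for t in range(T)) for j in range(W)]

# The two buckets sum to a fully exposed (blurred) image.
blur = [sum(X[t][j] for t in range(T)) for j in range(W)]
assert all(Y_b0[j] + Y_b1[j] == blur[j] for j in range(W))

print(Y_global, Y_pixel, Y_b1)
```

The extra complementary measurement is what C2B trades against the single-image schemes; the comparison sections below quantify when it helps.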
Unified framework for video reconstruction
Figure: The input is either a single coded-exposure image (Flutter Shutter or pixel-wise coded capture) or a coded-blurred image pair (C2B capture). An exposure-aware feature extraction stage built from SVC layers is followed by a UNet-based refinement stage, which outputs the high-resolution video.
Exposure aware feature extraction for video reconstruction
Coded exposure based compression is a spatially local operation.
The pseudo-inverse solution shows that the signal can also be recovered using only spatially local operations.
This motivates using convolutional layers instead of fully connected networks.
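To see why recovery can be local, consider a single pixel with binary exposure code s and measurement y = s·x: the minimum-norm pseudo-inverse estimate x̂ = s·y / (sᵀs) depends only on that pixel's own measurement, with no neighbours involved. A toy sketch (values invented for illustration):

```python
# Per-pixel pseudo-inverse recovery (illustrative sketch): with one
# coded measurement per pixel, y = s . x, the minimum-norm solution
# x_hat = s * y / (s . s) is computed from that pixel alone, which is
# why spatially local (convolutional) decoding is plausible.

def pinv_recover(s, y):
    """Minimum-norm temporal profile consistent with y = s . x."""
    energy = sum(si * si for si in s)
    return [si * y / energy for si in s]

s = [1, 0, 1, 0]          # this pixel's exposure code over T = 4 frames
x = [3.0, 7.0, 5.0, 1.0]  # true (unknown) temporal intensities
y = sum(si * xi for si, xi in zip(s, x))  # compressed measurement: 8.0

x_hat = pinv_recover(s, y)
print(y, x_hat)  # 8.0 [4.0, 0.0, 4.0, 0.0]
```

The estimate is consistent with the measurement (s·x̂ = y) but spreads energy only over the frames the code kept open; the learned network refines this kind of coarse local estimate.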
The convolutional weights should adapt to the different exposure pattern at each pixel.
Hence, we use the Shift Variant Convolutional (SVC) layer [Okawara et al. '20].
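A 1-D toy sketch of the shift-variant idea (an illustrative simplification, not the implementation of Okawara et al.): because the exposure code repeats with some spatial period P, the layer selects a kernel by position modulo P instead of sharing one kernel everywhere:

```python
# Toy 1-D shift-variant convolution: each output position p uses the
# kernel indexed by p % period, so the weights adapt to the repeating
# exposure pattern rather than being shared across all positions.

def svc_1d(signal, kernels, period):
    """kernels: list of `period` kernels, each of odd length."""
    out = []
    for p in range(len(signal)):
        k = kernels[p % period]          # position-dependent kernel
        r = len(k) // 2
        acc = 0.0
        for dk in range(-r, r + 1):
            q = p + dk
            if 0 <= q < len(signal):     # zero padding at the borders
                acc += k[dk + r] * signal[q]
        out.append(acc)
    return out

signal = [1.0, 2.0, 3.0, 4.0]
kernels = [[1.0, 0.0, 0.0],   # used at even positions
           [0.0, 0.0, 1.0]]   # used at odd positions
print(svc_1d(signal, kernels, period=2))  # [0.0, 3.0, 2.0, 0.0]
```

In the actual layer the same idea applies in 2-D over the tile of the repeating exposure code, with the kernels learned end-to-end.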
Comparison across various coding techniques
Global exposure coding (Flutter Shutter)
Figure: Input coded image, reconstructions, and ground truth video. GMM [Yang et al. '14]: PSNR 17.46 dB, SSIM 0.586; our method: PSNR 21.82 dB, SSIM 0.773.

| Average performance | PSNR | SSIM |
|---|---|---|
| GMM [Yang et al. '14] | 21.45 | 0.697 |
| Ours | 21.61 | 0.710 |
Single per-pixel coding
Figure: Input coded image, reconstructions, and ground truth video. GMM [Yang et al. '14]: PSNR 30.15 dB, SSIM 0.930; DNN [Yoshida et al. '18]: PSNR 30.91 dB, SSIM 0.942; our method: PSNR 32.21 dB, SSIM 0.954.

| Average performance | PSNR | SSIM |
|---|---|---|
| GMM [Yang et al. '14] | 29.94 | 0.887 |
| DNN [Yoshida et al. '18] | 30.27 | 0.890 |
| Ours | 31.76 | 0.914 |
Coded-2-Bucket (C2B)
Figure: Input coded image, reconstructions, and ground truth video. GMM [Yang et al. '14]: PSNR 30.23 dB, SSIM 0.939; our method: PSNR 32.27 dB, SSIM 0.961.

| Average performance | PSNR | SSIM |
|---|---|---|
| GMM [Yang et al. '14] | 30.84 | 0.898 |
| Ours | 32.34 | 0.920 |
Comparison of single per-pixel coding vs. C2B
One vs. two measurements per pixel:
- Purely dynamic scene: reconstruction from a single measurement, PSNR 29.95 dB / SSIM 0.904; from two measurements, PSNR 30.38 dB / SSIM 0.945 (PSNR gain = 0.43 dB).
- Significantly static scene: reconstruction from a single measurement, PSNR 30.02 dB / SSIM 0.907; from two measurements, PSNR 35.45 dB / SSIM 0.970 (PSNR gain = 5.43 dB).
Summary
A unified learning-based framework for video recovery from coded exposure measurements:
- We propose a non-iterative, fully convolutional model to recover the video signal from its compressed measurements.
- We show that a fully convolutional architecture (with spatially varying weights) is more suitable than fully connected networks.
- Our approach matches or exceeds state-of-the-art video reconstruction algorithms for each of the sensing techniques.
- C2B has a significant advantage over single coded capture for largely static scenes.