Compressive Sensing
CS 754
Ajit Rajwade
1
Outline of the Lectures
2
Introduction + MOTIVATION
3
The Era of Big Data
4
https://en.wikipedia.org/wiki/Big_data
CC image courtesy of Ender005 on WikiCommons licensed under CC-BY-SA-4.0
The Era of Big Data
5
Conventional Sensing and Compression
6
Converting to transform domain: DFT
7
Fourier Transform
8
Fourier coefficient
(complex) sinusoidal bases
Value of signal f at location n (f is a vector of size N)
u-th Fourier coefficient; u = frequency index
Vector of N Fourier coefficients
N x N orthonormal matrix (Fourier basis matrix)
u-th column of H (corresponding to Fourier coefficient F(u))
i=sqrt(-1)
Fourier Basis Matrix
9
Time index n is constant over all entries in a given row. Frequency index u is constant across all entries in a given column. Thus each column has a fixed frequency (of the complex exponential).
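The structure described above is easy to verify numerically. A minimal NumPy sketch (the 1/√N normalization that makes H orthonormal is one common convention; NumPy's fft is unnormalized):

```python
import numpy as np

N = 8
n = np.arange(N)
# Column u of H is the complex sinusoid of frequency u; the 1/sqrt(N)
# factor makes H orthonormal.
H = np.exp(2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)

# Orthonormality: H^H H = I (here ^H denotes the conjugate transpose).
assert np.allclose(H.conj().T @ H, np.eye(N))

# Analysis: F = H^H f matches NumPy's FFT up to the 1/sqrt(N) normalization.
f = np.random.randn(N)
F = H.conj().T @ f
assert np.allclose(F, np.fft.fft(f) / np.sqrt(N))

# Synthesis: f = H F recovers the signal exactly.
assert np.allclose(H @ F, f)
```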
2D Fourier Transform
10
F and f are vectors of length MN
Matrix of size MN x MN, whose columns are indexed by a frequency pair u,v (note Huv)
Vector of size MN x 1. Note that though F and f are 2D signals, they are represented here in vectorized form.
Discrete Cosine Transform (DCT) in 1D
11
Note that by convention transpose here refers to the conjugate transpose (this is also a convention followed by MATLAB).
Discrete Cosine Transform (DCT) in 1D
12
Across any column of H, the frequency index u is constant and time (or spatial) index n varies. Across any row of H, the frequency index varies and n is constant.
DCT
13
14
DCT in 2D
15
The DCT matrix in this case will have size MN x MN, and it will be the Kronecker product of two DCT matrices – one of size M x M, the other of size N x N. The DCT matrix for the 2D case is also orthonormal; it is NOT symmetric, and it is NOT the real part of the 2D DFT.
Do not interpret Hnmuv as a 4-D index; rather, it is an element of the column of H indexed by frequency (u,v). Which element? The element indexed by spatial coordinates (n,m).
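The Kronecker-product construction can be checked directly. A small NumPy sketch (the sizes M = 3, N = 4 are arbitrary illustrative choices; which factor acts on rows versus columns depends on the vectorization order):

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT-II matrix; row u is the basis vector of frequency u."""
    n = np.arange(N)
    C = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    C[0, :] *= np.sqrt(1.0 / N)
    C[1:, :] *= np.sqrt(2.0 / N)
    return C

C4, C3 = dct_matrix(4), dct_matrix(3)
# The 1D DCT matrix is orthonormal but NOT symmetric.
assert np.allclose(C4 @ C4.T, np.eye(4))
assert not np.allclose(C4, C4.T)

# 2D DCT on an M x N image = Kronecker product of an M x M and an N x N
# DCT matrix, acting on the vectorized image; the result is again orthonormal.
C2d = np.kron(C3, C4)          # size MN x MN = 12 x 12
assert np.allclose(C2d @ C2d.T, np.eye(12))
```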
What do the DCT bases look like? (1D case)
16
What do the DCT bases look like? (2D case)
17
The DCT transforms an 8×8 block of input values to a linear combination of these 64 patterns. The patterns are referred to as the two-dimensional DCT basis functions, and the output values are referred to as transform coefficients.
Each image here is obtained from the 8 x 8 outer product of a pair of DCT basis vectors. Each image is stretched between 0 and 255 – on a common scale.
Again: DCT is NOT the real part of the DFT
18
Real part of DFT
DCT
What is Compressive Sensing?
19
What is Compressive Sensing?
20
Why compressive sensing?
21
Shannon’s sampling theorem
22
23
Actual Analog Signal
Digital signal (sampled from analog signal)
Samples
Reconstructed signal (reconstruction is accurate if Nyquist’s condition is satisfied)
Band-limited signal
24
Aliasing
25
Aliasing when the sampling rate is less than 2B
Original signal
26
Aliasing when the sampling rate is equal to 2B
No aliasing when the sampling rate is more than 2B
Whittaker-Shannon Interpolation Formula
27
Value of the n-th sample
Sampling period = 1/(sampling rate)
28
The sinc function (it is the Fourier transform of the rect function)
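The interpolation formula is straightforward to implement. A NumPy sketch (the test signal, rates, and window size are illustrative choices; a finite window only approximates the infinite sum):

```python
import numpy as np

T = 1.0 / 20.0                        # sampling period: rate 20 Hz > 2B for B = 5 Hz
n = np.arange(-200, 201)              # finite window of samples (the true formula sums over all n)
f = lambda t: np.cos(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 5 * t)
samples = f(n * T)

def reconstruct(t):
    # Whittaker-Shannon: f(t) = sum_n f(nT) sinc((t - nT)/T), with sinc(x) = sin(pi x)/(pi x)
    return np.sum(samples * np.sinc((t - n * T) / T))

# Interpolation is exact at the sample instants (sinc(k) = 0 for nonzero integer k) ...
assert np.isclose(reconstruct(7 * T), samples[200 + 7])
# ... and accurate in between; the small residual comes only from truncating the sum.
err = max(abs(reconstruct(t) - f(t)) for t in np.linspace(-1.0, 1.0, 101))
assert err < 0.05
```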
Some limitations of Shannon’s theorem
29
Motivation for CS: Candes’ puzzling experiment (Circa 2004)
30
Ref: Candes, Romberg and Tao, “Robust Uncertainty Principles: Exact Signal Reconstruction from Highly Incomplete Frequency Information”, IEEE Transactions on Information Theory, Feb 2006.
Motivation for CS: Candes’ puzzling experiment (Circa 2004)
31
THEORY OF COMPRESSIVE SENSING
32
CS: Theory
33
Measuring devices can be represented as linear systems
34
CS: Theory
35
CS: Theory
36
CS: Theory
37
CS: Theory
38
Signal Sparsity
39
Actually, the representation is compressible, i.e. most of the coefficients are close to zero, but not exactly zero. We will clarify this issue very soon.
This is the L0 norm of a vector = the number of non-zero elements in it
Signal Sparsity
40
The Barbara image (left), of size 512 x 512, and its reconstruction from just the top 100,000 (out of 262,144) DCT coefficients. The difference is visually indistinguishable, and similar results hold for wavelet transforms as well.
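The same top-k experiment is easy to reproduce in 1D. A sketch (the Gaussian-bump test signal and k = 20 are illustrative choices):

```python
import numpy as np

N = 256
n = np.arange(N)
# Orthonormal DCT-II matrix: row u is the basis vector of frequency u.
C = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
C[0] *= np.sqrt(1.0 / N)
C[1:] *= np.sqrt(2.0 / N)

f = np.exp(-((n - 128) / 40.0) ** 2)     # a smooth signal: compressible in the DCT domain
theta = C @ f                            # DCT coefficients

k = 20                                   # keep only the k largest-magnitude coefficients
idx = np.argsort(np.abs(theta))[::-1][:k]
theta_k = np.zeros(N)
theta_k[idx] = theta[idx]
f_k = C.T @ theta_k                      # reconstruction from < 8% of the coefficients

rel_err = np.linalg.norm(f - f_k) / np.linalg.norm(f)
assert rel_err < 1e-3                    # nearly exact despite discarding 92% of coefficients
```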
Incoherent Sampling
41
Incoherent Sampling
42
‘j’-th row of Φ, i-th column of Ψ
Incoherent sampling: Example (1)
which corresponds to the simplest and most conventional sampling basis in space or time (i.e. Φ = identity matrix) .
43
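The coherence of the spike/Fourier pair can be computed directly. A sketch using the standard definition μ(Φ, Ψ) = √n · max_{j,i} |⟨φ_j, ψ_i⟩| (the size n = 16 is an arbitrary illustrative choice):

```python
import numpy as np

n_dim = 16
Phi = np.eye(n_dim)                      # spike (identity) sampling basis
k = np.arange(n_dim)
# Orthonormal Fourier representation basis.
Psi = np.exp(2j * np.pi * np.outer(k, k) / n_dim) / np.sqrt(n_dim)

# mu(Phi, Psi) = sqrt(n) * max_{j,i} |<phi_j, psi_i>|, which lies in [1, sqrt(n)].
mu = np.sqrt(n_dim) * np.max(np.abs(Phi.conj() @ Psi))
assert np.isclose(mu, 1.0)               # spike and Fourier bases are maximally incoherent

# A basis compared with itself is maximally coherent: mu = sqrt(n).
mu_self = np.sqrt(n_dim) * np.max(np.abs(Phi @ Phi.T))
assert np.isclose(mu_self, np.sqrt(n_dim))
```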
Incoherent sampling: Example (2), or Randomness is cool! ☺
44
Signal Reconstruction
45
This is the L0 norm of a vector = the number of non-zero elements in it
Signal Reconstruction
46
This is the L1 norm of a vector = the sum total of the absolute values of all its elements
Signal Reconstruction: uniqueness issues
47
Signal Reconstruction: uniqueness issues
48
Signal Reconstruction: uniqueness issues
49
Why is BP efficiently solvable?
50
51
This is an equivalent formulation of BP, and it is a linear program. But why does the new formulation guarantee that the estimated α and β obey our constraint, i.e. that the estimated α is non-zero only at the positive entries of θ and the estimated β is non-zero only at the negative entries of θ? See next slide for the answer.
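The LP reformulation can be handed to any linear-programming solver. A sketch using SciPy's linprog (the problem sizes and the Gaussian sensing matrix are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, s = 60, 30, 4
A = rng.standard_normal((m, n)) / np.sqrt(m)     # illustrative random sensing matrix

theta = np.zeros(n)                              # s-sparse ground truth
theta[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
y = A @ theta                                    # noiseless compressive measurements

# BP as an LP: write theta = alpha - beta with alpha, beta >= 0, and
# minimize 1^T(alpha + beta) subject to A(alpha - beta) = y.
res = linprog(c=np.ones(2 * n), A_eq=np.hstack([A, -A]), b_eq=y, bounds=(0, None))
theta_hat = res.x[:n] - res.x[n:]

assert res.status == 0
assert np.allclose(theta_hat, theta, atol=1e-5)  # exact recovery from m = n/2 measurements
```

Splitting θ into α and β doubles the variable count but makes the objective linear, which is exactly why BP is efficiently solvable.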
52
Signal Reconstruction
53
Theorem 1
54
Ref: Donoho, “Compressed Sensing”, IEEE Transactions on Information Theory, April 2006.
Ref: Candes, Romberg and Tao, “Robust Uncertainty Principles: Exact Signal Reconstruction from Highly Incomplete Frequency Information”, IEEE Transactions on Information Theory, Feb 2006.
Comments on Theorem 1
55
Comments on Theorem 1
56
Comments on Theorem 1: A New Sampling Theorem?
which corresponds to the simplest sampling basis in space or time.
we can get an exact reconstruction of f with high probability.
57
Comments on Theorem 1: A New Sampling Theorem?
58
Comments on Theorem 1: A New Sampling Theorem?
59
“signal support” is a technical term = set of signal indices with non-zero values.
Intuition behind incoherence
60
This signal of n elements is sparse in the time domain. If you draw m < n samples in time, most of them will most likely be zero. Instead, the sampling or sensing vectors (i.e. rows of the sensing matrix Ф) should be pseudo-random and incoherent patterns as shown on the right.
Intuition behind incoherence
61
Intuition behind incoherence
62
Intuition behind incoherence
63
New concept: Restricted Isometry Property (RIP)
and that obviously does not preserve the squared magnitude of the vector θ).
64
(*) Without loss of generality, we will assume that the rows of A have unit magnitude
Restricted Isometry Property
65
Restricted Isometry Property (RIP)
66
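RIP-style norm preservation is easy to observe empirically for a random Gaussian matrix. A sketch (the sizes are illustrative, and this checks typical sparse vectors rather than the worst case over all of them):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, s = 120, 400, 5
# Scaled so that E[||A theta||^2] = ||theta||^2.
A = rng.standard_normal((m, n)) / np.sqrt(m)

ratios = []
for _ in range(200):                   # random s-sparse test vectors
    theta = np.zeros(n)
    theta[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
    ratios.append(np.sum((A @ theta) ** 2) / np.sum(theta ** 2))

ratios = np.array(ratios)
# Squared norms of sparse vectors are approximately preserved.
assert 0.4 < ratios.min() and ratios.max() < 1.6
assert abs(ratios.mean() - 1.0) < 0.1
```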
Theorem 2
Then we have:
67
θS is created by retaining the S largest magnitude elements of θ in their correct locations, and setting the rest to 0.
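Constructing θS is a two-line operation. A sketch:

```python
import numpy as np

def best_s_term(theta, S):
    """theta_S: keep the S largest-magnitude entries in place, zero out the rest."""
    out = np.zeros_like(theta)
    idx = np.argsort(np.abs(theta))[::-1][:S]   # indices of the S largest magnitudes
    out[idx] = theta[idx]
    return out

theta = np.array([0.1, -3.0, 0.0, 2.0, -0.5])
assert np.array_equal(best_s_term(theta, 2), np.array([0.0, -3.0, 0.0, 2.0, 0.0]))
```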
Comments on Theorem 2
68
Comments on Theorem 2
69
Compressive Sensing under Noise
70
Convex problem, called a second-order cone program. It can be solved efficiently with standard packages, including MATLAB and L1-MAGIC.
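The cone program itself needs a dedicated solver such as L1-MAGIC. As an illustrative stand-in (a choice made here for self-containedness, not the method named above), here is ISTA, a simple iterative soft-thresholding scheme for the closely related Lagrangian/LASSO formulation of noisy sparse recovery:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, s = 100, 50, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)         # illustrative Gaussian sensing matrix

theta = np.zeros(n)                                  # s-sparse ground truth
supp = rng.choice(n, s, replace=False)
theta[supp] = rng.choice([-1.0, 1.0], s) * (1.0 + rng.random(s))
y = A @ theta + 0.02 * rng.standard_normal(m)        # noisy measurements

def soft(v, t):
    # Soft thresholding: the proximal map of the L1 norm.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

L = np.linalg.norm(A, 2) ** 2                        # Lipschitz constant of the data-term gradient
lam = 0.05                                           # regularization weight (illustrative)
x = np.zeros(n)
for _ in range(3000):
    # ISTA: gradient step on ||Ax - y||^2 / 2, then shrink toward sparsity.
    x = soft(x - A.T @ (A @ x - y) / L, lam / L)

rel_err = np.linalg.norm(x - theta) / np.linalg.norm(theta)
assert rel_err < 0.15                                # stable recovery under small noise
```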
Theorem 3
Then we have:
71
θS is created by retaining the S largest magnitude elements of θ in their correct locations, and setting the rest to 0.
Comments on Theorem 3
72
Comments on Theorem 3
73
Comments on Theorem 3
74
Comments on Theorem 3
75
Randomness is super-cool!☺
76
Randomness is super-super-cool!☺
77
Randomness is super-cool!☺
78
Ref: Rudelson and Vershynin, “On sparse reconstruction from Fourier and Gaussian measurements”, Communications on Pure and Applied Mathematics, 61(8), pp. 1025–1045, Aug. 2008.
L1 norm and L0 norm
79
The L1 norm is a “softer” version of the L0 norm. Other Lp-norms where 0 < p < 1 are possible and impose a stronger form of sparsity, but they lead to non-convex problems. Hence L1 is preferred.
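For concreteness, the norms mentioned above on a small vector:

```python
import numpy as np

theta = np.array([3.0, 0.0, -4.0, 0.0])

l0 = np.count_nonzero(theta)                # "L0 norm": number of non-zero entries
l1 = np.sum(np.abs(theta))                  # L1 norm: sum of absolute values
l2 = np.sqrt(np.sum(theta ** 2))            # L2 norm
lhalf = np.sum(np.abs(theta) ** 0.5) ** 2   # L_{1/2} quasi-norm: stronger sparsity, but non-convex

assert l0 == 2 and l1 == 7.0 and l2 == 5.0
```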
L1 is good, L2 is bad. Why?
80
Unit Lp-circle (defined below) for different values of p. Left side, top to bottom, p = 1 (square with diagonals coinciding with Cartesian axes), p = 2 (circle in the conventional sense), p = infinity (square with sides parallel to the axes); right side, p = 2/3 (astroid).
In 2-D
In d-dimensions
L1 is good, L2 is bad. Why?
81
Image source: Romberg, “Imaging via compressive sampling”, IEEE Signal Processing Magazine
(Two figures in the (α1, α2) plane: the constraint set intersected with the L1 ball and with the L2 ball, respectively.)
L1 is good, L2 is bad. Why?
82
(Two more figures in the (α1, α2) plane comparing the L1 and L2 minimizers.)
83
Another example – see the L1 ball versus the L2 ball, both in 3D. In this case the constraint set is a plane, given by y = Φx, where y is a scalar, x is a vector of 3 elements, and Φ has size 1 x 3.
L1 is good, L2 is bad. Why?
84
Compressive Sensing: Toy Example with images
85
Compressive Sensing: Toy Example with images
86
Compressive Sensing: Toy Example with images
87
Better edge preservation with the randomly assembled Φ – which actually uses some high frequency information (unlike conventional DCT which discards higher frequencies). Higher PSNR values with the randomly assembled sensing matrix as compared to the linear DCT-based method.
Image source: Romberg, “Imaging via compressive sampling”, IEEE Signal Processing Magazine
Compressed Sensing for Piecewise Constant Signals
88
This optimization problem is called a second-order cone program. It can be solved efficiently using packages like L1-MAGIC. We will not go into more details, but the documentation of L1-MAGIC is a good starting point for curious readers.
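For a piecewise constant signal, the sparse object is the discrete gradient, and (assuming the usual total-variation formulation from this line of work) it is the L1 norm of that gradient which gets minimized. A small illustration of why this is the right sparsity measure:

```python
import numpy as np

f = np.array([2.0, 2.0, 2.0, 5.0, 5.0, 1.0, 1.0, 1.0])   # piecewise constant signal

# The discrete gradient of a piecewise constant signal is sparse:
# it is non-zero only at the jumps.
grad = np.diff(f)
assert np.count_nonzero(grad) == 2        # only two jumps

# Its L1 norm, the total variation, sums the jump heights.
tv = np.sum(np.abs(grad))
assert tv == 7.0                          # one jump of height 3, one of height 4
```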
Theorem 4 (similar to Theorem 1)
If (C is some constant) then the solution to the following is exact with probability 1−δ:
89
Ref: Candes, Romberg and Tao, “Robust Uncertainty Principles: Exact Signal Reconstruction from Highly Incomplete Frequency Information”, IEEE Transactions on Information Theory, Feb 2006.
Experiment by Candes, Romberg and Tao
90
Ref: Candes, Romberg and Tao, “Robust Uncertainty Principles: Exact Signal Reconstruction from Highly Incomplete Frequency Information”, IEEE Transactions on Information Theory, Feb 2006.
Relevant theorem behind Candes’ experiment
91
Relevant theorem behind Candes’ experiment
92
Experiment by Candes, Romberg and Tao
93
Experiment by Candes, Romberg and Tao
94
Experiment by Candes, Romberg and Tao
95
Mutual coherence and Restricted Isometry Property
96
Concept of mutual coherence
97
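A sketch computing mutual coherence, assuming the standard definition (the largest normalized inner product between distinct columns of A):

```python
import numpy as np

def mutual_coherence(A):
    """Largest absolute normalized inner product between distinct columns of A."""
    An = A / np.linalg.norm(A, axis=0)     # normalize each column
    G = np.abs(An.T @ An)                  # Gram matrix of the normalized columns
    np.fill_diagonal(G, 0.0)               # ignore each column paired with itself
    return G.max()

# An orthonormal basis has coherence 0.
assert np.isclose(mutual_coherence(np.eye(4)), 0.0)

# Columns at 45 degrees give coherence 1/sqrt(2).
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
assert np.isclose(mutual_coherence(B), 1.0 / np.sqrt(2))
```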
Theorem 5
98
Ref: Cai, Wang and Xu, “Stable recovery of sparse signals and an oracle inequality”, IEEE Transactions on Information Theory, 56(7), 2010.
Theorem 5
99
Theorem 3 versus Theorem 5
100
RIC of A (of order S)
Theorem 3 versus Theorem 5
101
A subset (of maximum size S) of possible non-zero indices of vector θ
These indicate the maximum and minimum eigenvalues of the matrix (AΓ)^T(AΓ) over all sets Γ. In general, the maximum eigenvalue of a symmetric matrix B^T B is given by max_{x ≠ 0} (x^T B^T B x)/(x^T x). The vector x that realizes the maximum value is the corresponding eigenvector. Likewise for the minimum eigenvalue.
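This eigenvalue characterization can be checked numerically: every Rayleigh quotient x^T(B^T B)x / x^T x lies between the extreme eigenvalues of B^T B, and the top eigenvector attains the maximum. A sketch (the matrix size is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((6, 4))
M = B.T @ B                                   # symmetric positive semi-definite

lam = np.linalg.eigvalsh(M)                   # eigenvalues in ascending order
xs = rng.standard_normal((4, 5000))           # many random nonzero test vectors
# Column-wise Rayleigh quotients x^T M x / x^T x.
rq = np.einsum('ij,ij->j', xs, M @ xs) / np.einsum('ij,ij->j', xs, xs)

# Every quotient lies between the extreme eigenvalues ...
assert lam[0] - 1e-9 <= rq.min() and rq.max() <= lam[-1] + 1e-9
# ... and the top eigenvector attains the maximum.
v = np.linalg.eigh(M)[1][:, -1]
assert np.isclose(v @ M @ v / (v @ v), lam[-1])
```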
Theorem 3 versus Theorem 5
102
Theorem 3 versus Theorem 5
103
Theorem 3 versus Theorem 5
104
Theorem 3 versus Theorem 5
105
Theorem 3 versus Theorem 5
106
Some history: Observations by Logan (1965)
107
Ref: Logan, “Properties of high-pass signals”, PhD Thesis, Columbia University, 1965
Some history: Observations by Logan (1965)
108
Summary
109