1 of 35

Compressive Sensing: Reconstruction Algorithms

CS 754, Ajit Rajwade


2 of 35

Compressive sensing: reconstruction problem

  • The basic problem is:

    (P0)   minimize ||θ||_0 subject to y = ΦΨθ

    or its convex relaxation

    (P1)   minimize ||θ||_1 subject to y = ΦΨθ

  • Here y is the vector of measurements, as obtained by an instrument with sensing matrix Φ ∈ R^{m x n} (m << n).
  • The original signal x (to be estimated) has a sparse/compressible representation in the orthonormal basis Ψ ∈ R^{n x n}, i.e. x = Ψθ, where θ is a sparse/compressible vector. (A small numerical sketch of this sensing model follows this list.)
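As a concrete illustration, here is a minimal NumPy sketch of the sensing model. The sizes n, m, the sparsity level S, and the use of a random orthonormal matrix in place of a specific basis Ψ are illustrative assumptions, not values from these slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, S = 64, 20, 5                      # assumed sizes: n-dim signal, m << n measurements, S-sparse theta

Psi, _ = np.linalg.qr(rng.standard_normal((n, n)))   # a random orthonormal basis standing in for Psi
theta = np.zeros(n)
theta[rng.choice(n, S, replace=False)] = rng.standard_normal(S)  # sparse coefficient vector

x = Psi @ theta                          # signal with a sparse representation: x = Psi theta
Phi = rng.standard_normal((m, n)) / np.sqrt(m)       # Gaussian sensing matrix Phi in R^{m x n}
y = Phi @ x                              # compressive measurements: y = Phi x = (Phi Psi) theta
A = Phi @ Psi                            # effective dictionary seen by the reconstruction algorithms
```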


3 of 35

Compressive Reconstruction Algorithms

  • Category 1: Problem P0 is NP-hard, so instead we run an efficient (typically greedy) approximation algorithm.

  • Category 2: Directly solve the convex problem P1. There are many algorithms in this category; a popular one is the Iterative Shrinkage/Thresholding Algorithm (ISTA).


4 of 35

Compressive Reconstruction Algorithms

  • We will first focus on some algorithms in category 1.

  • Examples:
  • Matching pursuit (MP)
  • Orthogonal matching pursuit (OMP)
  • Iterative hard thresholding (IHT)
  • CoSaMP (Compressive Sampling Matching Pursuit)


5 of 35

Matching Pursuit

  • One of the simplest approximation algorithms for obtaining the coefficients θ of a signal y in an over-complete “dictionary” matrix A ∈ R^{m x n} (i.e. m << n).
  • Developed by Mallat and Zhang in 1993 (ref: S. G. Mallat and Z. Zhang, “Matching Pursuits with Time-Frequency Dictionaries”, IEEE Transactions on Signal Processing, December 1993).
  • Based on successively choosing the column of A which has maximal absolute dot product with a so-called residual vector (initialized to y at the beginning).


6 of 35

Pseudo-code


“j” or “l” is an index for the columns of A.

Select the column of A that is maximally correlated (highest absolute dot product) with the residual.

MP may choose a single column of A (say A_j) in multiple different iterations, and each time the coefficient θ_j computed for it may be different. In such cases, all these θ_j values accumulate and together contribute towards the reconstruction of y.
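A minimal NumPy sketch of MP as described above; the function name and the stopping parameters (max_iter, tol) are illustrative, not part of the original pseudo-code.

```python
import numpy as np

def matching_pursuit(A, y, max_iter=200, tol=1e-6):
    """Greedy MP: repeatedly pick the column of A with maximal |dot product| with the residual."""
    n = A.shape[1]
    theta = np.zeros(n)
    r = np.array(y, dtype=float)               # residual, initialized to y
    for _ in range(max_iter):
        corr = A.T @ r                         # dot products of all columns with the residual
        j = int(np.argmax(np.abs(corr)))       # index of the maximally correlated column
        c = corr[j] / np.dot(A[:, j], A[:, j]) # coefficient (reduces to corr[j] for unit-norm columns)
        theta[j] += c                          # the same column may be re-selected; contributions accumulate
        r -= c * A[:, j]                       # new residual is orthogonal to the selected column
        if np.linalg.norm(r) < tol:
            break
    return theta
```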

7 of 35

Properties of matching pursuit

  • MP decomposes any signal y into the form

    y = <y, a_k> a_k + r_k,

    where a_k is the column selected from A (assumed normalized to unit norm) and r_k is the residual vector.

  • Note that r_k and a_k are orthogonal.

  • Hence we have:

    ||y||_2^2 = |<y, a_k>|^2 + ||r_k||_2^2


8 of 35

Properties of matching pursuit

  • We want residuals with small magnitude; hence the choice of the dictionary column with maximal absolute dot product, which removes the largest possible amount of energy from the residual at that step.

  • The squared magnitude of the residual therefore decreases across iterations: ||r_{k+1}||_2^2 = ||r_k||_2^2 - |<r_k, a>|^2 <= ||r_k||_2^2, where a is the column selected at that iteration.


9 of 35

Orthogonal Matching Pursuit (OMP)

  • A more sophisticated algorithm than matching pursuit (MP).
  • MP may pick the same dictionary column multiple times (why?) and is hence inefficient.
  • In OMP, at each iteration the signal is approximated by projecting it onto the span of the dictionary columns (i.e. columns of A) in the current “support set”.
  • The support set is also updated at each iteration.


10 of 35

Pseudo-code


Sub-matrix of A containing only those columns whose indices lie in the support set T(i).

All coefficients on the current support set are re-computed (by least squares) in each iteration.

Support set

Pseudo-inverse
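A NumPy sketch of OMP along the lines of the annotations above (the names omp, eps, max_iter are illustrative); the least-squares solve plays the role of the pseudo-inverse.

```python
import numpy as np

def omp(A, y, eps=1e-6, max_iter=None):
    """Greedy OMP: grow a support set T and re-fit all coefficients on T by least squares."""
    m, n = A.shape
    if max_iter is None:
        max_iter = m                          # at most m iterations so the pseudo-inverse stays well-defined
    support = []                              # support set T
    r = np.array(y, dtype=float)              # residual
    theta = np.zeros(n)
    coef = np.zeros(0)
    for _ in range(max_iter):
        j = int(np.argmax(np.abs(A.T @ r)))   # column most correlated with the residual
        if j not in support:
            support.append(j)
        A_T = A[:, support]                   # sub-matrix of A restricted to the support set
        coef, *_ = np.linalg.lstsq(A_T, y, rcond=None)  # re-computes ALL coefficients on the support
        r = y - A_T @ coef                    # residual is orthogonal to every selected column
        if np.linalg.norm(r) <= eps:
            break
    theta[support] = coef
    return theta
```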

11 of 35

OMP versus MP

  • Unlike MP, in OMP, the residual at an iteration is always orthogonal to all currently selected column vectors of A (proof next slide).

  • Therefore (unlike MP) OMP never re-selects any column vector (why?).

  • OMP is costlier per iteration (due to pseudo-inverse).

  • OMP always gives the optimal approximation w.r.t. the selected subset of the dictionary (note: this does not mean that the selected subset itself was optimal).

  • In order for the pseudo-inverse to be well-defined, the number of OMP iterations should not be more than m.


12 of 35

OMP residuals

At iteration i, the coefficients on the support set T(i) are computed as θ^(i) = (A_{T(i)})^+ y (the pseudo-inverse solution), i.e. they minimize ||y - A_{T(i)} θ||_2^2. Setting the gradient to zero gives A_{T(i)}^T (y - A_{T(i)} θ^(i)) = 0, so the residual r^(i) = y - A_{T(i)} θ^(i) is orthogonal to every column of A in the current support set. In particular, an already-selected column has zero dot product with the residual, which is why OMP never re-selects it.

13 of 35

OMP for noisy signals

  • For noiseless signals, OMP is run with ε = 0, i.e. until the magnitude of the residual is zero.

  • If there is noise, then ε should be set based on the noise variance (e.g. of the order of the expected norm of the noise vector).


14 of 35

OMP: error bounds

  • Various error bounds on the performance of OMP have been derived.
  • For example, the following theorem is due to Tropp and Gilbert (2007): let δ ∈ (0, 0.36) and m >= C S log(n/δ). Given m measurements of the S-sparse n-dimensional signal x, taken with a standard Gaussian matrix of size m x n, OMP recovers x exactly with probability more than 1 - 2δ. The constant C turns out to be <= 20. (A quick numeric reading of this bound is sketched below.)
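A quick numeric reading of this bound; the values of S, n and δ below are illustrative assumptions, the theorem only gives a sufficient condition, and the constant C <= 20 is known to be quite loose.

```python
import numpy as np

S, n, delta, C = 5, 1000, 0.1, 20              # assumed sparsity, dimension, failure parameter, constant
m_required = C * S * np.log(n / delta)         # m >= C * S * log(n / delta)
print(f"Exact recovery with probability > {1 - 2 * delta:.2f} "
      f"is guaranteed once m >= {m_required:.0f}.")
```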


15 of 35

Experiment on OMP

  • Simulation of compressive sensing with a Gaussian random measurement matrix of size m x n.
  • Data: patches of size 8 x 8 (n = 64) from the Barbara image, with added Gaussian noise of mean 0 and sigma = 0.05 x (mean intensity of the coded patch).
  • m varied from 0.1n to 0.7n. (A simplified code sketch of one such trial follows this list.)
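A rough, simplified sketch of one trial of such an experiment. It reuses the omp function above, and it substitutes a synthetic sparse coefficient vector and a random orthonormal basis for the actual Barbara patches and their sparsifying basis, so it is only indicative of the setup, not a reproduction of these results.

```python
import numpy as np

rng = np.random.default_rng(1)
n, S = 64, 6                                           # 8 x 8 patch -> n = 64; assumed sparsity level
Psi, _ = np.linalg.qr(rng.standard_normal((n, n)))     # stand-in orthonormal sparsifying basis
theta = np.zeros(n)
theta[rng.choice(n, S, replace=False)] = rng.standard_normal(S)
x = Psi @ theta                                        # synthetic "patch" with a sparse representation

for f in (0.7, 0.5, 0.3, 0.1):                         # m as a fraction of n, as in the experiment
    m = int(f * n)
    Phi = rng.standard_normal((m, n)) / np.sqrt(m)     # Gaussian measurement matrix
    sigma = 0.05 * np.mean(np.abs(Phi @ x))            # noise level tied to the coded patch, as on the slide
    y = Phi @ x + sigma * rng.standard_normal(m)
    theta_hat = omp(Phi @ Psi, y, eps=np.sqrt(m) * sigma)   # epsilon set from the expected noise norm
    rel_err = np.linalg.norm(Psi @ theta_hat - x) / np.linalg.norm(x)
    print(f"m = {m:2d} ({f:.1f} n): relative reconstruction error = {rel_err:.3f}")
```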


16 of 35

Figure: original Barbara image, and reconstruction results with m = fn for f = 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1 (in column-wise order).

17 of 35

Iterative Shrinkage and Thresholding Algorithm (ISTA)


18 of 35

Optimization algorithm

  • There are many algorithms for solving the optimization problem below:

    minimize over θ:   J(θ) = ||y - Aθ||_2^2 + λ ||θ||_1

  • Denoising: A = U, where U is an orthonormal basis in which the underlying signal is sparse.
  • Deblurring: A = HU, where H is a circulant/block-circulant matrix derived from the blur kernel.

19 of 35

Iterative Shrinkage and Thresholding Algorithm (ISTA)

  • An iterative algorithm to solve:

    minimize over θ:   J(θ) = ||y - Aθ||_2^2 + λ ||θ||_1

  • Useful in compressive reconstruction, but also in various image restoration problems such as deblurring.
  • Can be implemented in 4 lines of MATLAB code.


20 of 35

ISTA

  • Directly solving for θ in closed form is not possible due to the L1-norm term, so we resort to iterative solutions.

  • ISTA tackles this difficult minimization problem via a series of simpler minimization problems.

  • Given the current guess θ_k, we seek a new vector θ_{k+1} such that J(θ_{k+1}) < J(θ_k).


21 of 35

ISTA

  • The new vector θ_{k+1} is chosen to minimize a so-called majorizer function M_k(θ) such that M_k(θ) >= J(θ) for all θ, and M_k(θ_k) = J(θ_k).

  • The majorizer function will be different at each iteration (hence the subscript k).


Image source: tutorial by Ivan Selesnick

22 of 35

ISTA

  • The overall algorithm is as follows:
    1. Set k = 0 and initialize θ_0.
    2. Choose a majorizer M_k(θ) such that M_k(θ) >= J(θ) for all θ, and M_k(θ_k) = J(θ_k).
    3. Select θ_{k+1} to minimize the majorizer M_k(θ).
    4. Set k = k+1. Go to step 2.


Note:
J(θ_{k+1}) <= M_k(θ_{k+1})   (by definition of M_k)
            < M_k(θ_k)        (as θ_{k+1} minimizes M_k)
            = J(θ_k)          (by definition of M_k)
Hence J(θ_{k+1}) < J(θ_k).

23 of 35

ISTA

  • Suppose we want to solve:

    minimize over θ:   ||y - Aθ||_2^2

  • The intuitive solution is:

    θ = (A^T A)^{-1} A^T y

But A^T A is a humongous n x n matrix: very hard to store in memory, forget about inverting it! ☹

24 of 35

ISTA

  • Suppose we want to solve:

    minimize over θ:   ||y - Aθ||_2^2

  • One possible majorizer is:

    M_k(θ) = ||y - Aθ||_2^2 + (θ - θ_k)^T (αI - A^T A) (θ - θ_k),

    where α is chosen larger than the largest eigenvalue of A^T A, so that αI - A^T A is positive definite; the added term is then non-negative and vanishes at θ = θ_k, as a majorizer requires.


25 of 35

ISTA

  • One possible majorizer is:

    M_k(θ) = ||y - Aθ||_2^2 + (θ - θ_k)^T (αI - A^T A) (θ - θ_k)

    Completing the square in θ, this can be rewritten as

    M_k(θ) = α ||θ - (θ_k + (1/α) A^T (y - Aθ_k))||_2^2 + (terms independent of θ).


26 of 35

ISTA

  • One possible majorizer is:

    M_k(θ) = ||y - Aθ||_2^2 + (θ - θ_k)^T (αI - A^T A) (θ - θ_k)

  • The minimizer is given as:

    θ_{k+1} = θ_k + (1/α) A^T (y - Aθ_k)

Notice: this yields an iterative solver that never requires inverting A^T A, which is a very large matrix; only products with A and A^T are needed.
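A minimal sketch of this iteration (sometimes called a Landweber iteration); the step count and the safety factor on α are illustrative choices.

```python
import numpy as np

def landweber(A, y, n_iters=200):
    """Minimize ||y - A theta||_2^2 iteratively, without ever forming or inverting A^T A."""
    alpha = 1.01 * np.linalg.norm(A, 2) ** 2     # alpha must exceed the largest eigenvalue of A^T A
    theta = np.zeros(A.shape[1])
    for _ in range(n_iters):
        # theta_{k+1} = theta_k + (1/alpha) A^T (y - A theta_k): only matrix-vector products are needed
        theta = theta + (A.T @ (y - A @ theta)) / alpha
    return theta
```

For a truly huge A, the spectral norm used to set α would itself be estimated iteratively (e.g. by power iteration) rather than computed exactly.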

27 of 35

ISTA

  • Now consider:

    J(θ) = ||y - Aθ||_2^2 + λ ||θ||_1

  • The majorizer is now:

    M_k(θ) = ||y - Aθ||_2^2 + λ ||θ||_1 + (θ - θ_k)^T (αI - A^T A) (θ - θ_k)


28 of 35

ISTA

  • The majorizer is now (completing the square as before):

    M_k(θ) = α ||θ - b_k||_2^2 + λ ||θ||_1 + (terms independent of θ),   where b_k = θ_k + (1/α) A^T (y - Aθ_k)

  • The minimizer is given element-wise by a shrinkage/thresholding step:

    θ_{k+1}(i) = soft(b_k(i), λ/(2α)),

    where the soft-thresholding operation soft(·,·) is derived on the next slides.


29 of 35

ISTA

  • Consider the following equation in the scalar x (with λ > 0):

    y = x + λ sign(x)

  • Its solution is given by soft thresholding:

    x = soft(y, λ) = sign(y) max(|y| - λ, 0)


30 of 35

ISTA: explaining soft thresholding

  • When x is positive, y = x + λ; so when y > λ, x = y - λ.
  • When x is negative, y = x - λ; so when y < -λ, x = y + λ.
  • When x is zero, one has to refer to the sub-differential of the L1 norm (here |x|) at x = 0, which is the interval [-1, 1].
  • As per the optimality conditions, we must have 0 ∈ x - y + λ[-1, 1]; at x = 0 this gives 0 ∈ -y + λ[-1, 1], i.e. |y| <= λ. So x = 0 whenever |y| <= λ. (A one-line implementation of this operator follows this list.)
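A one-line implementation of the resulting soft-thresholding operator (the name soft is just a convenient label):

```python
import numpy as np

def soft(y, lam):
    """Soft threshold: x = sign(y) * max(|y| - lam, 0), applied element-wise."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)
```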


31 of 35

ISTA: explaining soft thresholding via the sub-differential

  • The sub-derivative of a convex function f at a point x_0 in an open interval I is a real number c such that

    f(x) - f(x_0) >= c (x - x_0)   for all x in I.

  • The set of sub-derivatives is a closed interval [a, b], called the sub-differential, where

    a = lim_{x -> x_0^-} (f(x) - f(x_0)) / (x - x_0),   b = lim_{x -> x_0^+} (f(x) - f(x_0)) / (x - x_0).


32 of 35

ISTA

  • Putting the pieces together, the minimizer of the majorizer is given as:

    θ_{k+1} = soft(θ_k + (1/α) A^T (y - Aθ_k), λ/(2α)),   applied element-wise.


33 of 35

ISTA: Algorithm


The algorithm: initialize θ_0, choose α larger than the largest eigenvalue of A^T A, and repeat θ_{k+1} = soft(θ_k + (1/α) A^T (y - Aθ_k), λ/(2α)) until convergence. This is just 4-5 lines of MATLAB code.
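Here is an equivalent NumPy sketch of the whole algorithm, reusing the soft operator defined earlier; the iteration count and the safety factor on α are illustrative choices.

```python
import numpy as np

def ista(A, y, lam, n_iters=200):
    """ISTA for: minimize over theta  ||y - A theta||_2^2 + lam * ||theta||_1."""
    alpha = 1.01 * np.linalg.norm(A, 2) ** 2        # alpha > largest eigenvalue of A^T A
    theta = np.zeros(A.shape[1])
    for _ in range(n_iters):
        b = theta + (A.T @ (y - A @ theta)) / alpha # Landweber step on the quadratic term
        theta = soft(b, lam / (2 * alpha))          # shrinkage/thresholding handles the L1 term
    return theta
```

For compressive reconstruction one would call it with A = Phi @ Psi; for deblurring, with A = HU as on the earlier slide.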

34 of 35

Figure: the signal x, the observed signal y = h*x + noise (convolution with a blur kernel h, plus noise), and the ISTA estimate of x. (Image source: tutorial by Ivan Selesnick.)

35 of 35

Figure: the estimate obtained by the L1-regularized formulation, i.e. the solution with a Laplacian prior. (Image source: tutorial by Ivan Selesnick.)