1 of 35

Compressive Sensing: Reconstruction Algorithms

CS 754, Ajit Rajwade


2 of 35

Compressive sensing: reconstruction problem

  • The basic problem is:

    (P0)   minimize ||θ||_0 subject to y = ΦΨθ

    or its convex relaxation

    (P1)   minimize ||θ||_1 subject to y = ΦΨθ

  • Here y is the vector of measurements, as obtained by an instrument with sensing matrix Φ ∈ R^{m x n} (m << n).
  • The original signal x (to be estimated) has a sparse/compressible representation in the orthonormal basis Ψ ∈ R^{n x n}, i.e. x = Ψθ, where θ is a sparse/compressible vector. (A small numerical sketch of this sensing model follows this list.)
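As a concrete illustration, here is a minimal NumPy sketch of the sensing model. The sizes n, m, the sparsity level S, and the use of a random orthonormal matrix in place of a specific basis Ψ are illustrative assumptions, not values from these slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, S = 64, 20, 5                      # assumed sizes: n-dim signal, m << n measurements, S-sparse theta

Psi, _ = np.linalg.qr(rng.standard_normal((n, n)))   # a random orthonormal basis standing in for Psi
theta = np.zeros(n)
theta[rng.choice(n, S, replace=False)] = rng.standard_normal(S)  # sparse coefficient vector

x = Psi @ theta                          # signal with a sparse representation: x = Psi theta
Phi = rng.standard_normal((m, n)) / np.sqrt(m)       # Gaussian sensing matrix Phi in R^{m x n}
y = Phi @ x                              # compressive measurements: y = Phi x = (Phi Psi) theta
A = Phi @ Psi                            # effective dictionary seen by the reconstruction algorithms
```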


3 of 35

Compressive Reconstruction Algorithms

  • Category 1: Problem P0 is NP-hard, so instead we run an efficient (typically greedy) approximation algorithm.

  • Category 2: Directly solve the convex problem P1. There are many algorithms in this category; a popular one is the Iterative Shrinkage/Thresholding Algorithm (ISTA).


4 of 35

Compressive Reconstruction Algorithms

  • We will first focus on some algorithms in category 1.

  • Examples:
  • Matching pursuit (MP)
  • Orthogonal matching pursuit (OMP)
  • Iterative hard thresholding (IHT)
  • CoSaMP (Compressive Sampling Matching Pursuit)


5 of 35

Matching Pursuit

  • One of the simplest approximation algorithms for obtaining the coefficients θ of a signal y in an over-complete “dictionary” matrix A ∈ R^{m x n} (i.e. m << n).
  • Developed by Mallat and Zhang in 1993 (ref: S. G. Mallat and Z. Zhang, “Matching Pursuits with Time-Frequency Dictionaries”, IEEE Transactions on Signal Processing, December 1993).
  • Based on successively choosing the column of A which has maximal absolute dot product with a so-called residual vector (initialized to y at the beginning).


6 of 35

Pseudo-code


“j” or “l” is an index for the columns of A.

Select the column of A that is maximally correlated (highest absolute dot product) with the residual.

MP may choose a single column of A (say A_j) in multiple different iterations, and each time the coefficient θ_j computed for it may be different. In such cases, all these θ_j values accumulate and together contribute towards the reconstruction of y.
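A minimal NumPy sketch of MP as described above; the function name and the stopping parameters (max_iter, tol) are illustrative, not part of the original pseudo-code.

```python
import numpy as np

def matching_pursuit(A, y, max_iter=200, tol=1e-6):
    """Greedy MP: repeatedly pick the column of A with maximal |dot product| with the residual."""
    n = A.shape[1]
    theta = np.zeros(n)
    r = np.array(y, dtype=float)               # residual, initialized to y
    for _ in range(max_iter):
        corr = A.T @ r                         # dot products of all columns with the residual
        j = int(np.argmax(np.abs(corr)))       # index of the maximally correlated column
        c = corr[j] / np.dot(A[:, j], A[:, j]) # coefficient (reduces to corr[j] for unit-norm columns)
        theta[j] += c                          # the same column may be re-selected; contributions accumulate
        r -= c * A[:, j]                       # new residual is orthogonal to the selected column
        if np.linalg.norm(r) < tol:
            break
    return theta
```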

7 of 35

Properties of matching pursuit

  • MP decomposes any signal y into the form

    y = <y, a_k> a_k + r_k,

    where a_k is the column selected from A (assumed normalized to unit norm) and r_k is the residual vector.

  • Note that r_k and a_k are orthogonal.

  • Hence we have:

    ||y||_2^2 = |<y, a_k>|^2 + ||r_k||_2^2


8 of 35

Properties of matching pursuit

  • We want residuals with small magnitude; hence the choice of the dictionary column with maximal absolute dot product, which removes the largest possible amount of energy from the residual at that step.

  • The squared magnitude of the residual therefore decreases across iterations: ||r_{k+1}||_2^2 = ||r_k||_2^2 - |<r_k, a>|^2 <= ||r_k||_2^2, where a is the column selected at that iteration.


9 of 35

Orthogonal Matching Pursuit (OMP)

  • A more sophisticated algorithm than matching pursuit (MP).
  • MP may pick the same dictionary column multiple times (why?) and is hence inefficient.
  • In OMP, at each iteration the signal is approximated by projecting it onto the span of the dictionary columns (i.e. columns of A) in the current “support set”.
  • The support set is also updated at each iteration.


10 of 35

Pseudo-code


Sub-matrix of A containing only those columns whose indices lie in the support set T(i).

All coefficients on the current support set are re-computed (by least squares) in each iteration.

Support set

Pseudo-inverse
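A NumPy sketch of OMP along the lines of the annotations above (the names omp, eps, max_iter are illustrative); the least-squares solve plays the role of the pseudo-inverse.

```python
import numpy as np

def omp(A, y, eps=1e-6, max_iter=None):
    """Greedy OMP: grow a support set T and re-fit all coefficients on T by least squares."""
    m, n = A.shape
    if max_iter is None:
        max_iter = m                          # at most m iterations so the pseudo-inverse stays well-defined
    support = []                              # support set T
    r = np.array(y, dtype=float)              # residual
    theta = np.zeros(n)
    coef = np.zeros(0)
    for _ in range(max_iter):
        j = int(np.argmax(np.abs(A.T @ r)))   # column most correlated with the residual
        if j not in support:
            support.append(j)
        A_T = A[:, support]                   # sub-matrix of A restricted to the support set
        coef, *_ = np.linalg.lstsq(A_T, y, rcond=None)  # re-computes ALL coefficients on the support
        r = y - A_T @ coef                    # residual is orthogonal to every selected column
        if np.linalg.norm(r) <= eps:
            break
    theta[support] = coef
    return theta
```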

11 of 35

OMP versus MP

  • Unlike MP, in OMP, the residual at an iteration is always orthogonal to all currently selected column vectors of A (proof next slide).

  • Therefore (unlike MP) OMP never re-selects any column vector (why?).

  • OMP is costlier per iteration (due to pseudo-inverse).

  • OMP always gives the optimal approximation w.r.t. the selected subset of the dictionary (note: this does not mean that the selected subset itself was optimal).

  • In order for the pseudo-inverse to be well-defined, the number of OMP iterations should not be more than m.


12 of 35

OMP residuals

At iteration i, the coefficients on the support set T(i) are computed as θ^(i) = (A_{T(i)})^+ y (the pseudo-inverse solution), i.e. they minimize ||y - A_{T(i)} θ||_2^2. Setting the gradient to zero gives A_{T(i)}^T (y - A_{T(i)} θ^(i)) = 0, so the residual r^(i) = y - A_{T(i)} θ^(i) is orthogonal to every column of A in the current support set. In particular, an already-selected column has zero dot product with the residual, which is why OMP never re-selects it.

13 of 35

OMP for noisy signals

  • For noiseless signals, OMP is run with ε = 0, i.e. until the magnitude of the residual is zero.

  • If there is noise, then ε should be set based on the noise variance (e.g. of the order of the expected norm of the noise vector).


14 of 35

OMP: error bounds

  • Various error bounds on the performance of OMP have been derived.
  • For example, the following theorem is due to Tropp and Gilbert (2007): let δ ∈ (0, 0.36) and m >= C S log(n/δ). Given m measurements of the S-sparse n-dimensional signal x, taken with a standard Gaussian matrix of size m x n, OMP recovers x exactly with probability more than 1 - 2δ. The constant C turns out to be <= 20. (A quick numeric reading of this bound is sketched below.)
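A quick numeric reading of this bound; the values of S, n and δ below are illustrative assumptions, the theorem only gives a sufficient condition, and the constant C <= 20 is known to be quite loose.

```python
import numpy as np

S, n, delta, C = 5, 1000, 0.1, 20              # assumed sparsity, dimension, failure parameter, constant
m_required = C * S * np.log(n / delta)         # m >= C * S * log(n / delta)
print(f"Exact recovery with probability > {1 - 2 * delta:.2f} "
      f"is guaranteed once m >= {m_required:.0f}.")
```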


15 of 35

Experiment on OMP

  • Simulation of compressive sensing with a Gaussian random measurement matrix of size m x n.
  • Data: patches of size 8 x 8 (n = 64) from the Barbara image, with added Gaussian noise of mean 0 and sigma = 0.05 x (mean intensity of the coded patch).
  • m varied from 0.1n to 0.7n. (A simplified code sketch of one such trial follows this list.)
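A rough, simplified sketch of one trial of such an experiment. It reuses the omp function above, and it substitutes a synthetic sparse coefficient vector and a random orthonormal basis for the actual Barbara patches and their sparsifying basis, so it is only indicative of the setup, not a reproduction of these results.

```python
import numpy as np

rng = np.random.default_rng(1)
n, S = 64, 6                                           # 8 x 8 patch -> n = 64; assumed sparsity level
Psi, _ = np.linalg.qr(rng.standard_normal((n, n)))     # stand-in orthonormal sparsifying basis
theta = np.zeros(n)
theta[rng.choice(n, S, replace=False)] = rng.standard_normal(S)
x = Psi @ theta                                        # synthetic "patch" with a sparse representation

for f in (0.7, 0.5, 0.3, 0.1):                         # m as a fraction of n, as in the experiment
    m = int(f * n)
    Phi = rng.standard_normal((m, n)) / np.sqrt(m)     # Gaussian measurement matrix
    sigma = 0.05 * np.mean(np.abs(Phi @ x))            # noise level tied to the coded patch, as on the slide
    y = Phi @ x + sigma * rng.standard_normal(m)
    theta_hat = omp(Phi @ Psi, y, eps=np.sqrt(m) * sigma)   # epsilon set from the expected noise norm
    rel_err = np.linalg.norm(Psi @ theta_hat - x) / np.linalg.norm(x)
    print(f"m = {m:2d} ({f:.1f} n): relative reconstruction error = {rel_err:.3f}")
```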


16 of 35

Figure: original Barbara image, and reconstruction results with m = fn for f = 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1 (in column-wise order).

17 of 35

Iterative Shrinkage and Thresholding Algorithm (ISTA)


18 of 35

Optimization algorithm

  • There are many algorithms for solving the optimization problem below:

    minimize over θ:   J(θ) = ||y - Aθ||_2^2 + λ ||θ||_1

  • Denoising: A = U, where U is an orthonormal basis in which the underlying signal is sparse.
  • Deblurring: A = HU, where H is a circulant/block-circulant matrix derived from the blur kernel.

19 of 35

Iterative Shrinkage and Thresholding Algorithm (ISTA)

  • An iterative algorithm to solve:

    minimize over θ:   J(θ) = ||y - Aθ||_2^2 + λ ||θ||_1

  • Useful in compressive reconstruction, but also in various image restoration problems such as deblurring.
  • Can be implemented in 4 lines of MATLAB code.


20 of 35

ISTA

  • Directly solving for θ in closed form is not possible due to the L1-norm term, so we resort to iterative solutions.

  • ISTA tackles this difficult minimization problem via a series of simpler minimization problems.

  • Given the current guess θ_k, we seek a new vector θ_{k+1} such that J(θ_{k+1}) < J(θ_k).


21 of 35

ISTA

  • The new vector θ_{k+1} is chosen to minimize a so-called majorizer function M_k(θ) such that M_k(θ) >= J(θ) for all θ, and M_k(θ_k) = J(θ_k).

  • The majorizer function will be different at each iteration (hence the subscript k).


Image source: tutorial by Ivan Selesnick

22 of 35

ISTA

  • The overall algorithm is as follows:
    1. Set k = 0 and initialize θ_0.
    2. Choose a majorizer M_k(θ) such that M_k(θ) >= J(θ) for all θ, and M_k(θ_k) = J(θ_k).
    3. Select θ_{k+1} to minimize the majorizer M_k(θ).
    4. Set k = k+1. Go to step 2.


Note:
J(θ_{k+1}) <= M_k(θ_{k+1})   (by definition of M_k)
            < M_k(θ_k)        (as θ_{k+1} minimizes M_k)
            = J(θ_k)          (by definition of M_k)
Hence J(θ_{k+1}) < J(θ_k).

23 of 35

ISTA

  • Suppose we want to solve:

    minimize over θ:   ||y - Aθ||_2^2

  • The intuitive solution is:

    θ = (A^T A)^{-1} A^T y

But A^T A is a humongous n x n matrix: very hard to store in memory, forget about inverting it! ☹

24 of 35

ISTA

  • Suppose we want to solve:

    minimize over θ:   ||y - Aθ||_2^2

  • One possible majorizer is:

    M_k(θ) = ||y - Aθ||_2^2 + (θ - θ_k)^T (αI - A^T A) (θ - θ_k),

    where α is chosen larger than the largest eigenvalue of A^T A, so that αI - A^T A is positive definite; the added term is then non-negative and vanishes at θ = θ_k, as a majorizer requires.


25 of 35

ISTA

  • One possible majorizer is:

    M_k(θ) = ||y - Aθ||_2^2 + (θ - θ_k)^T (αI - A^T A) (θ - θ_k)

    Completing the square in θ, this can be rewritten as

    M_k(θ) = α ||θ - (θ_k + (1/α) A^T (y - Aθ_k))||_2^2 + (terms independent of θ).


26 of 35

ISTA

  • One possible majorizer is:

    M_k(θ) = ||y - Aθ||_2^2 + (θ - θ_k)^T (αI - A^T A) (θ - θ_k)

  • The minimizer is given as:

    θ_{k+1} = θ_k + (1/α) A^T (y - Aθ_k)

Notice: this yields an iterative solver that never requires inverting A^T A, which is a very large matrix; only products with A and A^T are needed.
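A minimal sketch of this iteration (sometimes called a Landweber iteration); the step count and the safety factor on α are illustrative choices.

```python
import numpy as np

def landweber(A, y, n_iters=200):
    """Minimize ||y - A theta||_2^2 iteratively, without ever forming or inverting A^T A."""
    alpha = 1.01 * np.linalg.norm(A, 2) ** 2     # alpha must exceed the largest eigenvalue of A^T A
    theta = np.zeros(A.shape[1])
    for _ in range(n_iters):
        # theta_{k+1} = theta_k + (1/alpha) A^T (y - A theta_k): only matrix-vector products are needed
        theta = theta + (A.T @ (y - A @ theta)) / alpha
    return theta
```

For a truly huge A, the spectral norm used to set α would itself be estimated iteratively (e.g. by power iteration) rather than computed exactly.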

27 of 35

ISTA

  • Now consider:

    J(θ) = ||y - Aθ||_2^2 + λ ||θ||_1

  • The majorizer is now:

    M_k(θ) = ||y - Aθ||_2^2 + λ ||θ||_1 + (θ - θ_k)^T (αI - A^T A) (θ - θ_k)


28 of 35

ISTA

  • The majorizer is now (completing the square as before):

    M_k(θ) = α ||θ - b_k||_2^2 + λ ||θ||_1 + (terms independent of θ),   where b_k = θ_k + (1/α) A^T (y - Aθ_k)

  • The minimizer is given element-wise by a shrinkage/thresholding step:

    θ_{k+1}(i) = soft(b_k(i), λ/(2α)),

    where the soft-thresholding operation soft(·,·) is derived on the next slides.


29 of 35

ISTA

  • Consider the following equation in the scalar x (with λ > 0):

    y = x + λ sign(x)

  • Its solution is given by soft thresholding:

    x = soft(y, λ) = sign(y) max(|y| - λ, 0)


30 of 35

ISTA: explaining soft thresholding

  • When x is positive, y = x + λ; so when y > λ, x = y - λ.
  • When x is negative, y = x - λ; so when y < -λ, x = y + λ.
  • When x is zero, one has to refer to the sub-differential of the L1 norm (here |x|) at x = 0, which is the interval [-1, 1].
  • As per the optimality conditions, we must have 0 ∈ x - y + λ[-1, 1]; at x = 0 this gives 0 ∈ -y + λ[-1, 1], i.e. |y| <= λ. So x = 0 whenever |y| <= λ. (A one-line implementation of this operator follows this list.)
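A one-line implementation of the resulting soft-thresholding operator (the name soft is just a convenient label):

```python
import numpy as np

def soft(y, lam):
    """Soft threshold: x = sign(y) * max(|y| - lam, 0), applied element-wise."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)
```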


31 of 35

ISTA: explaining soft thresholding via the sub-differential

  • The sub-derivative of a convex function f at a point x_0 in an open interval I is a real number c such that

    f(x) - f(x_0) >= c (x - x_0)   for all x in I.

  • The set of sub-derivatives is a closed interval [a, b], called the sub-differential, where

    a = lim_{x -> x_0^-} (f(x) - f(x_0)) / (x - x_0),   b = lim_{x -> x_0^+} (f(x) - f(x_0)) / (x - x_0).


32 of 35

ISTA

  • Putting the pieces together, the minimizer of the majorizer is given as:

    θ_{k+1} = soft(θ_k + (1/α) A^T (y - Aθ_k), λ/(2α)),   applied element-wise.


33 of 35

ISTA: Algorithm


The algorithm: initialize θ_0, choose α larger than the largest eigenvalue of A^T A, and repeat θ_{k+1} = soft(θ_k + (1/α) A^T (y - Aθ_k), λ/(2α)) until convergence. This is just 4-5 lines of MATLAB code.
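Here is an equivalent NumPy sketch of the whole algorithm, reusing the soft operator defined earlier; the iteration count and the safety factor on α are illustrative choices.

```python
import numpy as np

def ista(A, y, lam, n_iters=200):
    """ISTA for: minimize over theta  ||y - A theta||_2^2 + lam * ||theta||_1."""
    alpha = 1.01 * np.linalg.norm(A, 2) ** 2        # alpha > largest eigenvalue of A^T A
    theta = np.zeros(A.shape[1])
    for _ in range(n_iters):
        b = theta + (A.T @ (y - A @ theta)) / alpha # Landweber step on the quadratic term
        theta = soft(b, lam / (2 * alpha))          # shrinkage/thresholding handles the L1 term
    return theta
```

For compressive reconstruction one would call it with A = Phi @ Psi; for deblurring, with A = HU as on the earlier slide.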

34 of 35

Figure: the signal x, the observed signal y = h*x + noise (convolution with a blur kernel h, plus noise), and the ISTA estimate of x. (Image source: tutorial by Ivan Selesnick.)

35 of 35

Figure: the estimate obtained by the L1-regularized formulation, i.e. the solution with a Laplacian prior. (Image source: tutorial by Ivan Selesnick.)