1 of 106

Matrix Completion and Recovery

CS 754 Lecture Notes


2 of 106

Matrix Completion in Practice: Scenario 1

  • Consider a survey of m people where each is asked q questions.
  • It may not be possible to ask each person all q questions.
  • Consider an m by q matrix in which row i holds the answers given by person i to the q questions.
  • This matrix is only partially filled (many missing entries).
  • Is it possible to infer the full matrix given just the recorded entries?


3 of 106

Matrix Completion in Practice: Scenario 2

  • Some online shopping and streaming sites such as Amazon, Flipkart, eBay and Netflix have recommender systems.
  • These websites collect product ratings from users (Netflix, for instance, collects movie ratings).
  • Based on user ratings, these websites try to recommend other products/movies to the user that he/she will like with a high probability.
  • Consider a matrix with the number of rows equal to the number of users, and number of columns equal to the number of movies/products.
  • This matrix will be HIGHLY incomplete (no user has the patience to rate too many movies!!) – maybe only 5% of the entries will be filled up.
  • Can the recommender system infer user preferences from just the defined entries?


4 of 106

Matrix Completion in Practice: Scenario 2

  • Read about the Netflix Prize, a competition to design a better recommender system:

http://en.wikipedia.org/wiki/Netflix_Prize


5 of 106

Matrix Completion in Practice: Scenario 3

  • Consider an image or a video with several pixel values missing.
  • This is not uncommon in range imagery or remote sensing applications!
  • Consider a matrix each of whose columns is a (vectorized) patch of m pixels. Let the number of columns be K.
  • This m by K matrix will have many missing entries.
  • Is it possible to infer the complete matrix given just the defined pixel values?
  • If the answer is yes, note the implications for image compression!


6 of 106

Matrix Completion in Practice: Scenario 4

  • Consider a long video sequence of F frames.
  • Suppose we mark out m salient (interesting) points {Pi}, 1 <= i <= m, in the first frame.
  • And track those points in all subsequent frames.
  • Consider a matrix M of size m x 2F, where row j contains the X and Y coordinates of the tracked point corresponding to the initial point Pj (in each of the F frames).
  • Unfortunately, many salient points may not be trackable due to occlusion or errors from the tracking algorithms.
  • So M is highly incomplete.
  • Is it possible to infer the true matrix from only the available measurements?


7 of 106

Matrix Completion in Practice: Scenario 5

  • Given a set of images of a person’s face under L lighting conditions, create a matrix of size N x L where N is the number of pixels in an image.
  • We are assuming that the person’s head pose does not change and that there are no shadows.
  • It turns out that the rank of such a matrix is (at most) 3.
  • To see why, we need to understand the so-called Lambertian reflectance model from computer vision and computer graphics.

8 of 106

Lambertian Model

  • The scene radiance is given by the (Lambertian) equation:

    R = ρ (s · n) = ρ ||s|| cos(θ)

    where ρ is the surface albedo, n is the unit surface normal, s is the light source vector (direction scaled by source intensity), and θ is the angle between n and s.

  • What if cos(θ) < 0? This means that the surface does not face the light source. In such a case, set the irradiance to 0.

The cos(θ) term is the effective surface area as seen from the light source.

9 of 106

Many images of one person under different lighting directions: rank 3 matrix

  • Consider the M x m matrix I whose j-th column is the (vectorized) image acquired under lighting direction sj. Here M = number of pixels of the form (x,y), and m = number of images, each under a different lighting direction.
  • If the Lambertian model is satisfied, pixel p of image j has intensity I(p,j) = ρp (np · sj), where np is the unit surface normal and ρp the albedo at pixel p. Assumptions: no self-shadows or specularities.
  • Hence I = A S, where the p-th row of the M x 3 matrix A is ρp np^T, and the j-th column of the 3 x m matrix S is sj.
  • Therefore rank(I) ≤ 3.
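A minimal numpy sketch of this rank-3 property, with hypothetical randomly generated albedos, normals and lighting directions:

    import numpy as np

    rng = np.random.default_rng(0)
    M_pix, m_img = 500, 20                      # M pixels, m lighting directions

    albedo = rng.uniform(0.2, 1.0, size=M_pix)                 # rho_p per pixel
    normals = rng.normal(size=(M_pix, 3))
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)  # unit normals n_p
    lights = rng.normal(size=(3, m_img))                       # columns are s_j

    # Lambertian intensities with no shadows: I[p, j] = rho_p * (n_p . s_j)
    I = (albedo[:, None] * normals) @ lights
    print(np.linalg.matrix_rank(I))             # prints 3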

10 of 106

A property of these matrices

  • Scenario 1: Many people will tend to give very similar or identical answers to many survey questions.
  • Scenario 2: Many people will have similar preferences for movies (only a few factors affect user choices).
  • Scenario 3: Non-local self-similarity!
  • Scenario 4 and 5: Geometric constraints!
  • This makes the matrices in all these scenarios (approximately) low-rank!


11 of 106

Scenario 6: Low rank matrices!

  • Consider the matrix D whose entry Dij = squared Euclidean distance between points pi and pj in some k-dimensional space.
  • Such a matrix has rank at most k+2.
  • Proof: Dij = ||pi||² + ||pj||² - 2 pi^T pj, so D = c 1^T + 1 c^T - 2 P^T P, where c is the column vector with entries ci = ||pi||², 1 is the all-ones column vector, and P is the k x n matrix whose columns are the points. The three terms have ranks at most 1, 1 and k respectively, so rank(D) ≤ k+2.
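A minimal numpy sketch (hypothetical random points) confirming the rank bound:

    import numpy as np

    rng = np.random.default_rng(0)
    k, n = 3, 100
    P = rng.normal(size=(k, n))                   # n points in R^k, one per column
    sq = np.sum(P**2, axis=0)                     # ||p_i||^2 for each point
    D = sq[:, None] + sq[None, :] - 2 * P.T @ P   # D_ij = ||p_i - p_j||^2
    print(np.linalg.matrix_rank(D))               # prints k + 2 = 5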

12 of 106

A property of these matrices

  • Scenario 4: The true matrix underlying M has been PROVED to be of low rank (in fact, rank 3) under orthographic projection and rigid motion (ref: Tomasi and Kanade, “Shape and Motion from Image Streams Under Orthography: a Factorization Method”, IJCV 1992), and of low rank (up to rank 9) under a few other more complex camera models.
  • In the case of orthographic projection with rigid motion, M^T (of size 2F x m) can be expressed as the product of two matrices – a rotation (motion) matrix of size 2F x 3 and a shape matrix of size 3 x m. Hence M has rank at most 3.
  • M is useful for many computer vision problems such as structure from motion, motion segmentation and multi-frame point correspondences.


13 of 106

Low-rank matrices are cool!

  • The answer to the questions posed in the above scenarios is NO in the general case.

  • But it is a big YES if we assume that the underlying matrix has low rank (which, as we have seen, is indeed the case in all these scenarios).

  • Low rank matrices of size n1 x n2 with rank r have only (n1+n2-r)r degrees of freedom – much less than n1n2 when r is small.


14 of 106

Low-rank matrices are cool!

  • Low rank matrices of size n1 x n2 with rank r have only (n1+n2-r)r degrees of freedom – much less than n1n2 when r is small.

  • #DOF for the left singular vectors = n1-1 for the first vector (as it is a unit vector) + n1-2 for the second one (as it is a unit vector and orthogonal to the first one) + … + n1-r for the r-th one (a unit vector orthogonal to the earlier r-1 singular vectors) = rn1 - r(r+1)/2
  • #DOF for the right singular vectors = rn2 - r(r+1)/2
  • #DOF for the singular values = r
  • Total = rn1 + rn2 - r² = (n1+n2-r)r
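A quick arithmetic check of this count, for (hypothetical) n1 = 7, n2 = 5, r = 2:

    n1, n2, r = 7, 5, 2
    left = r * n1 - r * (r + 1) // 2      # DOF of left singular vectors: 11
    right = r * n2 - r * (r + 1) // 2     # DOF of right singular vectors: 7
    print(left + right + r)               # 20
    print((n1 + n2 - r) * r)              # 20, matching the formula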


15 of 106

Theorem 1 (Informal Statement)

  • Consider an unknown matrix M of size n1 by n2 having rank r < min(n1, n2).
  • Suppose we observe only a fraction of the entries of M in the form of a matrix Γ, where Γ(i,j) = M(i,j) for all (i,j) belonging to some set Ω, and Γ(i,j) is undefined elsewhere.
  • If (1) M has row and column spaces that are “sufficiently incoherent” with the canonical basis (i.e. the identity matrix), (2) r is “sufficiently small”, and (3) Ω is “sufficiently large”, then we can accurately recover M from Γ by solving the following rank minimization problem:

    min over X of rank(X) subject to X(i,j) = Γ(i,j) for all (i,j) in Ω

16 of 106

Cool theorem, but … ☹

  • The aforementioned optimization problem is NP-hard (in fact, the known algorithms for solving it exactly have complexity doubly exponential in n1 or n2!)


17 of 106

Theorem 2 (Informal Statement)

  • Consider an unknown matrix M of size n1 by n2 having rank r < min(n1, n2).
  • Suppose we observe only a fraction of the entries of M in the form of a matrix Γ, where Γ(i,j) = M(i,j) for all (i,j) belonging to some set Ω, and Γ(i,j) is undefined elsewhere.
  • If (1) M has row and column spaces that are “sufficiently incoherent” with the canonical basis (i.e. identity matrix), (2) r is “sufficiently small”, and (3) Ω is “sufficiently large”, then we can accurately recover M from Γ by solving the following “trace-norm” minimization problem:

    min over X of ||X||* subject to X(i,j) = Γ(i,j) for all (i,j) in Ω

E. Candes and B. Recht, “Exact matrix completion via convex optimization”, Foundations of Computational Mathematics, 2009

18 of 106

What is the trace-norm of a matrix?

  • The trace-norm of a matrix is the sum of its singular values.
  • It is also called nuclear norm.
  • It is a softened version of the rank of a matrix, just like the L1-norm of a vector is a softened version of the L0-norm of the vector.
  • Minimization of the trace-norm (even under the given constraints) is a convex optimization problem and can be solved efficiently (no local minima issues).
  • This is similar to the L1-norm optimization (in compressive sensing) being efficiently solvable.
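A minimal numpy sketch of the trace-norm computation:

    import numpy as np

    def trace_norm(X):
        # Sum of the singular values of X (the nuclear norm)
        return np.linalg.svd(X, compute_uv=False).sum()

    rng = np.random.default_rng(0)
    A = rng.normal(size=(5, 2)) @ rng.normal(size=(2, 6))   # a rank-2 matrix
    s = np.linalg.svd(A, compute_uv=False)
    print(trace_norm(A), s[:2].sum())   # equal: only 2 singular values are non-zero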


19 of 106

More about trace-norm minimization

  • The efficient trace-norm minimization procedure is provably known to give the EXACT SAME result as the NP-hard rank minimization problem (under the same constraints and the same conditions on the unknown matrix M and the sampling set Ω).
  • This is analogous to the case where L1-norm optimization yielded the same result as L0-norm optimization (under the same set of constraints and conditions).
  • Henceforth we will concentrate only on Theorem 2 (and beyond).


20 of 106

The devil is in the details

  • Beware: Not all low-rank matrices can be recovered from partial measurements!
  • Example: consider a matrix containing zeroes everywhere except at the top-right corner.
  • This matrix is low rank, but it cannot be recovered from knowledge of only a fraction of its entries!
  • Many other such examples exist.
  • In reality, Theorems 1 and 2 work for low-rank matrices whose singular vectors are sufficiently spread out, i.e. sufficiently incoherent with the canonical basis (i.e. with the identity matrix).


21 of 106

Coherence of a subspace

  • The coherence of a subspace U of Rn of dimension r with respect to the canonical basis {ei} is defined as:

    μ(U) = (n/r) max over 1 ≤ i ≤ n of ||PU ei||²

  • Here PU is the orthogonal projection onto U. It is given by PU = U (U^T U)^(-1) U^T, where (abusing notation) U also denotes an n x r matrix whose columns span the subspace. This equals U U^T if the columns of U are orthonormal.
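A minimal numpy sketch of this definition; for an orthonormal basis U, ||PU ei||² is simply the squared norm of the i-th row of U:

    import numpy as np

    def coherence(U):
        # mu(U) = (n / r) * max_i ||P_U e_i||^2, for an n x r orthonormal basis U
        n, r = U.shape
        return (n / r) * np.max(np.sum(U**2, axis=1))

    rng = np.random.default_rng(0)
    Q, _ = np.linalg.qr(rng.normal(size=(100, 5)))  # a generic 5-dim subspace of R^100
    print(coherence(Q))                             # small: generic subspaces are incoherent
    E = np.eye(100)[:, :5]                          # span of 5 canonical basis vectors
    print(coherence(E))                             # maximal: equals n / r = 20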


22 of 106

Coherence of a basis

  • We are interested in matrices whose column and row spaces (i.e. the spans of the left and right singular vectors, respectively) have low coherence w.r.t. the canonical basis.

  • Why? Because such matrices would not lie in the null-space of the sampling operator.

  • Remember: in vectorized form, the sampling operator is a row-subsampled version of the identity matrix.


23 of 106

Formal definition of key assumptions

  • Consider an underlying matrix M of size n1 by n2. Let the SVD of M be given as follows:

    M = sum over k = 1 to r of σk uk vk^T

  • We make the following assumptions about M:
  • (A0) The coherences of the column and row spaces U and V of M (spanned by the {uk} and {vk} respectively) obey max(μ(U), μ(V)) ≤ μ0 for some μ0 > 0.
  • (A1) The maximum entry, in absolute value, of the n1 by n2 matrix sum over k of uk vk^T is upper bounded by μ1 sqrt(r/(n1 n2)) for some μ1 > 0.

24 of 106

Theorem 2 (Formal Statement)

  • Let M be an n1 by n2 matrix of rank r obeying (A0) and (A1), and put n = max(n1, n2). Suppose we observe m entries of M with locations sampled uniformly at random. Then there exist constants C and c such that if

    m ≥ C max(μ1², μ0^(1/2) μ1, μ0 n^(1/4)) n r β log n

    for some β > 2, then the trace-norm minimizer (in the informal statement of Theorem 2) is unique and equal to M with probability at least 1 - c n^(-β).
  • If, in addition, r ≤ μ0^(-1) n^(1/5), then the sample bound improves to m ≥ C μ0 n^(6/5) r β log n.

Candes and Recht, “Exact matrix completion via convex optimization”, https://statweb.stanford.edu/~candes/papers/MatrixCompletion.pdf

25 of 106

Comments on Theorem 2

  • Theorem 2 states that a larger number of entries of M (denoted by m) must be observed for accurate reconstruction if (1) M has larger rank r, (2) μ0 in (A0) is larger, or (3) μ1 in (A1) is larger.
  • Example: If μ0 = O(1) and the rank r is small, the reconstruction is accurate with high probability provided m ≥ C n^(6/5) r log n for a constant C.


26 of 106

Comments on Theorem 2

  • It turns out that if the singular vectors of the matrix M have bounded entries, then condition (A1) almost always holds with μ1 = O(log n).

  • The probabilistic nature of the recovery guarantee arises because the locations in Ω are sampled uniformly at random from all matrix (row, column) indices. Recovery is successful for most such sets Ω.


27 of 106

Comments on Theorem 2

  • Note that Ω cannot afford to completely miss any row or any column of M – otherwise recovery of even a rank-1 matrix is not possible.

  • Thus one needs at least one entry from every row and every column – and, under uniform random sampling, ensuring this requires on the order of n log n samples.

  • The latter is an instance of the coupon collector’s problem – see https://en.wikipedia.org/wiki/Coupon_collector%27s_problem . This shows that the bound on m in Theorem 2 is not too conservative.


28 of 106

Segue: Coupon Collector’s Problem

  • The problem is as follows: consider n unique coupons, from which you draw with replacement.

  • How many draws (with replacement) are needed until every coupon has been drawn at least once?

  • The expected number of such draws is known to be O(n log n).

  • One can further show that the probability that the number of draws exceeds c n log n (for some constant c) is upper bounded by n^(-c+1).
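A minimal simulation sketch (hypothetical n = 100) comparing the empirical mean number of draws with the exact expectation n·Hn, which is O(n log n):

    import numpy as np

    rng = np.random.default_rng(0)
    n, trials = 100, 2000
    counts = []
    for _ in range(trials):
        seen, draws = set(), 0
        while len(seen) < n:          # draw with replacement until all n coupons seen
            seen.add(rng.integers(n))
            draws += 1
        counts.append(draws)
    print(np.mean(counts))                          # empirical mean
    print(n * np.sum(1.0 / np.arange(1, n + 1)))    # exact expectation n * H_n ~ 518.7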


29 of 106

Another version of Theorem 2 = Theorem 2A

  • Let M be an n1 x n2 matrix of rank r obeying assumption (A2) below, of which only m entries are observed with locations chosen uniformly at random. Consider m ≥ β C μB^4 n log² n, where C is a constant and n = max(n1, n2). Then M is the unique solution to (Q0) (the noiseless trace-norm minimization problem) with probability at least 1 - 1/n^β. Here assumption (A2) is a “strong incoherence” condition, with parameter μB, on the singular vectors of M (see the reference below for the precise statement).


Candes and Plan, “Matrix completion with noise”, Proceedings of the IEEE, 2010

30 of 106

Matrix Completion under noise

  • Consider an unknown matrix M of size n1 by n2 having rank r < min(n1, n2).
  • Suppose we observe only a fraction of the entries of M in the form of a matrix Γ, where Γ(i,j) = M(i,j) + Z(i,j) for all (i,j) belonging to some set Ω, and Γ(i,j) is undefined elsewhere.
  • Here Z refers to a white noise process which obeys the constraint that:

    ||PΩ(Z)||F ≤ δ for some noise level δ > 0, where PΩ(Z) retains the entries of Z in Ω and zeroes out the rest.


31 of 106

Matrix Completion under noise

  • In such cases, the unknown matrix M can be recovered by solving the following minimization problem (a semi-definite program), which we will refer to as (Q1):

    (Q1): min over X of ||X||* subject to ||PΩ(X - Γ)||F ≤ δ


32 of 106

Theorem 3

  • Under the assumptions of Theorems 1, 2 and 2A, the reconstruction M* from (Q1) is accurate, with an error bound of the form:

    ||M* - M||F ≤ 4 sqrt((2 + p) min(n1, n2) / p) δ + 2δ

    where p = m/(n1 n2) is the fraction of observed entries.

Candes and Plan, “Matrix completion with noise”, Proceedings of the IEEE, 2010

33 of 106

Numerical results: Exact matrix completion

  • Test data (low rank matrices of rank r) generated synthetically as the product of two random matrices of rank r each (e.g. of sizes n x r and r x n).
  • Sample a subset of the entries uniformly at random.
  • Estimate the underlying low-rank matrix using CVX (a minimal sketch follows this list).
  • Study the variation of the reconstruction error w.r.t. m and the number of degrees of freedom (= r(2n - r) for an n x n rank-r matrix).
  • Recorded: probability of success in recovery across 50 trials.
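A minimal sketch of one trial of this experiment, assuming the Python cvxpy package in place of MATLAB’s CVX:

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    n, r = 30, 2
    M = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))   # random rank-r matrix

    mask = (rng.random((n, n)) < 0.5).astype(float)         # observe ~50% of entries
    X = cp.Variable((n, n))
    prob = cp.Problem(cp.Minimize(cp.normNuc(X)),
                      [cp.multiply(mask, X) == mask * M])   # agree on observed entries
    prob.solve(solver=cp.SCS)
    print(np.linalg.norm(X.value - M) / np.linalg.norm(M))  # small relative error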


34 of 106


Candes and Recht , Exact matrix completion via convex optimization, https://statweb.stanford.edu/~candes/papers/MatrixCompletion.pdf

Recall that for n x n matrices, dr = r(2n - r) = 2rn - r².

35 of 106


Candes and Recht , Exact matrix completion via convex optimization, https://statweb.stanford.edu/~candes/papers/MatrixCompletion.pdf

36 of 106


Candes and Recht , Exact matrix completion via convex optimization, https://statweb.stanford.edu/~candes/papers/MatrixCompletion.pdf

This is actually a compressive low rank matrix recovery experiment, not a matrix completion experiment. See the section on (Compressive) Low Rank Matrix Recovery later in these slides.

37 of 106

Numerical results: noisy matrix completion

  • Test matrices created as the product of two Gaussian random matrices, yielding rank r.
  • Sampling set Ω picked uniformly at random.
  • Observations corrupted by Gaussian noise with mean 0 and σ = 1.
  • Low rank matrix reconstructed as the solution to the noisy trace-norm minimization problem (Q1) stated earlier.


38 of 106

Numerical results: noisy matrix completion


Red curve: proportional to the upper bound on the reconstruction error as per the theorem.

Green curve (oracle error): reconstruction error using least squares, assuming it is known that the matrix has rank r.

39 of 106

Numerical results: noisy matrix completion


40 of 106

A Minimization Algorithm

  • Consider the minimization problem:

    min over X of ||X||* subject to X(i,j) = M(i,j) for all (i,j) in Ω

  • There are many techniques to solve this problem (http://perception.csl.illinois.edu/matrix-rank/sample_code.html)
  • Out of these, we will study one method called “singular value thresholding”.


41 of 106

Singular Value Thresholding (SVT)

Ref: Cai et al, “A singular value thresholding algorithm for matrix completion”, SIAM Journal on Optimization, 2010.

  • Starting from Y0 = 0, SVT iterates for k = 1, 2, …:

    Φk = Dτ(Yk-1)
    Yk = Yk-1 + δk PΩ(M - Φk)

    where Dτ is the singular value soft-thresholding (shrinkage) operator: if Y = U Σ V^T is the SVD of Y, then Dτ(Y) = U max(Σ - τI, 0) V^T.

  • The soft-thresholding operator obeys the following property (which we state w/o proof):

    Dτ(Y) = argmin over X of { τ||X||* + (1/2)||X - Y||F² }
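A minimal numpy sketch of these iterations on a hypothetical synthetic test case (τ and δ chosen in the spirit of the paper’s suggestions):

    import numpy as np

    def D_tau(Y, tau):
        # Singular value soft-thresholding (shrinkage) operator
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        return (U * np.maximum(s - tau, 0.0)) @ Vt

    def svt_complete(M_obs, mask, tau, delta, iters=500):
        # Recover a low-rank matrix from the entries where mask == 1
        Y = np.zeros_like(M_obs)
        Phi = np.zeros_like(M_obs)
        for _ in range(iters):
            Phi = D_tau(Y, tau)
            Y = Y + delta * mask * (M_obs - Phi)   # the P_Omega(M - Phi) step
        return Phi

    rng = np.random.default_rng(0)
    n, r = 50, 2
    M = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))
    mask = (rng.random((n, n)) < 0.5).astype(float)
    M_hat = svt_complete(mask * M, mask, tau=5 * n, delta=1.2)
    print(np.linalg.norm(M_hat - M) / np.linalg.norm(M))   # small relative error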

42 of 106

Properties of SVT (stated w/o proof)

  • The sequence {Φk} converges to the solution of the problem below, provided the step sizes {δk} all lie between 0 and 2:

    min over X of { τ||X||* + (1/2)||X||F² } subject to PΩ(X) = PΩ(M)

  • For large values of τ, this converges to the solution of the original problem (i.e. without the Frobenius norm term).


43 of 106

Properties of SVT (stated w/o proof)

  • The matrices {Φk} turn out to have low rank (empirical observation – rigorous proof not established).
  • The matrices {Yk} also turn out to be sparse (empirical observation – rigorous proof not established).
  • The SVT step does not require computation of the full SVD – we need only those singular vectors whose singular values exceed τ. There are special iterative methods (e.g. Lanczos-based routines) for computing these.


44 of 106

Results

  • The SVT algorithm works very efficiently and is easily implementable in MATLAB.

  • The authors report reconstruction of a 30,000 by 30,000 matrix in just 17 minutes on a 1.86 GHz dual-core desktop with 3 GB RAM and with MATLAB’s multithreading option enabled.


45 of 106

Results (Data without noise)


46 of 106

Results (Noisy Data)


47 of 106

Results on real data

  • Dataset consists of a matrix M of geodesic distances between 312 cities in the USA/Canada.
  • This matrix is of approximately low-rank (in fact, the relative Frobenius error between M and its rank-3 approximation is 0.1159).
  • 70% of the entries of this matrix (chosen uniformly at random) were blanked out.


48 of 106

Results on real data

  • The underlying matrix was estimated using SVT.
  • In just a few seconds and a few iterations, the SVT algorithm produces an estimate that is as accurate as the best rank-3 approximation of M.


49 of 106

Results on real data


50 of 106

Robust Principal Components Analysis


51 of 106

Basic Question

  • Consider a matrix M of size n1 x n2 that is the sum of two components – L (a low-rank component) and S (a component with sparse but unknown support).
  • Can we recover L and S given only M?
  • Note: support of S means all those matrix entries (i,j) where S(i,j) ≠ 0.


52 of 106

Why ask this question? Scenario 1

  • Consider a video taken with a static camera (like at airports or on overhead bridges on expressways).
  • Such videos contain a “background” portion that is static or changes infrequently, and a moving portion called “foreground” (people, cars etc.).
  • In many applications (esp. surveillance for security or traffic monitoring), we need to detect and track the moving foreground.


53 of 106

Why ask this question? Scenario 1

  • Let us represent this video as an N x T matrix, where the number of columns T equals the number of frames in the video, and each column contains a single video frame of N = N1N2 pixels converted to vector form.
  • This matrix can clearly be expressed as the sum of a low-rank background matrix and a sparse foreground matrix.


54 of 106

Why ask this question? Scenario 2

  • It is well-known that the images of a convex Lambertian object taken from the same viewpoint, but under K different lighting directions, span a low-dimensional space (ref: Basri and Jacobs, “Lambertian reflectance and linear subspaces”, IEEE TPAMI, 2003).
  • This means that a matrix M of size N x K (each column of which is an image captured under one of the K lighting directions and represented in vectorized form) has a low approximate rank – in fact at most rank 9.


55 of 106

Why ask this question? Scenario 2

  • But many objects of interest (such as human faces) are neither convex nor exactly Lambertian.
  • Effects such as shadows, specularities or saturation cause violation of the low-rank assumption.
  • But these errors have sparse spatial support, even though their magnitudes are large.
  • Here again, M can be represented as a low-rank background and a sparse foreground.


56 of 106

Theorem 1 (Informal Statement)

  • Consider a matrix M of size n1 by n2 which is the sum of a “sufficiently low-rank” component L and a “sufficiently sparse” component S whose support is uniformly randomly distributed among the entries of M.
  • Then the solution of the following optimization problem (known as principal component pursuit, PCP) yields exact estimates of L and S with “very high” probability:

    min over (L,S) of ||L||* + λ||S||1 subject to L + S = M, with λ = 1/sqrt(max(n1, n2))

This is a convex optimization problem.

57 of 106

Be careful!

  • There is an implicit assumption here that the low-rank component is not sparse (e.g. a matrix containing zeroes everywhere except at the top-right corner won’t do!).
  • And that the sparse component is not low-rank (that is why we require its support to be drawn from a uniform random distribution).


58 of 106

Be careful!

  • In addition, the low-rank component L must obey the assumptions A0 and A1 (required in matrix completion theory).


Candes et al, Robust Principal Component Analysis?

https://statweb.stanford.edu/~candes/papers/RobustPCA.pdf

59 of 106

Theorem 1 (Formal Statement)

  • Suppose L is n x n and obeys the incoherence conditions (of the form (A0)-(A1)) with parameter μ, and that the support of S is uniformly distributed among all sets of cardinality s. Then there is a numerical constant c such that, with probability at least 1 - c n^(-10), principal component pursuit with λ = 1/sqrt(n) exactly recovers L and S, provided

    rank(L) ≤ ρr n μ^(-1) (log n)^(-2) and s ≤ ρs n²

    where ρr and ρs are positive numerical constants. (A similar statement holds for rectangular matrices with λ = 1/sqrt(max(n1, n2)).)

Candes et al, Robust Principal Component Analysis?

https://statweb.stanford.edu/~candes/papers/RobustPCA.pdf

60 of 106

Comments on Theorem 1

  • If the low-rank component L has “spread out” singular vectors, so that μ is small, then larger and larger ranks of L can be accommodated.
  • We require the support of S to be sparse and uniformly distributed, but make no assumption about the magnitudes/signs of the non-zero elements of S.
  • In the matrix completion problem, we knew which entries in the matrix were “missing”. Here we do not know the support of S. And the entries of S are not “missing” but unknown and possibly grossly corrupted!
  • The probabilistic nature of the theorem is because the support of S is a uniformly randomly drawn subset of {1, 2, …, n1} x {1, 2, …, n2}.
  • Notice that there is no tuning parameter in the objective function: λ is fixed to 1/sqrt(max(n1, n2)).


61 of 106

Matrix Recovery with Corrupted as well as Missing Data

  • In some cases, we may have missing entries in the matrix (with known locations) in addition to corrupted entries (whose locations and magnitudes we DON’T know).
  • In such cases, we solve:

    min over (L,S) of ||L||* + λ||S||1 subject to PΩ(L + S) = PΩ(M)


62 of 106

Theorem 2 (Formal Statement)

  • Suppose L is n x n and obeys the incoherence conditions, and that Ω is uniformly sampled among all sets of size m ≥ 0.1 n². Suppose also that each observed entry is, independently, corrupted with probability τ. Then there is a numerical constant c such that, with probability at least 1 - c n^(-10), the above program with λ = 1/sqrt(0.1 n) recovers L exactly, provided

    rank(L) ≤ ρr n μ^(-1) (log n)^(-2) and τ ≤ τs

    where ρr and τs are positive numerical constants.

Candes et al, Robust Principal Component Analysis?

https://statweb.stanford.edu/~candes/papers/RobustPCA.pdf

63 of 106

Comments on Theorem 2

  • If all entries were observed (i.e. m = n1n2), then this reduces to Theorem 1.
  • If τ = 0, this becomes a pure matrix completion problem (no sparse component), but one solved using a slightly different objective function (question: what is that difference?).
  • In the τ = 0 case, the minimization routine is guaranteed to give us a low-rank component as usual, and a “sparse” component that contains all zeroes.
  • In this theorem, observe that the guarantees hold only for L; recovery of S is not possible, as many entries of S may not be observed.
  • The parameter λ balancing the L and S terms is auto-set and part of the mathematical analysis (no external tuning is required for the theorem to hold).


64 of 106

Theorem 2.1: Noisy RPCA

  • Under the same conditions on the low-rank component L and the support of the sparse component S as in Theorem 2, and the same conditions on rank(L) and the number of non-zero entries of S, the solution (L*,S*) of the following optimization problem Q2:

    (Q2): min over (L,S) of ||L||* + λ||S||1 subject to ||M - L - S||F ≤ δ

    satisfies an error bound of the form

    ||L* - L||F² + ||S* - S||F² ≤ C n1 n2 δ²

    where C is a constant term, and δ is such that the noise Z in M = L + S + Z obeys ||Z||F ≤ δ.

Zhou et al, “Stable principal component pursuit”, ISIT 2010

65 of 106

Numerical Results on RPCA

  • Test matrices created by adding together low rank and sparse matrices.
  • L matrices created from the product of low-rank Gaussian random matrices.
  • S matrices created by sampling the support uniformly at random, with entries set to 0, +1, -1 with probabilities 1-ρ, ρ/2, ρ/2 respectively.
  • Success declared if relative error of low rank part is less than 0.001.
  • A specific PCP algorithm is used.


66 of 106

Numerical Results on RPCA

  • In a second experiment, the sign of each sampled entry of S was set equal to the sign of the corresponding entry of L.
  • In a third experiment, the S part is 0 and only L is estimated from its sampled version (i.e. the low rank matrix completion problem).


67 of 106


Results

68 of 106


69 of 106

Results on background subtraction

  • Experiments on video sequences acquired in a shopping mall.
  • Video size is 176 x 144 x 200, equal to a matrix of size 25,344 x 200.
  • Illumination change was minimal.
  • Another experiment performed on a video of size 168 x 120 x 250 (matrix size 20,160 x 250), with severe illumination changes.


70 of 106

Results on background subtraction


71 of 106


72 of 106

Results on shadow and specularity removal

  • Experiments performed on images from the Yale face database.
  • Image size is 192 x 168, images acquired under 58 different lighting conditions.
  • Thus the decomposition was performed on a matrix of size 32,256 x 58.
  • Each column of this matrix consists of a vectorized image of the same person from the same viewpoint, but under a different lighting condition.


73 of 106


74 of 106

Algorithm for Robust PCA


75 of 106

Algorithm for Robust PCA

  • Suppose you want to solve:

    min over x of f(x) subject to ci(x) = 0 for i = 1, …, k

  • The augmented Lagrangian method (ALM) adopts the following iterative updates:

    x(t+1) = argmin over x of [ f(x) + Σi λi(t) ci(x) + (μ/2) Σi ci(x)² ]
    λi(t+1) = λi(t) + μ ci(x(t+1))

    Here Σi λi ci(x) is the Lagrangian term, and (μ/2) Σi ci(x)² is the augmentation term.

76 of 106

ALM: Some intuition

  • What is the intuition behind the update of the Lagrange parameters {λi}?

  • The constrained problem min over x of f(x) subject to ci(x) = 0 is equivalent to:

    min over x of max over λ of [ f(x) + Σi λi ci(x) ]

    The maximum w.r.t. λ will be ∞ unless the constraints are satisfied. Hence these problems are equivalent.

77 of 106

ALM: Some intuition

  • Due to the non-smoothness of the max function, the equivalence has little computational benefit. We smooth it by adding another term that penalizes deviations of λ from a prior estimate λ̄:

    min over x of max over λ of [ f(x) + Σi λi ci(x) - (1/(2μ)) Σi (λi - λ̄i)² ]

  • Maximization w.r.t. λ is now easy: setting the derivative to zero gives λi = λ̄i + μ ci(x), and substituting back yields the augmented Lagrangian f(x) + Σi λ̄i ci(x) + (μ/2) Σi ci(x)². This is also precisely the multiplier update used by ALM.

78 of 106

ALM: Some intuition – inequality constraints

  • For inequality constraints ci(x) ≤ 0, the multipliers must satisfy λi ≥ 0, and the same smoothing argument applies with the maximization restricted to λ ≥ 0.
  • Maximization w.r.t. λ is again easy, and yields the clipped update λ̄i ← max(0, λ̄i + μ ci(x)).

79 of 106

Algorithm for Robust PCA

  • In our case, we seek to optimize (µ > 0) the augmented Lagrangian:

    l(L, S, Y) = ||L||* + λ||S||1 + <Y, M - L - S> + (µ/2) ||M - L - S||F²

    where Y is the Lagrange matrix (the matrix of Lagrange multipliers).

  • Basic algorithm (a code sketch follows): repeat until convergence:
    1. Update S using (elementwise) soft-thresholding: S ← shrink(M - L + Y/µ, λ/µ)
    2. Update L using singular-value soft-thresholding: L ← D1/µ(M - S + Y/µ)
    3. Update the Lagrange matrix: Y ← Y + µ(M - L - S)
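A minimal numpy sketch of this scheme on hypothetical synthetic data (one pass of each update per iteration, with a fixed µ, as a simplified variant):

    import numpy as np

    def shrink(X, t):
        # Elementwise soft-thresholding
        return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

    def svt(X, t):
        # Singular value soft-thresholding
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        return (U * np.maximum(s - t, 0.0)) @ Vt

    def rpca_alm(M, iters=300):
        n1, n2 = M.shape
        lam = 1.0 / np.sqrt(max(n1, n2))
        mu = 0.25 * n1 * n2 / np.abs(M).sum()   # the paper's heuristic choice of mu
        L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
        for _ in range(iters):
            S = shrink(M - L + Y / mu, lam / mu)   # S update
            L = svt(M - S + Y / mu, 1.0 / mu)      # L update
            Y = Y + mu * (M - L - S)               # Lagrange matrix update
        return L, S

    rng = np.random.default_rng(0)
    n, r = 60, 2
    L0 = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))
    S0 = (rng.random((n, n)) < 0.05) * rng.normal(size=(n, n)) * 10
    L_hat, S_hat = rpca_alm(L0 + S0)
    print(np.linalg.norm(L_hat - L0) / np.linalg.norm(L0))  # small relative error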

80 of 106


Alternating Minimization Algorithm for Robust PCA

Choice of µ is important. The algorithm in the paper simply sets it to 0.25 n1 n2 / ||M||1 (i.e. n1 n2 / (4 ||M||1)). In practice, it should be set from a range of candidate values based on the match between ||M - L̂ - Ŝ||2 and the noise level, where L̂ and Ŝ are the estimates of L and S. This would require running the algorithm for many values of µ.

81 of 106

(Compressive) Low Rank Matrix Recovery


82 of 106

Low rank matrix recovery

  • Note: the third set of experiments in the section on matrix completion is not an example of matrix completion.

  • It is an example of a related problem, low rank matrix recovery: recover a low-rank matrix M from compressive measurements y = A vec(M) (where A has far fewer rows than columns), defined as:

    min over X of rank(X) subject to A vec(X) = y

83 of 106

Low rank matrix recovery


84 of 106

Low rank matrix recovery: restricted isometry property (RIP) for linear maps

  • We consider A to be a linear map from Rn1 x n2 to Rm.
  • For an integer r between 1 and min(n1, n2), the restricted isometry constant (RIC) of A of order r is the smallest number δr such that

    (1 - δr) ||M||F ≤ ||A(M)||2 ≤ (1 + δr) ||M||F

    holds for all matrices M of rank at most r.


85 of 106

Low rank matrix recovery: Matrix restricted isometry property (RIP)

  • Which linear maps obey the RIP?
  • The following random matrices obey the RIP (of a given order, with high probability, for sufficiently large m):
  • Zero-mean iid Gaussian random matrices with variance 1/m
  • Bernoulli random matrices with entries ±1/sqrt(m), each with probability 1/2
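A quick numpy check of the near-isometry property for a Gaussian map (hypothetical sizes):

    import numpy as np

    rng = np.random.default_rng(0)
    n1, n2, m = 20, 20, 2000
    A = rng.normal(scale=1.0 / np.sqrt(m), size=(m, n1 * n2))   # iid N(0, 1/m) entries

    M = rng.normal(size=(n1, 2)) @ rng.normal(size=(2, n2))     # a rank-2 test matrix
    print(np.linalg.norm(A @ M.ravel()), np.linalg.norm(M))     # nearly equal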


86 of 106

Theorem 1 for low-rank matrix recovery

  • If the matrix A is such that δ2r(A) < 1, then the system A vec(X) = y, with y = A vec(M), has a unique solution of rank at most r, namely M itself.
  • Proof by contradiction: suppose M1 ≠ M2 both have rank at most r and A vec(M1) = A vec(M2) = y. Then M1 - M2 ≠ 0 has rank at most 2r, and A vec(M1 - M2) = 0. But the RIP of order 2r gives ||A vec(M1 - M2)||2 ≥ (1 - δ2r) ||M1 - M2||F > 0, a contradiction.


Recht et al, “Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization”

87 of 106

Theorem 2 for low-rank matrix recovery

  • If the matrix A ∈ Rm x n1n2 is such that δ5r(A) < 0.1 for some r ≥ 1, and M ∈ Rn1 x n2 has rank at most r with y = A vec(M), then M is the unique solution to the following optimization problem:

    min over X of ||X||* subject to A vec(X) = y

Recht et al, “Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization”

88 of 106

Theorem 2 for low-rank matrix recovery

  • This theorem has extensions to the case where M is approximately, but not exactly, of rank r.
  • Approximately low rank means that only a few singular values are large, and the rest are zero or close to zero.
  • There are also extensions to the case where the measurements in the vector y are noisy.
  • The proof of the result closely follows the proofs for compressive sensing recovery.


89 of 106

Theorem 3 for low-rank matrix recovery

  • Consider noisy measurements of the form:

    y = A vec(M) + η, with ||η||2 ≤ ε

  • Let M* be the solution to the following problem:

    min over X of ||X||* subject to ||A vec(X) - y||2 ≤ ε

  • Then if δr(A) < 1/3, we have:

    ||M* - M||F ≤ C1 ||M - Mr||* / sqrt(r) + C2 ε

    where Mr is the best rank-r approximation to M, and C1 and C2 are increasing functions of δr(A) on the domain [0, 1/3).

Cai and Zhang, “Sharp RIP bound for sparse signal and low-rank matrix recovery”

90 of 106

Comments on Theorem 3

  • It handles the case of noisy measurements, as well as approximately (as opposed to exactly) low rank M.
  • The paper also demonstrates a linear map A with δr(A) = 1/3 that fails to recover a family of low rank matrices M, showing that the bound δr(A) < 1/3 is sharp.


91 of 106

Compressive RPCA: Algorithm and an Application

Primarily based on the paper:

Waters et al, “SpaRCS: Recovering Low-Rank and Sparse Matrices from Compressive Measurements”, NIPS 2011


92 of 106

Problem statement

  • Let M be a matrix which is the sum of low rank matrix L and sparse matrix S.
  • We observe compressive measurements of M of the following form:

    y = A(M) = A(L + S), where A is a linear map from Rn1 x n2 to Rp with p << n1 n2


93 of 106

Scenarios

  • M could be a matrix representing a video – each column of M is a vectorized frame from the video.
  • M could also be a matrix representing a hyperspectral image – each column is the vectorized form of a slice at a given wavelength.
  • Robust matrix completion is a special case of this compressive L+S recovery problem.


94 of 106

Objective function: SpaRCS

  • SpaRCS (sparse and low rank decomposition via compressive sampling) seeks the L and S that best explain the measurements:

    min over (L,S) of ||y - A(L + S)||2 subject to rank(L) ≤ r and ||S||0 ≤ K

    Here r and K are free parameters of the algorithm.

95 of 106

SpaRCS Algorithm

Very simple to implement, but it requires tuning of the K and r parameters; convergence guarantees have not been established.

96 of 106

SpaRCS is similar to CoSaMP

  • CoSaMP is a greedy algorithm for compressive inversion (sparse signal recovery).
  • The aforementioned SpaRCS algorithm is similar to CoSaMP, but applied to compressive RPCA.
  • For your reference, the CoSaMP algorithm is given on the next slide.


97 of 106


This is for a compressive measurement vector u = Φx, where x is a sparse vector.
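For reference, a minimal numpy sketch of CoSaMP for recovering a K-sparse x from u = Φx (hypothetical sizes and data; a simplified rendition, not the paper’s exact pseudocode):

    import numpy as np

    def cosamp(Phi, u, K, iters=30):
        m, n = Phi.shape
        x = np.zeros(n)
        for _ in range(iters):
            proxy = Phi.T @ (u - Phi @ x)               # correlate residual with columns
            omega = np.argsort(np.abs(proxy))[-2 * K:]  # indices of 2K largest correlations
            T = np.union1d(omega, np.flatnonzero(x))    # merge with the current support
            sol = np.linalg.lstsq(Phi[:, T], u, rcond=None)[0]
            b = np.zeros(n)
            b[T] = sol                                  # least squares on the merged support
            keep = np.argsort(np.abs(b))[-K:]           # prune to the K largest entries
            x = np.zeros(n)
            x[keep] = b[keep]
        return x

    rng = np.random.default_rng(0)
    n, m, K = 200, 80, 5
    Phi = rng.normal(scale=1.0 / np.sqrt(m), size=(m, n))
    x0 = np.zeros(n)
    x0[rng.choice(n, K, replace=False)] = rng.normal(size=K)
    print(np.linalg.norm(cosamp(Phi, Phi @ x0, K) - x0))   # near zero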

98 of 106

Results: Phase transition


p = #measurements

K = size of the support of S

r = rank of L

99 of 106

Results: Video CS


Follows the Rice single-pixel camera (SPC) model: independent compressive measurements on each frame (column) of the matrix M representing the video.

100 of 106

Results: Video CS


Follows the Rice single-pixel camera (SPC) model: independent compressive measurements on each frame (column) of the matrix M representing the video.

101 of 106

Results: Hyperspectral CS


Rice single-pixel camera (SPC) model of CS measurements on every spectral band

102 of 106

Results: Robust matrix completion


Note that the support of S is a subset of Ω.

103 of 106

Theorem for Compressive PCP


Wright et al, “Compressive Principal Component Pursuit”

http://yima.csl.illinois.edu/psfile/CPCP.pdf

Q is obtained as the linear span of a number of independent matrices with iid N(0,1) entries.

This means that the #measurements ≥ #DOF of (L,S) times log²(m).

104 of 106

Theorem for Compressive PCP


Wright et al, “Compressive Principal Component Pursuit”

http://yima.csl.illinois.edu/psfile/CPCP.pdf

Notion of μ-incoherent L0: the standard incoherence conditions on the singular vectors of L0 (of the same form as (A0)-(A1) in matrix completion), with parameter μ.

Definition of Q: the random measurement subspace described on the previous slide (see the paper for the precise construction).

105 of 106

Comments

  • The probabilistic nature of the theorem is due to randomness in the choice of Q, and also due to the sign of the values of S and the support of S.
  • There is no assumption on the magnitude of the non-zero values in S.
  • There is no randomness in L.
  • #DOF(L) = (m + n - r) r = mr + nr - r², which is O(mr) as m > n. (Here m is the matrix dimension, not the number of measurements).
  • #DOF(S) = ρmn.
  • #required measurements for theoretical guarantees increases with ρ and r (besides m, n of course).
  • Upper bound on r for the theoretical guarantees to hold is inversely proportional to µ.


106 of 106

Summary

  • Low rank matrix completion: motivation, key theorems, numerical results
  • Algorithm for low rank matrix completion
  • Robust PCA with algorithm and theoretical guarantees
  • (Compressive) low rank matrix recovery with theoretical guarantees
  • Compressive RPCA with theoretical guarantees and algorithm
