1 of 53

NEU 314: Mathematical Tools for Neuroscience

Lecture 8: October 4, 2022

Instructor: Sam Nastase

Princeton Neuroscience Institute

Principal component analysis

2 of 53

Review: null space

The null space of A is the vector space comprising all vectors that are orthogonal to the rows of A

The null space of A is the vector space of all vectors v such that Av = 0

3 of 53

The null space of A is the vector space comprising all vectors that are orthogonal to the rows of A

1D vector space spanned by v1

basis for the null space of v1

Review: null space
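
A minimal NumPy sketch of this idea (the matrix here is made up for illustration): the right singular vectors paired with zero singular values give a basis for the null space, and those basis vectors are orthogonal to every row of A.

    import numpy as np

    # Hypothetical 2 x 3 matrix; its rows span a 2D subspace of R^3
    A = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])

    # Right singular vectors paired with zero singular values span the null space
    U, s, Vt = np.linalg.svd(A)
    rank = np.sum(s > 1e-10)
    null_basis = Vt[rank:]                 # rows of Vt beyond the rank

    # Every null-space vector is orthogonal to every row of A, i.e. A @ v = 0
    print(A @ null_basis.T)                # ~ all zeros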

4 of 53

The null space of A is the vector space comprising all vectors that are orthogonal to the rows of A

1D vector space spanned by v1

basis for the null space of v1

The row space and null space together span the entire input space

Review: null space

5 of 53

Right singular vectors

Singular values

Left singular vectors

Review: singular value decomposition
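
A quick NumPy check of the decomposition (the example matrix is arbitrary): np.linalg.svd returns the left singular vectors, the singular values, and the transposed right singular vectors, and multiplying the factors back together recovers A.

    import numpy as np

    A = np.random.randn(4, 3)                         # any real matrix

    # U: left singular vectors, s: singular values, Vt: transposed right singular vectors
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Multiplying the three factors back together recovers A
    print(np.allclose(A, U @ np.diag(s) @ Vt))        # True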

6 of 53

Review: singular value decomposition

rotate

stretch

rotate

7 of 53

Review: singular value decomposition

8 of 53

Review: singular value decomposition

For orthogonal matrices (like U and V), the transpose and inverse are equal
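
A small NumPy sanity check of this property (arbitrary random matrix):

    import numpy as np

    U, s, Vt = np.linalg.svd(np.random.randn(3, 3))

    # For orthogonal U and V, the transpose acts as the inverse: U^T U = I
    print(np.allclose(U.T @ U, np.eye(3)))            # True
    print(np.allclose(Vt.T, np.linalg.inv(Vt)))       # True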

9 of 53

Review: singular value decomposition

For orthogonal matrices (like U and V), the transpose and inverse are equal

10 of 53

Review: singular value decomposition

For orthogonal matrices (like U and V), the transpose and inverse are equal

11 of 53

Review: singular value decomposition

rotate

stretch

rotate

12 of 53

Review: singular value decomposition

13 of 53

Review: singular value decomposition

?

14 of 53

Review: singular value decomposition

If any singular value sn = 0, then S⁻¹ does not exist

If any singular value sn = 0, then A destroys information that cannot be recovered by an inverse

15 of 53

Review: singular value decomposition

If any singular value sn = 0, then S⁻¹ does not exist

If any singular value sn = 0, then A destroys information that cannot be recovered by an inverse

We can compute a pseudo-inverse using only the positive singular values
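
A sketch of the idea in NumPy, using a made-up rank-deficient matrix: invert only the positive singular values, then rebuild the pseudo-inverse from the SVD factors.

    import numpy as np

    # Hypothetical rank-deficient matrix: the third row is the sum of the first two,
    # so one singular value is (numerically) zero and no exact inverse exists
    A = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 3.0, 1.0]])

    U, s, Vt = np.linalg.svd(A)
    s_pinv = np.zeros_like(s)
    s_pinv[s > 1e-10] = 1.0 / s[s > 1e-10]            # invert only the positive singular values

    A_pinv = Vt.T @ np.diag(s_pinv) @ U.T
    print(np.allclose(A_pinv, np.linalg.pinv(A)))     # matches NumPy's pseudo-inverse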

16 of 53

Review: singular value decomposition

If any singular value sn = 0, then S⁻¹ does not exist

If any singular value sn = 0, then A destroys information that cannot be recovered by an inverse

If any singular values are very close to zero, the matrix may be practically non-invertible, i.e. ill-conditioned
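
A small made-up example in NumPy: two nearly identical rows push one singular value toward zero, and the ratio of the largest to the smallest singular value (the condition number) blows up.

    import numpy as np

    # Hypothetical matrix with two nearly identical rows: the smallest singular value
    # is close to zero, so the matrix is technically invertible but ill-conditioned
    A = np.array([[1.0, 1.0],
                  [1.0, 1.0 + 1e-9]])

    s = np.linalg.svd(A, compute_uv=False)
    print(s)                                          # one value near 2, one near 5e-10
    print(s[0] / s[-1])                               # condition number ~4e9: inversion is unstable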

17 of 53

Review: singular value decomposition

Non-square matrices are not invertible (but we can still compute a pseudo-inverse)

18 of 53

Rank

The rank of a matrix is the number of linearly independent rows or columns

The rank of a matrix is the dimensionality of the vector space spanned by its rows or its columns

19 of 53

Rank

The rank of a matrix is the number of linearly independent rows or columns

The rank of a matrix is the dimensionality of the vector space spanned by its rows or its columns

The rank of a matrix is the number of nonzero singular values

If s1, …, sk > 0 and sk+1, …, sn = 0, then rank = k
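
A NumPy illustration with a made-up rank-deficient matrix: counting singular values above a small tolerance gives the same answer as np.linalg.matrix_rank.

    import numpy as np

    # Hypothetical example: the third column is the sum of the first two
    A = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [2.0, 3.0, 5.0]])

    s = np.linalg.svd(A, compute_uv=False)
    print(np.sum(s > 1e-10))                          # 2 nonzero singular values
    print(np.linalg.matrix_rank(A))                   # 2: rank equals that count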

20 of 53

Frobenius norm

The Frobenius norm of a matrix is the matrix equivalent of the Euclidean norm for vectors

The squared Frobenius norm is the sum of the squared elements of A

21 of 53

Frobenius norm

The Frobenius norm of a matrix is the matrix equivalent of the Euclidean norm for vectors

The squared Frobenius norm is the sum of the squared elements of A

The squared Frobenius norm is also equal to the sum of the squared singular values

22 of 53

Frobenius norm

The Frobenius norm of a matrix is the matrix equivalent of the Euclidean norm for vectors

The squared Frobenius norm is also equal to the trace of AᵀA

The trace is the sum of diagonal elements
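
A quick NumPy check of these three equivalent expressions (arbitrary example matrix):

    import numpy as np

    A = np.random.randn(4, 3)
    s = np.linalg.svd(A, compute_uv=False)

    fro_sq = np.linalg.norm(A, 'fro') ** 2            # squared Frobenius norm
    print(np.isclose(fro_sq, np.sum(A ** 2)))         # sum of squared elements
    print(np.isclose(fro_sq, np.sum(s ** 2)))         # sum of squared singular values
    print(np.isclose(fro_sq, np.trace(A.T @ A)))      # trace of A^T A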

23 of 53

SVD can also be formulated as a sum of outer products

Singular value decomposition

Right singular vectors

Singular values

Left singular vectors

24 of 53

SVD can also be formulated as a sum of outer products

Singular value decomposition

Right singular vectors

Singular values

Left singular vectors

Each of these is a rank-1 matrix!
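
A NumPy sketch of the outer-product form (arbitrary example matrix): each term is rank 1, and the terms sum to A.

    import numpy as np

    A = np.random.randn(4, 3)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Each term s_i * u_i v_i^T is a rank-1 matrix; summing them rebuilds A exactly
    terms = [s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s))]
    print(np.linalg.matrix_rank(terms[0]))            # 1
    print(np.allclose(A, sum(terms)))                 # True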

25 of 53

The best rank-k approximation of A results from truncating the SVD after k terms

Low-rank matrix approximation

Right singular vectors

Singular values

Left singular vectors
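
A minimal NumPy sketch of truncating the SVD after k terms (the matrix size and k are made up); the squared Frobenius error of the rank-k approximation equals the sum of the discarded squared singular values.

    import numpy as np

    A = np.random.randn(100, 50)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    k = 10                                            # keep only the first k terms
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    print(np.linalg.matrix_rank(A_k))                 # k
    # Squared error of the rank-k approximation = sum of the discarded s_i^2
    print(np.isclose(np.linalg.norm(A - A_k, 'fro') ** 2, np.sum(s[k:] ** 2)))   # True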

26 of 53

Explained variance

We can quantify the “proportion of variance accounted for” by a rank-k approximation of A

27 of 53

Explained variance

We can quantify the “proportion of variance accounted for” by a rank-k approximation of A

explained variance = (s1² + … + sk²) / (s1² + … + sn²): the sum of the squared first k singular values divided by the sum of all n squared singular values
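
In NumPy, with an arbitrary example matrix and an arbitrary choice of k:

    import numpy as np

    A = np.random.randn(100, 20)
    s = np.linalg.svd(A, compute_uv=False)

    k = 5
    explained = np.sum(s[:k] ** 2) / np.sum(s ** 2)   # proportion of variance accounted for
    print(explained)                                  # between 0 and 1; equals 1 when k = n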

28 of 53

Explained variance

We can quantify the “proportion of variance accounted for” by a rank-k approximation of A

explained variance = (s1² + … + sk²) / (s1² + … + sn²): the sum of the squared first k singular values divided by the sum of all n squared singular values

29 of 53

30 of 53

Principal component analysis (PCA) is a dimensionality reduction method for interpreting high-dimensional data

n samples

d dimensions

Principal component analysis

31 of 53

Principal component analysis (PCA) is a dimensionality reduction method for interpreting high-dimensional data

n samples

d dimensions

Principal component analysis

First, mean-center the data; i.e. for each column, subtract that column’s mean

32 of 53

Principal component analysis (PCA) is a dimensionality reduction method for interpreting high-dimensional data

n samples

d dimensions

Principal component analysis

First, mean-center the data; i.e. for each column, subtract that column’s mean

Next, compute the d × d matrix XᵀX = C

33 of 53

Principal component analysis (PCA) is a dimensionality reduction method for interpreting high-dimensional data

n samples

d dimensions

Principal component analysis

First, mean-center the data; i.e. for each column, subtract that column’s mean

Next, compute the d × d matrix XᵀX = C

Eigendecomposition!

34 of 53

Principal component analysis (PCA) is a dimensionality reduction method for interpreting high-dimensional data

n samples

d dimensions

Principal component analysis

First, mean-center the data; i.e. for each column, subtract that column’s mean

Next, compute the d × d matrix XᵀX = C

Eigendecomposition!

V is an orthogonal matrix of eigenvectors

L is a diagonal matrix of eigenvalues 𝜆i

C is a symmetric matrix
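
Putting these steps together in NumPy (random data standing in for X; the dimensions are made up): mean-center, form C = XᵀX, eigendecompose, and project.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))                     # n = 200 samples, d = 5 dimensions

    X = X - X.mean(axis=0)                            # mean-center each column
    C = X.T @ X                                       # d x d symmetric matrix

    # eigh: for symmetric C, real eigenvalues and orthogonal eigenvectors
    evals, V = np.linalg.eigh(C)
    order = np.argsort(evals)[::-1]                   # sort from largest to smallest eigenvalue
    evals, V = evals[order], V[:, order]

    PCs = X @ V                                       # project the data onto the principal axes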

35 of 53

Principal component analysis (PCA) is a dimensionality reduction method for interpreting high-dimensional data

n samples

d dimensions

Principal component analysis

First, mean-center the data; i.e. for each column, subtract that column’s mean

Next, compute the d × d matrix XᵀX = C

Eigendecomposition!

projects the data onto the principal axes

these are the principal components!

36 of 53

Principal component analysis (PCA) is a dimensionality reduction method for interpreting high-dimensional data

n samples

d dimensions

Principal component analysis

37 of 53

Principal component analysis (PCA) is a dimensionality reduction method for interpreting high-dimensional data

n samples

d dimensions

Principal component analysis

38 of 53

Principal component analysis (PCA) is a dimensionality reduction method for interpreting high-dimensional data

n samples

d dimensions

Principal component analysis

The singular values si of X are the square roots of the eigenvalues 𝜆i of XᵀX
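
A quick NumPy check of this relationship on made-up data:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    X = X - X.mean(axis=0)

    evals = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]   # eigenvalues of X^T X, descending
    s = np.linalg.svd(X, compute_uv=False)                # singular values of X, descending

    print(np.allclose(s, np.sqrt(evals)))                 # True: s_i = sqrt(lambda_i)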

39 of 53

Principal component analysis (PCA) is a dimensionality reduction method for interpreting high-dimensional data

n samples

d dimensions

Principal component analysis

projects the data onto the principal axes

these are the principal components!
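
A NumPy check (made-up data) that projecting onto the right singular vectors, XV, gives the same principal components as US:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    X = X - X.mean(axis=0)

    U, s, Vt = np.linalg.svd(X, full_matrices=False)

    # Projecting X onto the right singular vectors (XV) equals scaling the
    # left singular vectors by the singular values (US): both give the principal components
    print(np.allclose(X @ Vt.T, U @ np.diag(s)))      # True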

40 of 53

Principal component analysis (PCA) is a dimensionality reduction method for interpreting high-dimensional data

n samples

d dimensions

Principal component analysis

41 of 53

PCA effectively fits an ellipsoid to your data where each axis corresponds to a principal component

dimension 1

dimension 2

Principal component analysis

42 of 53

PCA effectively fits an ellipsoid to your data where each axis corresponds to a principal component

dimension 1

dimension 2

1st PC

Principal component analysis

43 of 53

PCA effectively fits an ellipsoid to your data where each axis corresponds to a principal component

Singular values correspond to the length of these axes; i.e. “variance” along these axes

dimension 1

dimension 2

1st PC

Principal component analysis

44 of 53

PCA effectively fits an ellipsoid to your data where each axis corresponds to a principal component

Singular values correspond to the length of these axes; i.e. “variance” along these axes

dimension 1

dimension 2

1st PC

2nd PC

Principal component analysis

45 of 53

PCA effectively fits an ellipsoid to your data where each axis corresponds to a principal component

Singular values correspond to the length of these axes; i.e. “variance” along these axes

dimension 1

dimension 2

1st PC

2nd PC

Principal component analysis

46 of 53

PCA effectively fits an ellipsoid to your data where each axis corresponds to a principal component

Singular values correspond to the length of these axes; i.e. “variance” along these axes

dimension 1

dimension 2

Principal component analysis

What is the top singular vector of XᵀX?

47 of 53

PCA effectively fits an ellipsoid to your data where each axis corresponds to a principal component

Singular values correspond to the length of these axes; i.e. “variance” along these axes

dimension 1

dimension 2

Principal component analysis

What is the top singular vector of XᵀX?

48 of 53

PCA effectively fits an ellipsoid to your data where each axis corresponds to a principal component

Singular values correspond to the length of these axes; i.e. “variance” along these axes

dimension 1

dimension 2

Principal component analysis

In practice, we almost always mean-center the data before PCA

49 of 53

PCA effectively fits an ellipsoid to your data where each axis corresponds to a principal component

Singular values correspond to the length of these axes; i.e. “variance” along these axes

dimension 1

dimension 2

Principal component analysis

In practice, we almost always mean-center the data before PCA

covariance matrix!
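
A small NumPy check on made-up data that, after mean-centering, XᵀX scaled by 1/(n − 1) is the sample covariance matrix:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    Xc = X - X.mean(axis=0)                           # mean-center first

    # After mean-centering, X^T X / (n - 1) is the sample covariance matrix
    C = Xc.T @ Xc / (Xc.shape[0] - 1)
    print(np.allclose(C, np.cov(X, rowvar=False)))    # True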

50 of 53

PCA effectively fits an ellipsoid to your data where each axis corresponds to a principal component

Singular values correspond to the length of these axes; i.e. “variance” along these axes

dimension 1

dimension 2

Principal component analysis

In practice, we almost always mean-center the data before PCA

covariance matrix!

51 of 53

PCA effectively fits an ellipsoid to your data where each axis corresponds to a principal component

Singular values correspond to the length of these axes; i.e. “variance” along these axes

dimension 1

dimension 2

Principal component analysis

In practice, we almost always mean-center the data before PCA

covariance matrix!

52 of 53

PCA effectively fits an ellipsoid to your data where each axis corresponds to a principal component

Singular values correspond to the length of these axes; i.e. “variance” along these axes

Principal component analysis

In practice, we almost always mean-center the data before PCA

covariance matrix!

53 of 53