1 of 53

NEU 314: Mathematical Tools for Neuroscience

Lecture 8: October 4, 2022

Instructor: Sam Nastase

Princeton Neuroscience Institute

Principal component analysis

2 of 53

Review: null space

The null space of A is the vector space comprising all vectors that are orthogonal to the rows of A

The null space of A is the vector space of all vectors v such that Av = 0

3 of 53

The null space of A is the vector space comprising all vectors that are orthogonal to the rows of A

1D vector space spanned by v1

basis for the null space of v1

Review: null space
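
A minimal NumPy sketch of this idea (the matrix here is made up for illustration): the right singular vectors paired with zero singular values give a basis for the null space, and those basis vectors are orthogonal to every row of A.

    import numpy as np

    # Hypothetical 2 x 3 matrix; its rows span a 2D subspace of R^3
    A = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])

    # Right singular vectors paired with zero singular values span the null space
    U, s, Vt = np.linalg.svd(A)
    rank = np.sum(s > 1e-10)
    null_basis = Vt[rank:]                 # rows of Vt beyond the rank

    # Every null-space vector is orthogonal to every row of A, i.e. A @ v = 0
    print(A @ null_basis.T)                # ~ all zeros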

4 of 53

The null space of A is the vector space comprising all vectors that are orthogonal to the rows of A

1D vector space spanned by v1

basis for the null space of v1

The row space and null space together span the entire input space

Review: null space

5 of 53

Right singular vectors

Singular values

Left singular vectors

Review: singular value decomposition
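
A quick NumPy check of the decomposition (the example matrix is arbitrary): np.linalg.svd returns the left singular vectors, the singular values, and the transposed right singular vectors, and multiplying the factors back together recovers A.

    import numpy as np

    A = np.random.randn(4, 3)                         # any real matrix

    # U: left singular vectors, s: singular values, Vt: transposed right singular vectors
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Multiplying the three factors back together recovers A
    print(np.allclose(A, U @ np.diag(s) @ Vt))        # True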

6 of 53

Review: singular value decomposition

rotate

stretch

rotate

7 of 53

Review: singular value decomposition

8 of 53

Review: singular value decomposition

For orthogonal matrices (like U and V), the transpose and inverse are equal
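
A small NumPy sanity check of this property (arbitrary random matrix):

    import numpy as np

    U, s, Vt = np.linalg.svd(np.random.randn(3, 3))

    # For orthogonal U and V, the transpose acts as the inverse: U^T U = I
    print(np.allclose(U.T @ U, np.eye(3)))            # True
    print(np.allclose(Vt.T, np.linalg.inv(Vt)))       # True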

9 of 53

Review: singular value decomposition

For orthogonal matrices (like U and V), the transpose and inverse are equal

10 of 53

Review: singular value decomposition

For orthogonal matrices (like U and V), the transpose and inverse are equal

11 of 53

Review: singular value decomposition

rotate

stretch

rotate

12 of 53

Review: singular value decomposition

13 of 53

Review: singular value decomposition

?

14 of 53

Review: singular value decomposition

If any singular value sn = 0, then S⁻¹ does not exist

If any singular value sn = 0, then A destroys information that cannot be recovered by an inverse

15 of 53

Review: singular value decomposition

If any singular value sn = 0, then S⁻¹ does not exist

If any singular value sn = 0, then A destroys information that cannot be recovered by an inverse

We can compute a pseudo-inverse using only the positive singular values
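
A sketch of the idea in NumPy, using a made-up rank-deficient matrix: invert only the positive singular values, then rebuild the pseudo-inverse from the SVD factors.

    import numpy as np

    # Hypothetical rank-deficient matrix: the third row is the sum of the first two,
    # so one singular value is (numerically) zero and no exact inverse exists
    A = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 3.0, 1.0]])

    U, s, Vt = np.linalg.svd(A)
    s_pinv = np.zeros_like(s)
    s_pinv[s > 1e-10] = 1.0 / s[s > 1e-10]            # invert only the positive singular values

    A_pinv = Vt.T @ np.diag(s_pinv) @ U.T
    print(np.allclose(A_pinv, np.linalg.pinv(A)))     # matches NumPy's pseudo-inverse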

16 of 53

Review: singular value decomposition

If any singular value sn = 0, then S⁻¹ does not exist

If any singular value sn = 0, then A destroys information that cannot be recovered by an inverse

If any singular values are very close to zero, the matrix may be practically non-invertible, i.e. ill-conditioned
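
A small made-up example in NumPy: two nearly identical rows push one singular value toward zero, and the ratio of the largest to the smallest singular value (the condition number) blows up.

    import numpy as np

    # Hypothetical matrix with two nearly identical rows: the smallest singular value
    # is close to zero, so the matrix is technically invertible but ill-conditioned
    A = np.array([[1.0, 1.0],
                  [1.0, 1.0 + 1e-9]])

    s = np.linalg.svd(A, compute_uv=False)
    print(s)                                          # one value near 2, one near 5e-10
    print(s[0] / s[-1])                               # condition number ~4e9: inversion is unstable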

17 of 53

Review: singular value decomposition

Non-square matrices are not invertible (but we can still compute a pseudo-inverse)

18 of 53

Rank

The rank of a matrix is the number of linearly independent rows or columns

The rank of a matrix is the dimensionality of the vector space spanned by its rows or its columns

19 of 53

Rank

The rank of a matrix is the number of linearly independent rows or columns

The rank of a matrix is the dimensionality of the vector space spanned by its rows or its columns

The rank of a matrix is the number of nonzero singular values

If s1, …, sk > 0 and sk+1, …, sn = 0, then rank = k
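
A NumPy illustration with a made-up rank-deficient matrix: counting singular values above a small tolerance gives the same answer as np.linalg.matrix_rank.

    import numpy as np

    # Hypothetical example: the third column is the sum of the first two
    A = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [2.0, 3.0, 5.0]])

    s = np.linalg.svd(A, compute_uv=False)
    print(np.sum(s > 1e-10))                          # 2 nonzero singular values
    print(np.linalg.matrix_rank(A))                   # 2: rank equals that count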

20 of 53

Frobenius norm

The Frobenius norm of a matrix is the matrix equivalent of the Euclidean norm for vectors

The squared Frobenius norm is the sum of the squared elements of A

21 of 53

Frobenius norm

The Frobenius norm of a matrix is the matrix equivalent of the Euclidean norm for vectors

The squared Frobenius norm is the sum of the squared elements of A

The squared Frobenius norm is also equal to the sum of the squared singular values

22 of 53

Frobenius norm

The Frobenius norm of a matrix is the matrix equivalent of the Euclidean norm for vectors

The squared Frobenius norm is also equal to the trace of AᵀA

The trace is the sum of diagonal elements
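
A quick NumPy check of these three equivalent expressions (arbitrary example matrix):

    import numpy as np

    A = np.random.randn(4, 3)
    s = np.linalg.svd(A, compute_uv=False)

    fro_sq = np.linalg.norm(A, 'fro') ** 2            # squared Frobenius norm
    print(np.isclose(fro_sq, np.sum(A ** 2)))         # sum of squared elements
    print(np.isclose(fro_sq, np.sum(s ** 2)))         # sum of squared singular values
    print(np.isclose(fro_sq, np.trace(A.T @ A)))      # trace of A^T A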

23 of 53

SVD can also be formulated as a sum of outer products

Singular value decomposition

Right singular vectors

Singular values

Left singular vectors

24 of 53

SVD can also be formulated as a sum of outer products

Singular value decomposition

Right singular vectors

Singular values

Left singular vectors

Each of these is a rank-1 matrix!
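
A NumPy sketch of the outer-product form (arbitrary example matrix): each term is rank 1, and the terms sum to A.

    import numpy as np

    A = np.random.randn(4, 3)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Each term s_i * u_i v_i^T is a rank-1 matrix; summing them rebuilds A exactly
    terms = [s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s))]
    print(np.linalg.matrix_rank(terms[0]))            # 1
    print(np.allclose(A, sum(terms)))                 # True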

25 of 53

The best rank-k approximation of A results from truncating the SVD after k terms

Low-rank matrix approximation

Right singular vectors

Singular values

Left singular vectors
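
A minimal NumPy sketch of truncating the SVD after k terms (the matrix size and k are made up); the squared Frobenius error of the rank-k approximation equals the sum of the discarded squared singular values.

    import numpy as np

    A = np.random.randn(100, 50)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    k = 10                                            # keep only the first k terms
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    print(np.linalg.matrix_rank(A_k))                 # k
    # Squared error of the rank-k approximation = sum of the discarded s_i^2
    print(np.isclose(np.linalg.norm(A - A_k, 'fro') ** 2, np.sum(s[k:] ** 2)))   # True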

26 of 53

Explained variance

We can quantify the “proportion of variance accounted for” by a rank-k approximation of A

27 of 53

Explained variance

We can quantify the “proportion of variance accounted for” by a rank-k approximation of A

explained variance = (s1² + … + sk²) / (s1² + … + sn²): the sum of the squared first k singular values divided by the sum of all n squared singular values
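
In NumPy, with an arbitrary example matrix and an arbitrary choice of k:

    import numpy as np

    A = np.random.randn(100, 20)
    s = np.linalg.svd(A, compute_uv=False)

    k = 5
    explained = np.sum(s[:k] ** 2) / np.sum(s ** 2)   # proportion of variance accounted for
    print(explained)                                  # between 0 and 1; equals 1 when k = n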

28 of 53

Explained variance

We can quantify the “proportion of variance accounted for” by a rank-k approximation of A

explained variance = (s1² + … + sk²) / (s1² + … + sn²): the sum of the squared first k singular values divided by the sum of all n squared singular values

29 of 53

30 of 53

Principal component analysis (PCA) is a dimensionality reduction method for interpreting high-dimensional data

n samples

d dimensions

Principal component analysis

31 of 53

Principal component analysis (PCA) is a dimensionality reduction method for interpreting high-dimensional data

n samples

d dimensions

Principal component analysis

First, mean-center the data; i.e. for each column, subtract that column’s mean

32 of 53

Principal component analysis (PCA) is a dimensionality reduction method for interpreting high-dimensional data

n samples

d dimensions

Principal component analysis

First, mean-center the data; i.e. for each column, subtract that column’s mean

Next, compute the d × d matrix XᵀX = C

33 of 53

Principal component analysis (PCA) is a dimensionality reduction method for interpreting high-dimensional data

n samples

d dimensions

Principal component analysis

First, mean-center the data; i.e. for each column, subtract that column’s mean

Next, compute the d × d matrix XᵀX = C

Eigendecomposition!

34 of 53

Principal component analysis (PCA) is a dimensionality reduction method for interpreting high-dimensional data

n samples

d dimensions

Principal component analysis

First, mean-center the data; i.e. for each column, subtract that column’s mean

Next, compute the d × d matrix XᵀX = C

Eigendecomposition!

V is an orthogonal matrix of eigenvectors

L is a diagonal matrix of eigenvalues 𝜆i

C is a symmetric matrix
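
Putting these steps together in NumPy (random data standing in for X; the dimensions are made up): mean-center, form C = XᵀX, eigendecompose, and project.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))                     # n = 200 samples, d = 5 dimensions

    X = X - X.mean(axis=0)                            # mean-center each column
    C = X.T @ X                                       # d x d symmetric matrix

    # eigh: for symmetric C, real eigenvalues and orthogonal eigenvectors
    evals, V = np.linalg.eigh(C)
    order = np.argsort(evals)[::-1]                   # sort from largest to smallest eigenvalue
    evals, V = evals[order], V[:, order]

    PCs = X @ V                                       # project the data onto the principal axes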

35 of 53

Principal component analysis (PCA) is a dimensionality reduction method for interpreting high-dimensional data

n samples

d dimensions

Principal component analysis

First, mean-center the data; i.e. for each column, subtract that column’s mean

Next, compute the d × d matrix XᵀX = C

Eigendecomposition!

projects the data onto the principal axes

these are the principal components!

36 of 53

Principal component analysis (PCA) is a dimensionality reduction method for interpreting high-dimensional data

n samples

d dimensions

Principal component analysis

37 of 53

Principal component analysis (PCA) is a dimensionality reduction method for interpreting high-dimensional data

n samples

d dimensions

Principal component analysis

38 of 53

Principal component analysis (PCA) is a dimensionality reduction method for interpreting high-dimensional data

n samples

d dimensions

Principal component analysis

The singular values si of X are the square roots of the eigenvalues 𝜆i of XᵀX
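
A quick NumPy check of this relationship on made-up data:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    X = X - X.mean(axis=0)

    evals = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]   # eigenvalues of X^T X, descending
    s = np.linalg.svd(X, compute_uv=False)                # singular values of X, descending

    print(np.allclose(s, np.sqrt(evals)))                 # True: s_i = sqrt(lambda_i)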

39 of 53

Principal component analysis (PCA) is a dimensionality reduction method for interpreting high-dimensional data

n samples

d dimensions

Principal component analysis

projects the data onto the principal axes

these are the principal components!
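
A NumPy check (made-up data) that projecting onto the right singular vectors, XV, gives the same principal components as US:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    X = X - X.mean(axis=0)

    U, s, Vt = np.linalg.svd(X, full_matrices=False)

    # Projecting X onto the right singular vectors (XV) equals scaling the
    # left singular vectors by the singular values (US): both give the principal components
    print(np.allclose(X @ Vt.T, U @ np.diag(s)))      # True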

40 of 53

Principal component analysis (PCA) is a dimensionality reduction method for interpreting high-dimensional data

n samples

d dimensions

Principal component analysis

41 of 53

PCA effectively fits an ellipsoid to your data where each axis corresponds to a principal component

dimension 1

dimension 2

Principal component analysis

42 of 53

PCA effectively fits an ellipsoid to your data where each axis corresponds to a principal component

dimension 1

dimension 2

1st PC

Principal component analysis

43 of 53

PCA effectively fits an ellipsoid to your data where each axis corresponds to a principal component

Singular values correspond to the length of these axes; i.e. “variance” along these axes

dimension 1

dimension 2

1st PC

Principal component analysis

44 of 53

PCA effectively fits an ellipsoid to your data where each axis corresponds to a principal component

Singular values correspond to the length of these axes; i.e. “variance” along these axes

dimension 1

dimension 2

1st PC

2nd PC

Principal component analysis

45 of 53

PCA effectively fits an ellipsoid to your data where each axis corresponds to a principal component

Singular values correspond to the length of these axes; i.e. “variance” along these axes

dimension 1

dimension 2

1st PC

2nd PC

Principal component analysis

46 of 53

PCA effectively fits an ellipsoid to your data where each axis corresponds to a principal component

Singular values correspond to the length of these axes; i.e. “variance” along these axes

dimension 1

dimension 2

Principal component analysis

What is the top singular vector of XᵀX?

47 of 53

PCA effectively fits an ellipsoid to your data where each axis corresponds to a principal component

Singular values correspond to the length of these axes; i.e. “variance” along these axes

dimension 1

dimension 2

Principal component analysis

What is the top singular vector of XᵀX?

48 of 53

PCA effectively fits an ellipsoid to your data where each axis corresponds to a principal component

Singular values correspond to the length of these axes; i.e. “variance” along these axes

dimension 1

dimension 2

Principal component analysis

In practice, we almost always mean-center the data before PCA

49 of 53

PCA effectively fits an ellipsoid to your data where each axis corresponds to a principal component

Singular values correspond to the length of these axes; i.e. “variance” along these axes

dimension 1

dimension 2

Principal component analysis

In practice, we almost always mean-center the data before PCA

covariance matrix!
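
A small NumPy check on made-up data that, after mean-centering, XᵀX scaled by 1/(n − 1) is the sample covariance matrix:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    Xc = X - X.mean(axis=0)                           # mean-center first

    # After mean-centering, X^T X / (n - 1) is the sample covariance matrix
    C = Xc.T @ Xc / (Xc.shape[0] - 1)
    print(np.allclose(C, np.cov(X, rowvar=False)))    # True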

50 of 53

PCA effectively fits an ellipsoid to your data where each axis corresponds to a principal component

Singular values correspond to the length of these axes; i.e. “variance” along these axes

dimension 1

dimension 2

Principal component analysis

In practice, we almost always mean-center the data before PCA

covariance matrix!

51 of 53

PCA effectively fits an ellipsoid to your data where each axis corresponds to a principal component

Singular values correspond to the length of these axes; i.e. “variance” along these axes

dimension 1

dimension 2

Principal component analysis

In practice, we almost always mean-center the data before PCA

covariance matrix!

52 of 53

PCA effectively fits an ellipsoid to your data where each axis corresponds to a principal component

Singular values correspond to the length of these axes; i.e. “variance” along these axes

Principal component analysis

In practice, we almost always mean-center the data before PCA

covariance matrix!

53 of 53