NEU 314: Mathematical Tools for Neuroscience
Lecture 8: October 4, 2022
Instructor: Sam Nastase
Princeton Neuroscience Institute
Principal component analysis
Review: null space
The null space of A is the vector space comprising all vectors that are orthogonal to the rows of A; that is, all vectors v such that Av = 0.
[Figure: a 1D vector space spanned by v1, with the orthogonal direction providing a basis for the null space of v1]
The row space and the null space together span ℝᵐ.
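To make this concrete, here is a minimal NumPy sketch (the matrix A and all variable names are illustrative, not from the slides): the rows of Vᵀ from the SVD whose singular values are zero form an orthonormal basis for the null space.

```python
import numpy as np

# Illustrative 2 x 3 matrix: its two rows span a 2D subspace of R^3,
# so the null space is the 1D space orthogonal to both rows.
A = np.array([[1., 0., 0.],
              [0., 1., 0.]])

# Rows of Vt whose singular values are (numerically) zero form an
# orthonormal basis for the null space of A.
U, s, Vt = np.linalg.svd(A)
rank = np.sum(s > 1e-10)
null_basis = Vt[rank:]            # here: [[0., 0., 1.]]

# A maps every null-space vector to the zero vector.
assert np.allclose(A @ null_basis.T, 0)
```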
Review: singular value decomposition
A = USVᵀ, where the columns of U are the left singular vectors, S is a diagonal matrix of singular values, and the columns of V are the right singular vectors.
Geometrically, applying A amounts to three steps: Vᵀ rotates, S stretches, and U rotates again.
For orthogonal matrices (like U and V), the transpose and the inverse are equal.
This makes the SVD a recipe for inverting A: A⁻¹ = VS⁻¹Uᵀ, which undoes the rotate, stretch, rotate steps in reverse order.
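A minimal NumPy sketch of these review facts, assuming an arbitrary random matrix: decompose A, confirm the reconstruction, and invert via the SVD.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))

# Full SVD: A = U @ diag(s) @ Vt (rotate, stretch, rotate).
U, s, Vt = np.linalg.svd(A)
assert np.allclose(A, U @ np.diag(s) @ Vt)

# U is orthogonal, so its transpose is its inverse.
assert np.allclose(U.T @ U, np.eye(3))

# Inverse via the SVD: A^-1 = V @ diag(1/s) @ U^T
# (valid here because all singular values are nonzero).
A_inv = Vt.T @ np.diag(1 / s) @ U.T
assert np.allclose(A_inv, np.linalg.inv(A))
```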
If any singular value sᵢ = 0, then S⁻¹ does not exist: A destroys information along that direction, and no inverse can recover it.
We can, however, compute a pseudo-inverse using only the positive singular values.
If some singular values are merely very close to zero, the matrix may be practically non-invertible; i.e., ill-conditioned.
Non-square matrices are not invertible (but we can still compute a pseudo-inverse).
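A sketch of the pseudo-inverse built by hand from the positive singular values, using a deliberately singular example matrix; NumPy's np.linalg.pinv implements the same idea.

```python
import numpy as np

# A singular 2 x 2 matrix: the second row is 2x the first,
# so one singular value is zero and A has no inverse.
A = np.array([[1., 2.],
              [2., 4.]])
U, s, Vt = np.linalg.svd(A)
print(s)                                  # approximately [5., 0.]

# Invert only the positive singular values; leave the zeros at zero.
s_pinv = np.zeros_like(s)
s_pinv[s > 1e-10] = 1. / s[s > 1e-10]
A_pinv = Vt.T @ np.diag(s_pinv) @ U.T

# Matches NumPy's built-in Moore-Penrose pseudo-inverse.
assert np.allclose(A_pinv, np.linalg.pinv(A))
```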
Rank
The rank of a matrix is the number of linearly independent rows or columns; equivalently, it is the dimensionality of the vector space spanned by its rows (or by its columns).
The rank is also the number of nonzero singular values: if s₁, …, sₖ > 0 and sₖ₊₁, …, sₙ = 0, then rank = k.
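A quick NumPy check (the example matrix is illustrative) that the rank equals the number of nonzero singular values:

```python
import numpy as np

# Row 2 is the sum of rows 0 and 1, so only two rows are independent.
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.]])

# Rank = number of nonzero singular values.
s = np.linalg.svd(A, compute_uv=False)
print(np.sum(s > 1e-10))            # 2
print(np.linalg.matrix_rank(A))     # 2 (same idea, with a default tolerance)
```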
Frobenius norm
The Frobenius norm of a matrix is the analogue of the Euclidean norm for vectors: the square root of the sum of the squared elements of A.
The squared Frobenius norm also equals the sum of the squared singular values, and the trace of AᵀA (the trace is the sum of the diagonal elements).
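A short NumPy sketch verifying the three equivalent expressions on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# Three equivalent ways to get the squared Frobenius norm:
fro_sq_elements = np.sum(A ** 2)                                  # sum of squared entries
fro_sq_singular = np.sum(np.linalg.svd(A, compute_uv=False) ** 2) # sum of squared s_i
fro_sq_trace = np.trace(A.T @ A)                                  # trace of A^T A

assert np.allclose(fro_sq_elements, fro_sq_singular)
assert np.allclose(fro_sq_elements, fro_sq_trace)
assert np.allclose(np.sqrt(fro_sq_elements), np.linalg.norm(A, 'fro'))
```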
Singular value decomposition
SVD can also be formulated as a sum of outer products: A = s₁u₁v₁ᵀ + s₂u₂v₂ᵀ + … + sₙuₙvₙᵀ, built from the left singular vectors uᵢ, singular values sᵢ, and right singular vectors vᵢ. Each of these outer products is a rank-1 matrix!

Low-rank matrix approximation
The best rank-k approximation of A (in the Frobenius norm) results from truncating the SVD after k terms.
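A minimal NumPy sketch of truncated-SVD approximation, assuming a random example matrix; it checks that the matrix form and the explicit sum of rank-1 outer products agree.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 40))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-k approximation: keep only the first k terms of the SVD.
k = 10
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Equivalently, an explicit sum of k rank-1 outer products.
A_k_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(k))
assert np.allclose(A_k, A_k_sum)
```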
Explained variance
We can quantify the “proportion of variance accounted for” by a rank-k approximation of A as the sum of the squared first k singular values divided by the sum of all n squared singular values:
(s₁² + … + sₖ²) / (s₁² + … + sₙ²)
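In code, the cumulative explained-variance curve is one line (a sketch; the random matrix is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 40))
s = np.linalg.svd(A, compute_uv=False)

# Proportion of variance accounted for by each rank-k truncation.
explained = np.cumsum(s ** 2) / np.sum(s ** 2)
print(explained[9])      # variance captured by a rank-10 approximation
```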
Principal component analysis
Principal component analysis (PCA) is a dimensionality reduction method for interpreting high-dimensional data. Arrange the data into a matrix X with n samples as rows and d dimensions as columns.
First, mean-center the data; i.e., for each column, subtract that column’s mean.
Next, compute the d × d matrix XᵀX = C.
Eigendecomposition! C = VLVᵀ, where V is a matrix of orthogonal eigenvectors and L is a diagonal matrix of eigenvalues λᵢ; this works because C is a symmetric matrix.
XV projects the data onto the principal axes; the columns of XV are the principal components!
The singular values sᵢ of X are the square roots of the eigenvalues λᵢ.
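Putting the steps together, a minimal NumPy sketch of the PCA recipe above (random data and all names are illustrative), including a check that the singular values of X are the square roots of the eigenvalues of C:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))        # n = 100 samples, d = 5 dimensions

# Step 1: mean-center each column.
X = X - X.mean(axis=0)

# Step 2: compute the d x d matrix C = X^T X.
C = X.T @ X

# Step 3: eigendecomposition C = V L V^T (C is symmetric, so use eigh).
evals, V = np.linalg.eigh(C)
evals, V = evals[::-1], V[:, ::-1]       # sort from largest to smallest

# Project the data onto the principal axes: the principal components.
PCs = X @ V

# The singular values of X are the square roots of the eigenvalues of C.
s = np.linalg.svd(X, compute_uv=False)
assert np.allclose(s, np.sqrt(evals))
```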
Principal component analysis
PCA effectively fits an ellipsoid to your data, where each axis corresponds to a principal component. Singular values correspond to the lengths of these axes; i.e., the “variance” along these axes.
[Figure: 2D scatter plot (dimension 1 vs. dimension 2) with the 1st PC along the long axis of the data cloud and the 2nd PC along the short axis]
What is the top singular vector of XᵀX? It points along the 1st PC: the direction of greatest variance in the data.
In practice, we almost always mean-center the data before PCA; after mean-centering, XᵀX (divided by n − 1) is the covariance matrix!
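A final sketch, assuming illustrative correlated 2D data: after mean-centering, XᵀX / (n − 1) matches the covariance matrix, and the top right singular vector gives the 1st PC (the long axis of the fitted ellipse).

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated 2D data: an elongated cloud ("ellipse") of points.
X = rng.standard_normal((500, 2)) @ np.array([[3., 0.],
                                              [1., 1.]])
X = X - X.mean(axis=0)

# After mean-centering, X^T X / (n - 1) is the covariance matrix.
n = X.shape[0]
assert np.allclose(X.T @ X / (n - 1), np.cov(X, rowvar=False))

# The top right singular vector of X is the 1st PC: the direction
# of greatest variance (the long axis of the fitted ellipse).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
print(Vt[0])              # 1st PC direction
print(s ** 2 / (n - 1))   # variance along each PC axis
```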