Dimensionality Reduction
(Slides adapted from John DeNero)
UC Berkeley Data 100 Summer 2019
Sam Lau
Learning goals:
Announcements
Dimensions
Dimensionality
The number of dimensions of a dataset (dimensionality) is the number of attributes per observation.
A data matrix has one row per observation and one column per attribute.
Some attributes are redundant!
Age (days) | Height (in) |
182 | 28 |
399 | 30 |
725 | 33 |
Age (days) | Height (in) | Height (ft) |
182 | 28 | 2.33 |
399 | 30 | 2.5 |
725 | 33 | 2.75 |
2 dimensions
Still 2 dimensions
Dimensionality
Dimensionality described by linear algebra:
Age (days) | Height (in) |
182 | 28 |
399 | 30 |
725 | 33 |
Age (days) | Height (in) | Height (ft) |
182 | 28 | 2.33 |
399 | 30 | 2.5 |
725 | 33 | 2.75 |
2 dimensions
Still 2 dimensions
Visualizing High-Dimensional Data
Only two-dimensional datasets can be visualized with a scatter diagram.
To discover associations between pairs of attributes: scatter plot matrix.
To discover clusters of similar observations: try reducing to two dimensions.
Visualizing High-Dimensional Data
(Demo)
To discover clusters of similar observations: try reducing to two dimensions.
End goal: Principal component analysis (PCA) picks linear combinations of attributes that together account for the largest possible amount of variance in the data.
Linear Algebra Review
Linear Algebra Intuition: Remember HW1?
$2 for apples, $1 for bananas, $4 for melons
Fruit bowls:
Matrices as Linear Operations
One view: matrix multiplication encodes multiple linear computations on data.
Matrices as Linear Operations
One view: matrix multiplication encodes multiple linear computations on data.
(m×n)�(2×3)
(m×k)�(2×1)
(n×k)�(3×1)
Matrices as Linear Operations
(m×n)�(2×3)
(m×k)�(2×1)
(n×k)�(3×1)
Matrices as Linear Operations
(m×n)�(2×3)
(m×k)�(2×1)
(n×k)�(3×1)
Matrices as Linear Operations
Add row to left matrix is like adding an operation.
Adding a column to right matrix is like adding an observation.
Matrices as Linear Operations
Notice that matrices and vectors just contain numbers.
Because of symmetry, we can also view the left matrix as data and the right matrix as the computation!
What does this mean in the context of the fruit bowls problem?
Matrices Have Many Representations
Takeaway: a matrix can be interpreted many different ways.
Matrices in Python
(Demo)
Question: What does this mean?
In this example, the data in left matrix, operators in right matrix. Interpret the resulting matrix.
(You can use Python to perform the multiplication.)
Age (days) | Height (in) |
182 | 28 |
399 | 30 |
725 | 33 |
1 | 0 | 0 |
0 | 1 | 1/12 |
x
=
Question: What does this mean?
In this example, the data in left matrix, operators in right matrix. Interpret the resulting matrix.
(You can use Python to perform the multiplication.)
Age (days) | Height (in) |
182 | 28 |
399 | 30 |
725 | 33 |
1 | 0 | 0 |
0 | 1 | 1/12 |
x
Age (days) | Height (in) | Height (ft) |
182 | 28 | 2.33 |
399 | 30 | 2.5 |
725 | 33 | 2.75 |
=
Question: What’s the Rank?
You measure the width, length, area, and perimeter of 100 rectangular backyards.
What do you expect to be the rank of the resulting data matrix?
If less than 4, what matrices would you multiply to recover the data matrix?
width | length | area | perimeter |
20 | 20 | 400 | 80 |
16 | 12 | 192 | 60 |
24 | 12 | 288 | 72 |
...
25 | 24 | 600 | 98 |
Question: What’s the Rank?
width | length | area |
20 | 20 | 400 |
16 | 12 | 192 |
24 | 12 | 288 |
...
25 | 24 | 600 |
width | length | area | perimeter |
20 | 20 | 400 | 80 |
16 | 12 | 192 | 60 |
24 | 12 | 288 | 72 |
...
25 | 24 | 600 | 98 |
1 | 0 | 0 | 2 |
0 | 1 | 0 | 2 |
0 | 0 | 1 | 0 |
x
=
Orthogonality
Two orthogonal vectors:
The length of a vector v is the square root of v • v.
A unit vector has length 1.
[-1.5 3]
[2 1]
(-1.5) • (2) + (3) • (1) = 0
Orthonormality
An orthonormal matrix has columns all of length 1, and all pairs of different columns are orthogonal.
An orthonormal matrix Q has orthonormal inverse QT, which means QQT = QTQ = I.
An orthonormal matrix can be viewed as a rotation of coordinates (or less commonly a reflection).
Diagonal Matrices
A right-multiplied diagonal matrix Σ scales each column up or down.
Can be viewed as scaling coordinate vectors:
Singular Value Decomposition
Any m x n matrix with rank r can be written as:
Where U and V are orthonormal and Σ is diagonal! Splitting X in this way is called the singular value decomposition (SVD).
m x n
m x r
=
r x n
x
X
U
VT
Σ
x
r x r
Singular Value Decomposition
What does X = UΣVT mean?�U and V orthonormal = rotations. Σ diagonal = scaling.
So, the SVD says that any matrix can be decomposed into a rotation, then a scaling, then another rotation:
Singular Value Decomposition
We can rewrite X = UΣVT as XV = UΣ (right multiply by V)�Note the dimensions of the terms in this form:
XV = UΣ
(m x n) (n x r) = (m x r) (r x r)
(m x r) = (m x r)
Remember that n = columns in X; r = dimensionality of X
If r < n, then we’ve gotten rid of the “unnecessary” columns of X.
Singular Value Decomposition
(Demo)
We can rewrite X = UΣVT as:
XV = UΣ
Read this as:
Summary