1 of 30

Dimensionality Reduction

(Slides adapted from John DeNero)

UC Berkeley Data 100 Summer 2019

Sam Lau

Learning goals:

Understand the difference between dimensionality and number of columns.
View matrix multiplication as linear operations on data.
Introduce the SVD and understand its components.

2 of 30

Announcements

HW2 due today (July 9)
HW3 due Friday (July 12)
Midterm next Tues!

9:40am - 11am Tues, July 16 in 105 North Gate
Single hand-written two-sided cheat sheet allowed
Covers weeks 1-3

Midterm review session on Monday in place of lecture

Manana and Ishaan will run it

Small group tutoring this week: tinyurl.com/d100-tutor-week3

3 of 30

Dimensions

4 of 30

Dimensionality

The number of dimensions of a dataset (dimensionality) is the number of attributes per observation.

A data matrix has one row per observation and one column per attribute.

Some attributes are redundant!

Age (days)	Height (in)
182	28
399	30
725	33

Age (days)	Height (in)	Height (ft)
182	28	2.33
399	30	2.5
725	33	2.75

2 dimensions

Still 2 dimensions

5 of 30

Dimensionality

Dimensionality described by linear algebra:

Number of attributes = column count
Dimension = rank of the matrix
Dimension ≤ # of attributes

Age (days)	Height (in)
182	28
399	30
725	33

Age (days)	Height (in)	Height (ft)
182	28	2.33
399	30	2.5
725	33	2.75

2 dimensions

Still 2 dimensions

6 of 30

Visualizing High-Dimensional Data

Only two-dimensional datasets can be visualized with a scatter diagram.

(Hue and facetting can show a few more dimensions)

To discover associations between pairs of attributes: scatter plot matrix.

To discover clusters of similar observations: try reducing to two dimensions.

7 of 30

Visualizing High-Dimensional Data

(Demo)

To discover clusters of similar observations: try reducing to two dimensions.

If the rank of data is larger than 2, you will lose info!
Goal: retain information that distinguishes observations.
First idea: Pick the two variables with high variance.

End goal: Principal component analysis (PCA) picks linear combinations of attributes that together account for the largest possible amount of variance in the data.

8 of 30

Linear Algebra Review

9 of 30

Linear Algebra Intuition: Remember HW1?

$2 for apples, $1 for bananas, $4 for melons

Fruit bowls:

2 of each fruit
5 apples and 8 bananas

10 of 30

Matrices as Linear Operations

One view: matrix multiplication encodes multiple linear computations on data.

11 of 30

Matrices as Linear Operations

One view: matrix multiplication encodes multiple linear computations on data.

(m×n)�(2×3)

(m×k)�(2×1)

(n×k)�(3×1)

12 of 30

Matrices as Linear Operations

Why does the middle dimension need to match?
What does it mean to add a new row to the left matrix?
What does it mean to add a new column to the right matrix?

(m×n)�(2×3)

(m×k)�(2×1)

(n×k)�(3×1)

13 of 30

Matrices as Linear Operations

Number of “arguments” for computation needs to match
Adding a new fruit bowl
Adding a new store with different fruit prices

(m×n)�(2×3)

(m×k)�(2×1)

(n×k)�(3×1)

14 of 30

Matrices as Linear Operations

Add row to left matrix is like adding an operation.

Adding a column to right matrix is like adding an observation.

https://betterexplained.com/articles/matrix-multiplication/

15 of 30

Matrices as Linear Operations

Notice that matrices and vectors just contain numbers.

Because of symmetry, we can also view the left matrix as data and the right matrix as the computation!

What does this mean in the context of the fruit bowls problem?

16 of 30

Matrices Have Many Representations

Takeaway: a matrix can be interpreted many different ways.

Interpretation used the most in Data 100:

One matrix holds data, the other holds operations
Either left or right matrix can be data

Another interpretation:

One matrix holds coordinates, the other holds a transformation of coordinates

See 3blue1brown’s Essence of Linear Algebra videos for more intuition (these are extremely good!)

17 of 30

Matrices in Python

(Demo)

18 of 30

Question: What does this mean?

In this example, the data in left matrix, operators in right matrix. Interpret the resulting matrix.

(You can use Python to perform the multiplication.)

Age (days)	Height (in)
182	28
399	30
725	33

1	0	0
0	1	1/12

x

=

19 of 30

Question: What does this mean?

In this example, the data in left matrix, operators in right matrix. Interpret the resulting matrix.

(You can use Python to perform the multiplication.)

Age (days)	Height (in)
182	28
399	30
725	33

1	0	0
0	1	1/12

x

Age (days)	Height (in)	Height (ft)
182	28	2.33
399	30	2.5
725	33	2.75

=

20 of 30

Question: What’s the Rank?

You measure the width, length, area, and perimeter of 100 rectangular backyards.

What do you expect to be the rank of the resulting data matrix?

If less than 4, what matrices would you multiply to recover the data matrix?

width	length	area	perimeter
20	20	400	80
16	12	192	60
24	12	288	72

...

25	24	600	98

21 of 30

Question: What’s the Rank?

width	length	area
20	20	400
16	12	192
24	12	288

...

25	24	600

width	length	area	perimeter
20	20	400	80
16	12	192	60
24	12	288	72

...

25	24	600	98

1	0	0	2
0	1	0	2
0	0	1	0

x

=

22 of 30

Break!

Fill out Attendance:

http://bit.ly/at-d100

23 of 30

Orthogonality

Two orthogonal vectors:

Meet at a right angle
Have a dot-product of zero

The length of a vector v is the square root of v • v.

A unit vector has length 1.

[-1.5 3]

[2 1]

(-1.5) • (2) + (3) • (1) = 0

24 of 30

Orthonormality

An orthonormal matrix has columns all of length 1, and all pairs of different columns are orthogonal.

An orthonormal matrix Q has orthonormal inverse Q^T, which means QQ^T = Q^TQ = I.

An orthonormal matrix can be viewed as a rotation of coordinates (or less commonly a reflection).

25 of 30

Diagonal Matrices

A right-multiplied diagonal matrix Σ scales each column up or down.

Can be viewed as scaling coordinate vectors:

26 of 30

Singular Value Decomposition

Any m x n matrix with rank r can be written as:

Where U and V are orthonormal and Σ is diagonal! Splitting X in this way is called the singular value decomposition (SVD).

m x n

m x r

=

r x n

x

X

U

V^T

Σ

x

r x r

27 of 30

Singular Value Decomposition

What does X = UΣV^T mean?�U and V orthonormal = rotations. Σ diagonal = scaling.

So, the SVD says that any matrix can be decomposed into a rotation, then a scaling, then another rotation:

28 of 30

Singular Value Decomposition

We can rewrite X = UΣV^T as XV = UΣ (right multiply by V)�Note the dimensions of the terms in this form:

XV = UΣ

(m x n) (n x r) = (m x r) (r x r)

(m x r) = (m x r)

Remember that n = columns in X; r = dimensionality of X

If r < n, then we’ve gotten rid of the “unnecessary” columns of X.

29 of 30

Singular Value Decomposition

(Demo)

We can rewrite X = UΣV^T as:

XV = UΣ

Read this as:

“Rotate points in X so that they’re axis-aligned”, or...
“Scale directions in U so that they match magnitudes in X”

30 of 30

Summary

We can view matrix multiplication as a linear computation on a set of numeric data.
To understand high-dimensional data, we can reduce the number of dimensions while keeping as much info as possible.
The SVD decomposes a matrix X into a product of three matrices: an orthonormal, a diagonal, and an orthonormal.

This decomposition gives us a projection for X that has fewer dimensions if the rank of X is small.