1 of 30

Dimensionality Reduction

(Slides adapted from John DeNero)

UC Berkeley Data 100 Summer 2019

Sam Lau

Learning goals:

  • Understand the difference between dimensionality and number of columns.
  • View matrix multiplication as linear operations on data.
  • Introduce the SVD and understand its components.

2 of 30

Announcements

  • HW2 due today (July 9)
  • HW3 due Friday (July 12)
  • Midterm next Tues!
    • 9:40am - 11am Tues, July 16 in 105 North Gate
    • Single hand-written two-sided cheat sheet allowed
    • Covers weeks 1-3
  • Midterm review session on Monday in place of lecture
    • Manana and Ishaan will run it
  • Small group tutoring this week: tinyurl.com/d100-tutor-week3

3 of 30

Dimensions

4 of 30

Dimensionality

The number of dimensions of a dataset (dimensionality) is the number of attributes per observation.

A data matrix has one row per observation and one column per attribute.

Some attributes are redundant!

Age (days)

Height (in)

182

28

399

30

725

33

Age (days)

Height (in)

Height (ft)

182

28

2.33

399

30

2.5

725

33

2.75

2 dimensions

Still 2 dimensions

5 of 30

Dimensionality

Dimensionality described by linear algebra:

  • Number of attributes = column count
  • Dimension = rank of the matrix
  • Dimension ≤ # of attributes

Age (days)

Height (in)

182

28

399

30

725

33

Age (days)

Height (in)

Height (ft)

182

28

2.33

399

30

2.5

725

33

2.75

2 dimensions

Still 2 dimensions

6 of 30

Visualizing High-Dimensional Data

Only two-dimensional datasets can be visualized with a scatter diagram.

  • (Hue and facetting can show a few more dimensions)

To discover associations between pairs of attributes: scatter plot matrix.

To discover clusters of similar observations: try reducing to two dimensions.

7 of 30

Visualizing High-Dimensional Data

(Demo)

To discover clusters of similar observations: try reducing to two dimensions.

  • If the rank of data is larger than 2, you will lose info!
  • Goal: retain information that distinguishes observations.
  • First idea: Pick the two variables with high variance.

End goal: Principal component analysis (PCA) picks linear combinations of attributes that together account for the largest possible amount of variance in the data.

8 of 30

Linear Algebra Review

9 of 30

Linear Algebra Intuition: Remember HW1?

$2 for apples, $1 for bananas, $4 for melons

Fruit bowls:

  1. 2 of each fruit
  2. 5 apples and 8 bananas

10 of 30

Matrices as Linear Operations

One view: matrix multiplication encodes multiple linear computations on data.

11 of 30

Matrices as Linear Operations

One view: matrix multiplication encodes multiple linear computations on data.

(m×n)�(2×3)

(m×k)�(2×1)

(n×k)�(3×1)

12 of 30

Matrices as Linear Operations

  1. Why does the middle dimension need to match?
  2. What does it mean to add a new row to the left matrix?
  3. What does it mean to add a new column to the right matrix?

(m×n)�(2×3)

(m×k)�(2×1)

(n×k)�(3×1)

13 of 30

Matrices as Linear Operations

  • Number of “arguments” for computation needs to match
  • Adding a new fruit bowl
  • Adding a new store with different fruit prices

(m×n)�(2×3)

(m×k)�(2×1)

(n×k)�(3×1)

14 of 30

Matrices as Linear Operations

Add row to left matrix is like adding an operation.

Adding a column to right matrix is like adding an observation.

15 of 30

Matrices as Linear Operations

Notice that matrices and vectors just contain numbers.

Because of symmetry, we can also view the left matrix as data and the right matrix as the computation!

What does this mean in the context of the fruit bowls problem?

16 of 30

Matrices Have Many Representations

Takeaway: a matrix can be interpreted many different ways.

  • Interpretation used the most in Data 100:
    • One matrix holds data, the other holds operations
    • Either left or right matrix can be data
  • Another interpretation:
    • One matrix holds coordinates, the other holds a transformation of coordinates
  • See 3blue1brown’s Essence of Linear Algebra videos for more intuition (these are extremely good!)

17 of 30

Matrices in Python

(Demo)

18 of 30

Question: What does this mean?

In this example, the data in left matrix, operators in right matrix. Interpret the resulting matrix.

(You can use Python to perform the multiplication.)

Age (days)

Height (in)

182

28

399

30

725

33

1

0

0

0

1

1/12

x

=

19 of 30

Question: What does this mean?

In this example, the data in left matrix, operators in right matrix. Interpret the resulting matrix.

(You can use Python to perform the multiplication.)

Age (days)

Height (in)

182

28

399

30

725

33

1

0

0

0

1

1/12

x

Age (days)

Height (in)

Height (ft)

182

28

2.33

399

30

2.5

725

33

2.75

=

20 of 30

Question: What’s the Rank?

You measure the width, length, area, and perimeter of 100 rectangular backyards.

What do you expect to be the rank of the resulting data matrix?

If less than 4, what matrices would you multiply to recover the data matrix?

width

length

area

perimeter

20

20

400

80

16

12

192

60

24

12

288

72

...

25

24

600

98

21 of 30

Question: What’s the Rank?

width

length

area

20

20

400

16

12

192

24

12

288

...

25

24

600

width

length

area

perimeter

20

20

400

80

16

12

192

60

24

12

288

72

...

25

24

600

98

1

0

0

2

0

1

0

2

0

0

1

0

x

=

22 of 30

Break!

Fill out Attendance:

http://bit.ly/at-d100

23 of 30

Orthogonality

Two orthogonal vectors:

  • Meet at a right angle
  • Have a dot-product of zero

The length of a vector v is the square root of v • v.

A unit vector has length 1.

[-1.5 3]

[2 1]

(-1.5) • (2) + (3) • (1) = 0

24 of 30

Orthonormality

An orthonormal matrix has columns all of length 1, and all pairs of different columns are orthogonal.

An orthonormal matrix Q has orthonormal inverse QT, which means QQT = QTQ = I.

An orthonormal matrix can be viewed as a rotation of coordinates (or less commonly a reflection).

25 of 30

Diagonal Matrices

A right-multiplied diagonal matrix Σ scales each column up or down.

Can be viewed as scaling coordinate vectors:

26 of 30

Singular Value Decomposition

Any m x n matrix with rank r can be written as:

Where U and V are orthonormal and Σ is diagonal! Splitting X in this way is called the singular value decomposition (SVD).

m x n

m x r

=

r x n

x

X

U

VT

Σ

x

r x r

27 of 30

Singular Value Decomposition

What does X = UΣVT mean?�U and V orthonormal = rotations. Σ diagonal = scaling.

So, the SVD says that any matrix can be decomposed into a rotation, then a scaling, then another rotation:

28 of 30

Singular Value Decomposition

We can rewrite X = UΣVT as XV = UΣ (right multiply by V)�Note the dimensions of the terms in this form:

XV = UΣ

(m x n) (n x r) = (m x r) (r x r)

(m x r) = (m x r)

Remember that n = columns in X; r = dimensionality of X

If r < n, then we’ve gotten rid of the “unnecessary” columns of X.

29 of 30

Singular Value Decomposition

(Demo)

We can rewrite X = UΣVT as:

XV = UΣ

Read this as:

  • “Rotate points in X so that they’re axis-aligned”, or...
  • “Scale directions in U so that they match magnitudes in X”

30 of 30

Summary

  • We can view matrix multiplication as a linear computation on a set of numeric data.
  • To understand high-dimensional data, we can reduce the number of dimensions while keeping as much info as possible.
  • The SVD decomposes a matrix X into a product of three matrices: an orthonormal, a diagonal, and an orthonormal.
    • This decomposition gives us a projection for X that has fewer dimensions if the rank of X is small.