Fundamentals of Dimension Reduction
Exploring high-dimensional data
Zachary del Rosario (He/Him)
Workshop Schedule
Extract
Wrangle + Tidy
Friday
Saturday
Visualize
Model
Sunday
Monday
Tabula + WebPlotDigitizer
Python + Jupyter
Concepts
Execution
Concepts
Execution
Concepts
Fin
Focus
Live
Take-Home
Sneak Preview: Alloys Dataset
26 numeric columns!
325 pairs!
At best: two axes + shape & color == 4 columns in one visual
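The pair count follows from choosing 2 of the 26 numeric columns. A quick check using only the Python standard library (the variable names are illustrative):

```python
from math import comb

n_columns = 26  # numeric columns in the alloys dataset, per the slide

# Number of distinct unordered column pairs, i.e. possible 2D scatterplots
n_pairs = comb(n_columns, 2)
print(n_pairs)  # 325
```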
Dimension Reduction to the Rescue!
Dimension reduction (DR) helps visualize high-dimensional data in just a few dimensions
BUT interpreting DR is tricky
SO let’s study some fundamentals
Outline
Principal Component Analysis (PCA)
Fundamental linear dimension reduction (DR)
[Figure: data points with a data-informed direction, and the points projected onto that direction]
Idea: Find direction of greatest variance in the data
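A minimal sketch of this idea in NumPy, assuming synthetic 2D data (not the alloys data). It takes the first right-singular vector of the centered data as the direction, then checks that no other tested direction gives a larger variance of the projected points:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2D data (an assumption for illustration):
# stretched along one axis, then rotated by 30 degrees
raw = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])
theta = np.pi / 6
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
X = raw @ rotation.T

# Center the data; the first right-singular vector is the
# direction of greatest variance (first principal direction)
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
direction = Vt[0]            # unit vector
projected = Xc @ direction   # 1D coordinates of the projected points

# Compare against a sweep of other unit directions
angles = np.linspace(0.0, np.pi, 181)
candidates = np.column_stack([np.cos(angles), np.sin(angles)])
candidate_vars = (Xc @ candidates.T).var(axis=0)

print("variance along PCA direction:", projected.var())
print("largest variance in sweep:   ", candidate_vars.max())
```

The sign of `direction` is arbitrary: flipping it describes the same line.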
Linear Dimension Reduction : PCA
Principal Component Analysis (PCA)
Procedure:
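One common form of the procedure, sketched in NumPy: standardize each column, decompose, keep the leading directions, and project. The random matrix is a placeholder for a table with 26 numeric columns, and standardizing (rather than only centering) is an assumption about preprocessing, not necessarily the workshop's choice.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 26))  # placeholder data, not the alloys dataset

# 1. Standardize: zero mean and unit variance per column
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Decompose: rows of Vt are the principal directions (weights)
_, s, Vt = np.linalg.svd(Z, full_matrices=False)

# 3. Keep the leading k directions
k = 2
weights = Vt[:k]           # shape (k, n_columns)

# 4. Project each row onto those directions
scores = Z @ weights.T     # shape (n_rows, k)

# Fraction of total variance captured by each kept direction
explained = (s**2 / np.sum(s**2))[:k]
print(scores.shape, explained)
```

The two columns of `scores` are what gets plotted as the axes of a PCA projection.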
PCA Example
26 numeric columns!
325 pairs!
At best: two axes + shape & color == 4 columns in one visual
Can Visualize Pairs of Variables…
Somewhat informative…
Not using all information (variables)!
PCA : Projection
More informative!
Note the more distinct groups
In particular, easier to see Series 8 clusters
PCA : Interpreting Weights
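A hedged sketch of reading weights, using synthetic data and hypothetical column names (`Al`, `Zn`, `Cu`, `Mg`); the numbers are not from the alloys dataset. Columns with large-magnitude weights dominate a component, and columns with opposite signs trade off against each other along it. The overall sign of a component is arbitrary.

```python
import numpy as np

# Hypothetical column names and synthetic data, for illustration only
columns = ["Al", "Zn", "Cu", "Mg"]
rng = np.random.default_rng(2)
n = 300
al = 3.0 * rng.normal(size=n)    # Al varies the most
trade = rng.normal(size=n)       # shared factor: Zn up means Cu down
X = np.column_stack([
    al,
    trade + 0.1 * rng.normal(size=n),
    -trade + 0.1 * rng.normal(size=n),
    0.1 * rng.normal(size=n),
])

# Center only (no rescaling), then take the first two principal directions
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1, pc2 = Vt[0], Vt[1]

# One row per column: its weight in each component
for name, w1, w2 in zip(columns, pc1, pc2):
    print(f"{name:>2}  PC1 {w1:+.2f}  PC2 {w2:+.2f}")

# Column with the largest-magnitude weight in the first component
top_pc1 = columns[int(np.argmax(np.abs(pc1)))]
print("PC1 is dominated by:", top_pc1)
```

In this synthetic setup the first component mostly tracks Al, while the second has Zn and Cu weights of opposite sign, so moving along it means more of one and less of the other.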
PCA : Projection
Read Al content
More Al
Less Al
PCA : Interpreting Weights
PCA : Projection
More informative!
Note the more distinct groups
One axis: More Al vs. Less Al
Other axis: More Zn, Less Cu vs. Less Zn, More Cu
Important Caveat
Uniform Manifold Approximation and Projection (UMAP)
Cutting-edge nonlinear dimension reduction
Linear vs Nonlinear DR
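One way to see what "linear" means here: in PCA, the low-dimensional coordinates come from subtracting a mean and multiplying by one fixed matrix. A small NumPy sketch on synthetic data (the helper `project` is hypothetical) shows a consequence: the midpoint of two points maps to the midpoint of their projections. Nonlinear methods such as UMAP have no such fixed matrix, so this property is not guaranteed.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 5))   # synthetic data, for illustration only

mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
W = Vt[:2].T                   # fixed (5, 2) projection matrix

def project(points):
    """Linear DR: subtract the data mean, multiply by a fixed matrix."""
    return (points - mean) @ W

a, b = X[0], X[1]
midpoint = 0.5 * (a + b)

# The midpoint in the original space lands on the midpoint of the images
is_linear = np.allclose(project(midpoint), 0.5 * (project(a) + project(b)))
print(is_linear)
```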
Uniform Manifold Approximation and Projection (UMAP)
Recent (2018) approach to nonlinear dimension reduction
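A minimal usage sketch, assuming the third-party `umap-learn` package is installed (it is imported as `umap`). The data are synthetic blobs, not the alloys dataset, and the parameter values shown are illustrative choices rather than recommendations. If the package is missing, the sketch simply skips the embedding.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic stand-in data: three well-separated blobs in 10 dimensions
centers = rng.normal(scale=10.0, size=(3, 10))
X = np.vstack([c + rng.normal(size=(60, 10)) for c in centers])

try:
    import umap  # provided by the `umap-learn` package
except ImportError:
    umap = None

if umap is not None:
    reducer = umap.UMAP(
        n_components=2,   # target dimension for plotting
        n_neighbors=15,   # size of the local neighborhood considered
        min_dist=0.1,     # how tightly points may pack in the embedding
        random_state=0,   # fix the seed for repeatable layouts
    )
    embedding = reducer.fit_transform(X)   # shape (180, 2)
else:
    embedding = None  # umap-learn not installed; nothing to embed

print(None if embedding is None else embedding.shape)
```

Changing `n_neighbors`, `min_dist`, or the seed can noticeably change the layout, which is one reason to interpret UMAP plots with care.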
UMAP Example
Very distinct clusters!
Difficulties
Distances between clusters in a UMAP plot are not directly interpretable!
Observations
Series 8 clusters with other alloys
UMAP : With Great Power...
Tonight’s Exercise
Tonight’s Notebook: Visualizing in Python
04_vis_assignment
End of Today