ESM 244: 4
Recall: Ordination methods
In PCA, the axes (PRINCIPAL COMPONENTS) are chosen along the directions of greatest variance in the data, so that we explain as much variance as possible using a reduced number of dimensions.
Cartesian Coordinate System
...but we can define it however we want to.
We can redefine our primary axes.
How do we describe our data in this new system?
Eigenvectors and eigenvalues are paired information:
*Remember this...
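A quick sketch of what that pairing means in matrix terms (standard linear algebra, not specific to these slides): for the data's covariance (or correlation) matrix S, each eigenvector and its paired eigenvalue satisfy

```latex
S\,\mathbf{v}_k = \lambda_k\,\mathbf{v}_k,
\qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0
```

The eigenvector v_k gives the direction of a new axis; the paired eigenvalue λ_k gives the variance of the data along that direction.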
Example: Let’s say instead of having two variables (x, y), we originally have 3 (age, hours watching TV, hours studying). For conceptual understanding, we’re going to say that these observations miraculously fall in the general shape of an ellipsoid (pancake-ish). Each point indicates an observation for a single person.
PCA: New axes are created (linear combinations of the original variables) such that the first (PC1) points in the direction accounting for the most variance in the multivariate data, the second (PC2) accounts for the next most (after PC1 has been taken into account), and so on.
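Written out (standard PCA notation, not taken from the slides), each observation's PC1 score is a weighted sum of the centered original variables, with the weight vector chosen to maximize variance:

```latex
\text{PC1}_i = w_1 x_{i1} + w_2 x_{i2} + \dots + w_p x_{ip},
\qquad \mathbf{w} = \arg\max_{\lVert \mathbf{w} \rVert = 1} \operatorname{Var}(X\mathbf{w})
```

PC2 solves the same maximization subject to being orthogonal to PC1, and so on for the later components.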
If PC1 and PC2 explain most of the variance in the data (see eigenvalues), then we'd still be seeing most of the important things about our data if we just view it on PC1 and PC2...
We’ve gone from 3 dimensions to 2 dimensions that explain the greatest possible amount of variance. It doesn’t show us everything, but it does show us a lot about the data in just 2 dimensions…
What did we just do?
Dimensionality Reduction
Converting complex multidimensional data into fewer dimensions to explain as much about the data as simply as possible
OK, so that doesn’t seem that cool going from 3 → 2 dimensions...but what if we could go from 15 → 2 dimensions and still describe 80% of variance in the data? Then that becomes pretty cool.
Simplified data, loaded as .csv ‘Patients.csv’
Using these data, how many principal components will we get?
Sure you can do it by hand, but…
prcomp() function in R:
For dataset ‘Patients.csv’:
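A minimal sketch of the call (assuming 'Patients.csv' sits in the working directory; the object name patients_pca is just a placeholder):

```r
# Read in the simplified patient data
patients <- read.csv("Patients.csv")

# Run PCA; center each variable and scale it to unit variance first
patients_pca <- prcomp(patients, scale. = TRUE)

summary(patients_pca)
```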
What does this scaling term do?
Scaling data before PCA: You don't have to do it, but it's usually advisable.
WHEN? Especially when the variables are measured in different units or on very different numeric scales.
WHY? Without scaling, variables with large variances (simply because of their units) dominate the principal components.
What R gives us:
Standard deviations for new PCs. Higher SD = more variance explained in PC.
Remember how we said that the new components (PCs) are linear combinations of the original variables? That's what these give us – the coefficients for those linear combinations.
THESE ARE THE EIGENVECTORS! Also called "loadings" – they describe how strongly (and in what direction) each original variable contributes to each component
But how much of the variance do the PCs actually explain?
These tell us how much of the total variance is extracted by EACH PC (note order)
These tell us the cumulative proportion of variance in the data explained as you add components, from the first PC (PC1) through the final PC (here, PC5)
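A quick sketch of where those pieces live in the prcomp() output (continuing with the hypothetical patients_pca object from above):

```r
# Standard deviations of the PCs; squaring them gives the eigenvalues
patients_pca$sdev

# Loadings (eigenvectors): coefficients of the linear combinations defining each PC
patients_pca$rotation

# Proportion and cumulative proportion of total variance explained by each PC
prop_var <- patients_pca$sdev^2 / sum(patients_pca$sdev^2)
prop_var
cumsum(prop_var)

# summary() reports the same proportions in a single table
summary(patients_pca)
```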
So what do we choose for the ‘cut-off’ for the proportion of variance beyond which we say that additional components aren’t so helpful (i.e., how do we know how many PCs to retain)?
There aren’t really rules about where the cut-off should be. Some say 80% is pretty good, some say you need to look at the cumulative proportions, some say you need to look at the eigenvalues…
It’s really a judgment call.
Generally, the eigenvalues fall off quickly and the cumulative proportions increase quickly (especially useful for large numbers of initial variables):
[Table: eigenvalue and cumulative proportion of variance for each principal component, PC1 through PC5]
So let’s say we pick the first 3 components to stick with, since we’ve decided that they explain an acceptable amount of the total variance:
What can we learn based on our truncated “model”?
Yes, in this case you might say “This hardly seems worth it to decrease my dimensions from 5 to 3”…but in some cases you’ll have 50 variables and this can allow you to reduce it to just a few!
A scree-plot is useful for visualizing PC contributions
From: NYC Data Science Academy Higgs Boson Machine Learning Challenge https://nycdatascience.com/blog/student-works/secretepipeline-higgs-boson-machine-learning-challenge/
We can also visualize contributions of the different initial variables to the PCs
STHDA Factoextra R Package: Easy Multivariate Data Analyses and Elegant Visualization
http://www.sthda.com/english/wiki/factoextra-r-package-easy-multivariate-data-analyses-and-elegant-visualization#visualizing-dimension-reduction-analysis-outputs
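A hedged sketch of the sort of factoextra calls described on that page (again using the hypothetical patients_pca object):

```r
library(factoextra)

# Scree plot: percentage of variance explained by each PC
fviz_screeplot(patients_pca, addlabels = TRUE)

# Variables plotted as vectors, colored by their contribution to the PCs
fviz_pca_var(patients_pca, col.var = "contrib")
```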
If the point of ordination methods/dimensional reduction is to simplify our understanding of multivariate relationships, then there should also be a way to visualize that simplified information.
BIPLOTS: an approximation of the original multidimensional space, reduced to 2 dimensions, with information about variables (as vectors) and observations (as points)
BIPLOT EXAMPLE
A biplot for PCA shows two things: the observations plotted as points in the plane of PC1 and PC2, and the original variables plotted as vectors (arrows) in that same plane.
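A minimal sketch of producing one (again with the hypothetical patients_pca object):

```r
# Base-R biplot of a prcomp object: observations as points, variables as arrows
biplot(patients_pca)

# Or the factoextra version, plotting observations and variables on PC1 vs. PC2
factoextra::fviz_pca_biplot(patients_pca)
```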
Interpreting biplot outputs: lines (variables)
0° angle between two variable vectors ≈ correlation of 1
180° angle ≈ correlation of -1
90° (or 270°) angle ≈ correlation of 0
INTERPRETING BIPLOT OUTPUTS: POINTS (observations)
In PC1 direction: SBP, Height, Weight, Cholesterol vary similarly (as one increases, the others increase)
Height, weight and cholesterol are minimally correlated with Age.
No observable clusters (no grouping done)
[Biplot figure with arrows indicating the PC1 and PC2 directions]
Biplots (and dimensional reduction in general):
What else is there?
Mohammad Ali Zare Chahouki (2012) Classification and Ordination Methods as a Tool for Analyzing of Plant Communities, Intech Open (online).
PCA: No distinction between explanatory variables and outcome variables...it’s just variables
What if we have a scenario where we have explanatory variables and outcome variables?
Multivariate Approaches:
Cluster Analysis: find similar groups of values/families
Unconstrained Ordination (PCA, nMDS, etc.): find maximum-variance components for variables; distance-based methods
Constrained Ordination (RDA, CCA, etc.): find maximum-variance components for dependent variables, explained by predictor variables
Discrimination Methods (MANOVA, etc.): test for significant differences in groups
Redundancy Analysis (RDA): a constrained ordination method that finds the components of maximum variance in the dependent (outcome) variables that can be explained by the independent (predictor) variables.
An example of RDA: Exploring leaf litter decomposition rates
GENERAL DATA STRUCTURE: one row per site; the columns split into environmental (independent) variables and dependent (outcome) variables.

          Environmental (Independent) Variables | Dependent (Outcome) Variables
          N     Temp     C:N     etc.           | k     A     DLV     C % increase
Site 1
Site 2
Site 3
...
Site n
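For context, a hedged sketch of how data structured like this is commonly passed to an RDA in R with the vegan package (the data below are randomly generated placeholders, not the study's actual measurements):

```r
library(vegan)

set.seed(1)

# Synthetic placeholder data: one row per site (purely illustrative)
env <- data.frame(
  N    = runif(10, 0.5, 2),    # leaf litter N concentration
  Temp = runif(10, 3, 7),      # growing-season air temperature
  CN   = runif(10, 30, 45)     # leaf litter C:N ratio
)
resp <- data.frame(
  k = runif(10, 0.005, 0.02),  # decomposition rate constant
  A = runif(10, 0.7, 0.95)     # a second outcome variable
)

# Redundancy analysis: variance in the outcomes explained by the predictors
litter_rda <- rda(resp ~ ., data = env)

summary(litter_rda)
plot(litter_rda)   # triplot of sites, outcome variables, and environmental vectors
```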
“Redundancy analysis and Pearson correlations also revealed that leaf litter decomposition (k) varied across the sites according to climatic factors (Figure 7). Specifically, it was positively correlated to growing season length (GSL), degree-days (DD) and growing season average air temperature (Tair). Additionally, leaf litter decomposition was related to moisture (negatively) and temperature (positively) in the topsoil (Tsoil)…Similarly, willow leaf litter k and A were positively correlated to leaf litter N concentration (N) and negatively correlated to leaf litter C:N ratio (C:N)…”
Thanks to Sebastian Tapia for this example!
“We used a redundancy analysis to explore whether certain types of responses were related to the fishers’ socioeconomic characteristics. Fishers that would employ amplifying responses had greater economic wealth but lacked options. Fishers who would adopt dampening responses possessed characteristics associated with having livelihood options. Fishers who would adopt neither amplifying nor dampening responses were less likely to belong to community groups and sold the largest proportion of their catch.”