Data_Mining_Anoop Chaturvedi

Swayam Prabha

Course Title

Multivariate Data Mining- Methods and Applications

Lecture 15

Sample PCA and Applications

By

Anoop Chaturvedi

Department of Statistics, University of Allahabad

Prayagraj (India)

Slides can be downloaded from https://sites.google.com/view/anoopchaturvedi/swayam-prabha

Scree plot for the principal components of the iris data

Example: the decathlon2 dataset from the R package factoextra

The data record athletes' performance during two sporting events.

27 individuals (athletes) are described by 13 variables (10 sport disciplines plus rank, points, and competition).

The first 23 individuals and the first 10 variables are selected as the active elements for PCA.

Red dashed line ⇒ expected average contribution
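
The reference line can be illustrated with a small sketch. It assumes that the contribution of a variable to a component is its squared loading expressed as a percentage of the component total, so that equal contributions from p variables would each be 100/p percent. The slides work with R packages; this Python version, the function names, and the loadings are illustrative assumptions and are not the decathlon2 results.

```python
def contributions(loadings):
    """Percentage contribution of each variable to one component.

    Assumes contribution_j = 100 * loading_j**2 / sum_k loading_k**2.
    """
    squared = [x * x for x in loadings]
    total = sum(squared)
    return [100.0 * s / total for s in squared]


def expected_average_contribution(n_variables):
    """Contribution (in percent) each variable would have if all were equal."""
    return 100.0 / n_variables


# Hypothetical loadings on one component, for illustration only
# (these are not taken from the decathlon2 analysis).
example_loadings = [0.8, -0.6, 0.3, 0.1]

contrib = contributions(example_loadings)
threshold = expected_average_contribution(len(example_loadings))

# True where a variable contributes more than the uniform reference level.
above_line = [c > threshold for c in contrib]
```

Under this assumption, the 10 active variables of the example would give a reference level of 100/10 = 10%, and the 23 active individuals a level of 100/23, roughly 4.35%.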

PCA results for individuals (athletes): Contributions of individuals to PC1 and PC2

BOURGUIGNON, Karpov and Clay contribute the most to both dimensions

Average Contribution

Example: PCA in Image Processing

The example shows the cumulative effect of six principal components, adding one PC at a time.

R packages: "jpeg", "factoextra", "gridExtra", "ggplot2", "magick", "imgpalr"

A color photo is stored as three pixel-by-pixel matrices, one for each component of RGB (red, green, blue) color.

To convert to grayscale, sum the R, G, and B shades at each pixel and divide by the maximum value, so that the result is scaled to a maximum of 1.
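
The grayscale step can be sketched as follows. This is a minimal Python illustration of the sum-and-rescale rule stated above, not the R code behind the slides; the function name `to_grayscale` and the 2 x 2 pixel values are hypothetical.

```python
def to_grayscale(red, green, blue):
    """Sum the three channels pixel by pixel and rescale to a maximum of 1."""
    summed = [
        [r + g + b for r, g, b in zip(r_row, g_row, b_row)]
        for r_row, g_row, b_row in zip(red, green, blue)
    ]
    peak = max(max(row) for row in summed)
    if peak == 0:  # all-black image: nothing to rescale
        return summed
    return [[value / peak for value in row] for row in summed]


# Tiny 2 x 2 hypothetical image with channel values in [0, 1].
red = [[1.0, 0.0], [0.5, 0.0]]
green = [[1.0, 0.5], [0.5, 0.0]]
blue = [[1.0, 0.0], [0.5, 0.0]]

gray = to_grayscale(red, green, blue)
```

The brightest pixel of the summed image is mapped to 1 and every other pixel keeps its shade relative to that pixel.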

Run a separate PCA on each matrix of shades (R, G, and B), giving the eigenvectors of the shades for each channel.

Original Image

Each color channel (R, G, B) gets its own matrix and its own PCA.

The new shades from the three channels are then integrated back into one picture.

The image becomes clearer as we increase the number of principal components.
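
The effect of adding components can be sketched numerically. The slides use R packages for this example; the Python/NumPy version below is an assumed equivalent rather than the lecture's own code. It treats the rows of a channel matrix as observations and obtains the principal directions from a singular value decomposition of the column-centered matrix. The helper names and the random stand-in "image" are hypothetical.

```python
import numpy as np


def pca_reconstruct(channel, n_components):
    """Approximate a 2-D channel matrix using its first n_components PCs."""
    col_means = channel.mean(axis=0)
    centered = channel - col_means
    # Rows of vt are the principal directions of the centered columns.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]        # shape (k, n_cols)
    scores = centered @ basis.T      # shape (n_rows, k)
    return scores @ basis + col_means


def reconstruct_rgb(image, n_components):
    """Apply the per-channel reconstruction to a (rows, cols, 3) array."""
    channels = [pca_reconstruct(image[:, :, c], n_components) for c in range(3)]
    # Clip so the recombined shades stay in the valid [0, 1] range.
    return np.clip(np.dstack(channels), 0.0, 1.0)


# Synthetic 20 x 30 x 3 array standing in for a photo (values in [0, 1]).
rng = np.random.default_rng(0)
image = rng.random((20, 30, 3))

# Reconstruction error of the red channel for an increasing number of PCs.
errors = [
    float(np.linalg.norm(image[:, :, 0] - pca_reconstruct(image[:, :, 0], k)))
    for k in (1, 5, 10, 20)
]
```

In this sketch the error shrinks as k grows and is numerically zero once all components are kept, which is the numerical counterpart of the image becoming clearer.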
