
Data Mining_Anoop Chaturvedi


Swayam Prabha

Course Title: Multivariate Data Mining - Methods and Applications

Lecture 19: ICA Algorithms and Exploratory Factor Analysis

By Anoop Chaturvedi

Department of Statistics, University of Allahabad, Prayagraj (India)

Slides can be downloaded from https://sites.google.com/view/anoopchaturvedi/swayam-prabha


  • The single-unit iterative algorithm estimates only one weight vector, which extracts a single component.
  • Estimating additional mutually independent components requires repeating the algorithm.
  • Each new projection (weight) vector is kept orthogonal, in the whitened space, to the vectors already found, so that successive runs extract different components.
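As a rough illustration of the single-unit algorithm with deflation, the sketch below assumes the standard one-unit FastICA fixed-point step w ← E{z g(wᵀz)} - E{g'(wᵀz)} w with g = tanh. The step is applied to already whitened data z and followed by Gram-Schmidt deflation against the vectors already extracted. The function name `fastica_deflation`, the tolerance, and the two uniform test sources are hypothetical choices, not taken from the lecture.

```python
import numpy as np

def fastica_deflation(Z, n_components, max_iter=200, tol=1e-6, seed=0):
    """Deflation FastICA (g = tanh) on whitened data Z of shape (dims, samples)."""
    rng = np.random.default_rng(seed)
    d = Z.shape[0]
    W = np.zeros((n_components, d))
    for p in range(n_components):
        w = rng.standard_normal(d)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            y = w @ Z                                   # projections w^T z
            g, g_prime = np.tanh(y), 1.0 - np.tanh(y) ** 2
            # One-unit fixed-point step: w <- E[z g(w^T z)] - E[g'(w^T z)] w
            w_new = (Z * g).mean(axis=1) - g_prime.mean() * w
            # Deflation: remove the parts along vectors already extracted
            w_new -= W[:p].T @ (W[:p] @ w_new)
            w_new /= np.linalg.norm(w_new)
            done = abs(abs(w_new @ w) - 1.0) < tol     # converged up to sign
            w = w_new
            if done:
                break
        W[p] = w
    return W

# Hypothetical check: two unit-variance uniform sources mixed by a rotation,
# so the mixture is already (approximately) white.
rng = np.random.default_rng(42)
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 5000))
theta = 0.6
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
Z = R @ S
W = fastica_deflation(Z, n_components=2)
print(np.round(np.abs(W @ R), 2))  # close to a permutation matrix
```

Because the mixing here is a pure rotation, `W @ R` should be close to a signed permutation matrix. The sign and order of the extracted components are arbitrary.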


Multiple component extraction

Parallel Algorithm:

  • The single-component routine is carried out in parallel for each independent component to be extracted.
  • A symmetric orthogonalization is then carried out on all components simultaneously.

The deflation method extracts the independent components sequentially, one by one.

The parallel method extracts all independent components simultaneously.
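The symmetric orthogonalization step of the parallel algorithm is commonly written W ← (W Wᵀ)^(-1/2) W, where the rows of W are the weight vectors. Below is a minimal NumPy sketch. It assumes W is square and of full rank, and the function name is hypothetical.

```python
import numpy as np

def symmetric_orthogonalize(W):
    """Return (W W^T)^{-1/2} W; assumes W is square and of full rank."""
    eigvals, E = np.linalg.eigh(W @ W.T)           # W W^T = E diag(eigvals) E^T
    inv_sqrt = E @ np.diag(1.0 / np.sqrt(eigvals)) @ E.T
    return inv_sqrt @ W                              # all rows decorrelated at once

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 3))
W_orth = symmetric_orthogonalize(W)
print(np.round(W_orth @ W_orth.T, 6))  # approximately the 3 x 3 identity
```

Mathematically the result equals U Vᵀ from the singular value decomposition W = U S Vᵀ, the orthogonal matrix closest to W. No weight vector is privileged, in contrast with deflation, where errors in early vectors carry over to later ones.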


Nonquadratic functions and their first two derivatives

[Table with columns: Density | G(y); rows: log cosh, Exp]
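The two rows of this table are commonly taken, as in Hyvärinen's FastICA work, to be G₁(y) = (1/a₁) log cosh(a₁y) with 1 ≤ a₁ ≤ 2, and G₂(y) = -exp(-y²/2). The sketch below assumes those definitions. It codes G, its derivative g (used in the fixed-point update), and g' (used in the Newton-type correction term), and checks both derivatives numerically. The value a₁ = 1 and the function names are assumptions.

```python
import numpy as np

A1 = 1.0  # assumed value of the constant a1 (usually chosen in [1, 2])

def logcosh(y, a1=A1):
    """G(y) = log(cosh(a1*y)) / a1, with g = G' and g' = G''."""
    G = np.log(np.cosh(a1 * y)) / a1
    g = np.tanh(a1 * y)
    g_prime = a1 * (1.0 - np.tanh(a1 * y) ** 2)
    return G, g, g_prime

def exp_contrast(y):
    """G(y) = -exp(-y**2 / 2), with g = G' and g' = G''."""
    e = np.exp(-y ** 2 / 2.0)
    return -e, y * e, (1.0 - y ** 2) * e

# Central-difference check of both derivatives on a grid.
y = np.linspace(-3.0, 3.0, 61)
h = 1e-5
for name, f in [("log cosh", logcosh), ("exp", exp_contrast)]:
    G, g, g_prime = f(y)
    dG = (f(y + h)[0] - f(y - h)[0]) / (2 * h)
    dg = (f(y + h)[1] - f(y - h)[1]) / (2 * h)
    print(name, np.allclose(dG, g, atol=1e-6), np.allclose(dg, g_prime, atol=1e-6))
```

Both lines print `True True`. Either function can replace the cubic (kurtosis-based) nonlinearity and gives estimates that are less sensitive to outliers.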


Plots of simulated signals:


Mixed Signals


Unmixed Signals using FastICA


[Figure panels: Original Signals | Reconstructed Signals]

The reconstruction is good, except that the algorithm cannot recover the exact amplitude of each source. The scale of an independent component is not identifiable, and neither are its sign and order.
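The signals used on these slides are not listed, so the experiment below uses hypothetical stand-ins: a sinusoid, a square wave and a sawtooth, mixed by an assumed 3 × 3 matrix A. It sketches a parallel FastICA written directly in NumPy: centering, whitening, tanh fixed-point updates and symmetric orthogonalization. It is not the implementation that produced the plots, and the function names are hypothetical.

```python
import numpy as np

def sym_orth(W):
    """Symmetric orthogonalization W <- (W W^T)^{-1/2} W."""
    s, U = np.linalg.eigh(W @ W.T)
    return U @ np.diag(1.0 / np.sqrt(s)) @ U.T @ W

def fastica_parallel(X, max_iter=200, tol=1e-6, seed=0):
    """Parallel FastICA with g = tanh. X: (signals, samples). Returns source estimates."""
    # Center, then whiten so that Z has identity sample covariance.
    Xc = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(Xc))
    Z = E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ Xc
    n = Z.shape[1]
    rng = np.random.default_rng(seed)
    W = sym_orth(rng.standard_normal((Z.shape[0], Z.shape[0])))
    for _ in range(max_iter):
        Y = W @ Z
        g, g_prime = np.tanh(Y), 1.0 - np.tanh(Y) ** 2
        # Fixed-point step for every row at once, then joint orthogonalization.
        W_new = sym_orth((g @ Z.T) / n - np.diag(g_prime.mean(axis=1)) @ W)
        delta = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return W @ Z  # unit-variance estimates; sign and order are arbitrary

# Hypothetical sources: sinusoid, square wave, sawtooth (one row each).
t = np.linspace(0, 8, 2000)
S = np.vstack([np.sin(2 * t), np.sign(np.sin(3 * t)), 2 * (t % 1) - 1])
A = np.array([[1.0, 1.0, 1.0], [0.5, 2.0, 1.0], [1.5, 1.0, 2.0]])  # assumed mixing matrix
X = A @ S                     # mixed signals
S_hat = fastica_parallel(X)   # unmixed signals

# |correlation| between each true source (rows) and each estimate (columns).
corr = np.abs(np.corrcoef(S, S_hat)[:3, 3:])
print(np.round(corr, 3))
```

Absolute correlations are compared because the amplitude, sign and order of each source cannot be recovered. Each true source should match exactly one estimate with a correlation close to 1.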


Exploratory Factor Analysis (EFA):

  • ICA is defined for the noiseless case, in which the observations x and the sources s are related linearly through a mixing matrix A: x = As.
  • When ICA is applied to real-world problems, the effect of noise cannot be avoided, and the number of sources is unknown.
  • Exploratory factor analysis (EFA) is frequently used to fit the noisy ICA model x = As + n.
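In EFA the observed vector is modeled as x = μ + Λf + ε. Here the common factors f have Cov(f) = I, Λ is the loading matrix, and the specific errors ε are uncorrelated with f and have diagonal covariance Ψ. Under these assumptions Cov(x) = ΛΛᵀ + Ψ. The simulation below uses a hypothetical Λ and Ψ (and μ = 0) to check this covariance structure. It also shows that ΛQ, for any orthogonal Q, implies the same covariance, which is why factor solutions are rotated before interpretation.

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, n = 5, 2, 200_000                      # hypothetical sizes: 5 variables, 2 factors
Lam = rng.uniform(-1, 1, size=(p, k))        # hypothetical loading matrix
psi = rng.uniform(0.2, 0.5, size=p)          # hypothetical specific variances (diag of Psi)

f = rng.standard_normal((n, k))                    # common factors, Cov(f) = I
eps = rng.standard_normal((n, p)) * np.sqrt(psi)   # specific errors, Cov(eps) = diag(psi)
X = f @ Lam.T + eps                                # x = Lam f + eps (mean taken as 0)

implied = Lam @ Lam.T + np.diag(psi)               # model covariance Lam Lam^T + Psi
sample = np.cov(X, rowvar=False)
print(np.max(np.abs(sample - implied)))            # small: sampling error only

# Rotating the loadings leaves the implied covariance unchanged.
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
L_rot = Lam @ Q
print(np.allclose(L_rot @ L_rot.T + np.diag(psi), implied))  # True
```

With Gaussian factors this rotational indeterminacy cannot be resolved from the covariance. Noisy ICA removes it by requiring the sources to be non-Gaussian.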


Example: Texture measurements of a pastry-type food.

Data set: https://openmv.net/info/food-texture

Oil: percentage of oil in the pastry

Density: the product's density

Crispy: crispiness measurement on a scale from 7 to 15

Fracture: angle, in degrees, through which the pastry can be slowly bent before it fractures

Hardness: a measure of the amount of force required before breakage occurs

The data set has 50 rows (samples) and 5 columns (variables).

The factanal() function of R is used to fit the factor model.


If two variables have large loadings on the same factor, they have something in common.

Factor 1 describes pastry that is dense and can be bent a lot before it breaks.

Factor 2 describes pastry that is crispy and hard to break.

Soft pastry ⇒ factor 1

Hard pastry ⇒ factor 2