1 of 28

Pattern Recognition (RPD-0041)

Lecture 7 - Compressing Data via Dimensionality Reduction

2 of 28

Introduction

3 of 28

Sequential feature selection algorithms

  • There are two main categories of dimensionality reduction techniques: feature selection and feature extraction.
  • Via feature selection, we select a subset of the original features, whereas in feature extraction, we derive information from the feature set to construct a new feature subspace.
  • Sequential feature selection algorithms are a family of greedy search algorithms that are used to reduce an initial d-dimensional feature space to a k-dimensional feature subspace where k<d.

4 of 28

Sequential Backward Selection

  • SBS sequentially removes features from the full feature set until the new feature subspace contains the desired number of features, as sketched below.
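
A minimal from-scratch sketch of SBS, assuming a scikit-learn-style estimator and classification accuracy on a held-out split as the selection criterion (the class name SBS and its parameters are illustrative, not taken from the slides):

from itertools import combinations

import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


class SBS:
    """Greedy sequential backward selection down to k_features columns."""

    def __init__(self, estimator, k_features, test_size=0.25, random_state=1):
        self.estimator = estimator
        self.k_features = k_features
        self.test_size = test_size
        self.random_state = random_state

    def fit(self, X, y):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=self.test_size, random_state=self.random_state)
        # Start from the full index set and drop one feature per round.
        self.indices_ = tuple(range(X_tr.shape[1]))
        self.scores_ = [self._score(X_tr, y_tr, X_te, y_te, self.indices_)]
        while len(self.indices_) > self.k_features:
            # Among all subsets with one feature removed, keep the best-scoring one.
            candidates = list(combinations(self.indices_, len(self.indices_) - 1))
            scores = [self._score(X_tr, y_tr, X_te, y_te, c) for c in candidates]
            best = int(np.argmax(scores))
            self.indices_ = candidates[best]
            self.scores_.append(scores[best])
        return self

    def transform(self, X):
        # Keep only the selected feature columns.
        return X[:, self.indices_]

    def _score(self, X_tr, y_tr, X_te, y_te, indices):
        self.estimator.fit(X_tr[:, indices], y_tr)
        return accuracy_score(y_te, self.estimator.predict(X_te[:, indices]))

After fitting, indices_ holds the selected feature columns and scores_ records the criterion value at each subset size.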

5 of 28

Sequential Forward Selection

6 of 28

SBS in Python

  • Implementation from scratch (code).
  • kNN with SBS
  • Other feature selection methods with scikit-learn
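
A sketch of the scikit-learn route with a kNN estimator, assuming scikit-learn 0.24 or newer (which provides SequentialFeatureSelector); the Wine data and parameter values are illustrative:

from sklearn.datasets import load_wine
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

# Backward selection: start from all 13 features and greedily remove features
# until 5 remain, scoring each candidate subset by cross-validated accuracy.
knn = KNeighborsClassifier(n_neighbors=5)
sfs = SequentialFeatureSelector(knn, n_features_to_select=5,
                                direction='backward', cv=5)
sfs.fit(X_std, y)
print(sfs.get_support(indices=True))  # indices of the selected features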

7 of 28

Feature Extraction

  • An alternative approach to feature selection for dimensionality reduction is feature extraction.
  • Feature extraction summarizes the information content of a dataset by transforming it onto a new feature subspace of lower dimensionality than the original one.
  • Topics:
    • Principal Component Analysis (PCA) for unsupervised data compression.
    • Linear Discriminant Analysis (LDA) as a supervised dimensionality reduction technique for maximizing class separability.
    • Nonlinear dimensionality reduction via Kernel Principal Component Analysis (KPCA).

8 of 28

Principal Component Analysis (PCA)

  • Unsupervised linear transformation;
  • PCA aims to find the directions of maximum variance in high-dimensional data and projects it onto a new subspace with equal or fewer dimensions than the original one;
  • Orthogonal axes (principal components) of the new subspace can be interpreted as the directions of maximum variance.

9 of 28

PCA – general idea (math)

  • We construct a d×k-dimensional transformation matrix W that allows us to map a sample vector x onto a new k-dimensional feature subspace that has fewer dimensions than the original d-dimensional feature space:
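
In symbols (standard notation, assumed here rather than copied from the slide, with x treated as a 1×d row vector):

  x = [x_1, x_2, \dots, x_d], \qquad x \in \mathbb{R}^{1 \times d}

  z = xW = [z_1, z_2, \dots, z_k], \qquad W \in \mathbb{R}^{d \times k}, \quad z \in \mathbb{R}^{1 \times k}, \quad k < d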

10 of 28

PCA – general idea (algorithm)

11 of 28

PCA – general idea (algorithm)

12 of 28

PCA – general idea (algorithm)

13 of 28

Eigendecomposition

14 of 28

Total and explained variance

15 of 28

Feature Transformation

16 of 28

PCA using Python

  • Step-by-step PCA on the Wine dataset (code).
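
A condensed sketch of the preceding steps (standardization, covariance matrix, eigendecomposition, explained variance, feature transformation), assuming the scikit-learn copy of the Wine dataset and a two-component projection; the slides' own code is not reproduced here:

import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# 1. Standardize the d = 13 features.
X_std = StandardScaler().fit_transform(X)

# 2. Covariance matrix and its eigendecomposition (eigh: symmetric matrix).
cov_mat = np.cov(X_std.T)
eigen_vals, eigen_vecs = np.linalg.eigh(cov_mat)

# 3. Sort eigenpairs by decreasing eigenvalue and compute explained variance ratios.
order = np.argsort(eigen_vals)[::-1]
eigen_vals, eigen_vecs = eigen_vals[order], eigen_vecs[:, order]
var_exp = eigen_vals / eigen_vals.sum()
print(np.cumsum(var_exp)[:2])     # variance captured by the first two PCs

# 4. Build the d x k projection matrix W from the top k = 2 eigenvectors
#    and transform the data: Z = X_std W.
W = eigen_vecs[:, :2]
Z = X_std @ W
print(Z.shape)                    # (178, 2)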

17 of 28

Linear Discriminant Analysis (LDA)

  • The goal in LDA is to find the feature subspace that optimizes class separability.

18 of 28

Linear Discriminant Analysis (LDA)

19 of 28

20 of 28

LDA: details

  • Mean:
  • Within-class scatter matrix:
  • Individual scatter matrices:
  • Covariance matrix:
  • Between-class scatter matrix:
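
The usual definitions behind these labels (standard LDA notation, assumed rather than copied from the slides; D_i is the set of samples of class i, n_i its size, c the number of classes, and m the overall mean vector):

  m_i = \frac{1}{n_i} \sum_{x \in D_i} x

  S_i = \sum_{x \in D_i} (x - m_i)(x - m_i)^{\top}, \qquad S_W = \sum_{i=1}^{c} S_i

  \Sigma_i = \frac{1}{n_i} S_i \quad \text{(the covariance matrix of class } i \text{, i.e. the scaled scatter matrix)}

  S_B = \sum_{i=1}^{c} n_i \, (m_i - m)(m_i - m)^{\top}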

21 of 28

LDA: algorithm

22 of 28

LDA using Python

  • Step-by-step LDA on the Wine dataset (code).
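
A compact sketch using scikit-learn's LinearDiscriminantAnalysis on the Wine dataset (the dataset source and parameter choices are assumptions; the slides' step-by-step code is not reproduced here):

from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

# With c = 3 classes, LDA yields at most c - 1 = 2 linear discriminants.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X_std, y)      # supervised: uses the class labels y
print(X_lda.shape)                       # (178, 2)
print(lda.explained_variance_ratio_)     # class-discriminatory variance per component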

23 of 28

Nonlinear case

24 of 28

Kernel Functions and PCA

  • We perform a nonlinear mapping via kernel PCA that transforms the data onto a higher-dimensional space.
  • We then use standard PCA in this higher-dimensional space to project the data back onto a lower-dimensional space where the samples can be separated by a linear classifier.
  • However, one downside of this approach is that it is computationally very expensive, and this is where we use the kernel trick.
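
The kernel trick in one line: a kernel function returns the inner product of two samples in the mapped space without ever computing the mapping \phi explicitly; the RBF (Gaussian) kernel is one standard choice (formula assumed, not copied from the slides):

  \kappa\left(x^{(i)}, x^{(j)}\right) = \phi\left(x^{(i)}\right)^{\top} \phi\left(x^{(j)}\right) = \exp\left(-\gamma \left\lVert x^{(i)} - x^{(j)} \right\rVert^{2}\right)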

25 of 28

From PCA to kernel PCA

  • Covariance matrix:
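
For standardized (zero-mean) data this is presumably the usual sample covariance, which kernel PCA rewrites in terms of the mapped samples \phi(x); its eigenproblem can then be solved entirely through the n×n kernel matrix K with entries K_{ij} = \kappa(x^{(i)}, x^{(j)}) (standard derivation, assumed here):

  \Sigma = \frac{1}{n} \sum_{i=1}^{n} x^{(i)} {x^{(i)}}^{\top} \;\;\longrightarrow\;\; \Sigma = \frac{1}{n} \sum_{i=1}^{n} \phi\left(x^{(i)}\right) \phi\left(x^{(i)}\right)^{\top}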

26 of 28

Kernel PCA

27 of 28

Kernel PCA using Python

  • Step-by-step kernel PCA (code).
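
A step-by-step RBF kernel PCA sketch under the assumptions above (the half-moons toy data, the gamma value, and the function name rbf_kernel_pca are illustrative; scikit-learn's KernelPCA gives the same result in one call):

import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.datasets import make_moons


def rbf_kernel_pca(X, gamma, n_components):
    """Project X onto the top principal components in RBF-kernel feature space."""
    # 1. Pairwise squared Euclidean distances and the RBF kernel (similarity) matrix.
    sq_dists = squareform(pdist(X, 'sqeuclidean'))
    K = np.exp(-gamma * sq_dists)

    # 2. Center the kernel matrix (the mapped data is not zero-mean by default).
    n = K.shape[0]
    one_n = np.ones((n, n)) / n
    K = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # 3. Eigendecomposition; np.linalg.eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(K)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

    # 4. The top eigenvectors of the centered kernel matrix are the projected samples.
    return eigvecs[:, :n_components]


X, y = make_moons(n_samples=100, random_state=123)
X_kpca = rbf_kernel_pca(X, gamma=15, n_components=2)
print(X_kpca.shape)   # (100, 2)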

28 of 28

Visualizing data with t-SNE
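
A minimal sketch for this last topic (the digits dataset and parameter choices are assumptions, chosen only for illustration):

from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)

# Embed the 64-dimensional digit images into 2-D for plotting; t-SNE preserves
# local neighborhood structure rather than global distances.
tsne = TSNE(n_components=2, init='pca', random_state=123)
X_tsne = tsne.fit_transform(X)
print(X_tsne.shape)   # (1797, 2)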