1 of 28

Pattern Recognition (RPD-0041)

Lecture 7 - Compressing Data via Dimensionality Reduction

2 of 28

Introduction

3 of 28

Sequential feature selection algorithms

  • There are two main categories of dimensionality reduction techniques: feature selection and feature extraction.
  • Via feature selection, we select a subset of the original features, whereas in feature extraction, we derive information from the feature set to construct a new feature subspace.
  • Sequential feature selection algorithms are a family of greedy search algorithms that are used to reduce an initial d-dimensional feature space to a k-dimensional feature subspace where k<d.

4 of 28

Sequential Backward Selection

  • SBS sequentially removes features from the full feature set until the new feature subspace contains the desired number of features, as sketched below.
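
A minimal from-scratch sketch of SBS, assuming a scikit-learn-style estimator and classification accuracy on a held-out split as the selection criterion (the class name SBS and its parameters are illustrative, not taken from the slides):

from itertools import combinations

import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


class SBS:
    """Greedy sequential backward selection down to k_features columns."""

    def __init__(self, estimator, k_features, test_size=0.25, random_state=1):
        self.estimator = estimator
        self.k_features = k_features
        self.test_size = test_size
        self.random_state = random_state

    def fit(self, X, y):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=self.test_size, random_state=self.random_state)
        # Start from the full index set and drop one feature per round.
        self.indices_ = tuple(range(X_tr.shape[1]))
        self.scores_ = [self._score(X_tr, y_tr, X_te, y_te, self.indices_)]
        while len(self.indices_) > self.k_features:
            # Among all subsets with one feature removed, keep the best-scoring one.
            candidates = list(combinations(self.indices_, len(self.indices_) - 1))
            scores = [self._score(X_tr, y_tr, X_te, y_te, c) for c in candidates]
            best = int(np.argmax(scores))
            self.indices_ = candidates[best]
            self.scores_.append(scores[best])
        return self

    def transform(self, X):
        # Keep only the selected feature columns.
        return X[:, self.indices_]

    def _score(self, X_tr, y_tr, X_te, y_te, indices):
        self.estimator.fit(X_tr[:, indices], y_tr)
        return accuracy_score(y_te, self.estimator.predict(X_te[:, indices]))

After fitting, indices_ holds the selected feature columns and scores_ records the criterion value at each subset size.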

5 of 28

Sequential Forward Selection

6 of 28

SBS in Python

  • Implementation from scratch (code).
  • kNN with SBS
  • Other feature selection methods with scikit-learn
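
A sketch of the scikit-learn route with a kNN estimator, assuming scikit-learn 0.24 or newer (which provides SequentialFeatureSelector); the Wine data and parameter values are illustrative:

from sklearn.datasets import load_wine
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

# Backward selection: start from all 13 features and greedily remove features
# until 5 remain, scoring each candidate subset by cross-validated accuracy.
knn = KNeighborsClassifier(n_neighbors=5)
sfs = SequentialFeatureSelector(knn, n_features_to_select=5,
                                direction='backward', cv=5)
sfs.fit(X_std, y)
print(sfs.get_support(indices=True))  # indices of the selected features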

7 of 28

Feature Extraction

  • An alternative approach to feature selection for dimensionality reduction is feature extraction.
  • Feature extraction summarizes the information content of a dataset by transforming it onto a new feature subspace of lower dimensionality than the original one.
  • Topics:
    • Principal Component Analysis (PCA) for unsupervised data compression.
    • Linear Discriminant Analysis (LDA) as a supervised dimensionality reduction technique for maximizing class separability.
    • Nonlinear dimensionality reduction via Kernel Principal Component Analysis (KPCA).

8 of 28

Principal Component Analysis (PCA)

  • Unsupervised linear transformation;
  • PCA aims to find the directions of maximum variance in high-dimensional data and projects it onto a new subspace with equal or fewer dimensions than the original one;
  • Orthogonal axes (principal components) of the new subspace can be interpreted as the directions of maximum variance.

9 of 28

PCA – general idea (math)

  • We construct a d×k-dimensional transformation matrix W that allows us to map a sample vector x onto a new k-dimensional feature subspace that has fewer dimensions than the original d-dimensional feature space:
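
In symbols (standard notation, assumed here rather than copied from the slide, with x treated as a 1×d row vector):

  x = [x_1, x_2, \dots, x_d], \qquad x \in \mathbb{R}^{1 \times d}

  z = xW = [z_1, z_2, \dots, z_k], \qquad W \in \mathbb{R}^{d \times k}, \quad z \in \mathbb{R}^{1 \times k}, \quad k < d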

10 of 28

PCA – general idea (algorithm)

11 of 28

PCA – general idea (algorithm)

12 of 28

PCA – general idea (algorithm)

13 of 28

Eigendecomposition

14 of 28

Total and explained variance

15 of 28

Feature Transformation

16 of 28

PCA using Python

  • Step-by-step PCA on the Wine dataset (code).
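
A condensed sketch of the preceding steps (standardization, covariance matrix, eigendecomposition, explained variance, feature transformation), assuming the scikit-learn copy of the Wine dataset and a two-component projection; the slides' own code is not reproduced here:

import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# 1. Standardize the d = 13 features.
X_std = StandardScaler().fit_transform(X)

# 2. Covariance matrix and its eigendecomposition (eigh: symmetric matrix).
cov_mat = np.cov(X_std.T)
eigen_vals, eigen_vecs = np.linalg.eigh(cov_mat)

# 3. Sort eigenpairs by decreasing eigenvalue and compute explained variance ratios.
order = np.argsort(eigen_vals)[::-1]
eigen_vals, eigen_vecs = eigen_vals[order], eigen_vecs[:, order]
var_exp = eigen_vals / eigen_vals.sum()
print(np.cumsum(var_exp)[:2])     # variance captured by the first two PCs

# 4. Build the d x k projection matrix W from the top k = 2 eigenvectors
#    and transform the data: Z = X_std W.
W = eigen_vecs[:, :2]
Z = X_std @ W
print(Z.shape)                    # (178, 2)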

17 of 28

Linear Discriminant Analysis (LDA)

  • The goal in LDA is to find the feature subspace that optimizes class separability.

18 of 28

Linear Discriminant Analysis (LDA)

19 of 28

20 of 28

LDA: details

  • Mean:
  • Within-class scatter matrix:
  • Individual scatter matrices:
  • Covariance matrix:
  • Between-class scatter matrix:
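
The usual definitions behind these labels (standard LDA notation, assumed rather than copied from the slides; D_i is the set of samples of class i, n_i its size, c the number of classes, and m the overall mean vector):

  m_i = \frac{1}{n_i} \sum_{x \in D_i} x

  S_i = \sum_{x \in D_i} (x - m_i)(x - m_i)^{\top}, \qquad S_W = \sum_{i=1}^{c} S_i

  \Sigma_i = \frac{1}{n_i} S_i \quad \text{(the covariance matrix of class } i \text{, i.e. the scaled scatter matrix)}

  S_B = \sum_{i=1}^{c} n_i \, (m_i - m)(m_i - m)^{\top}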

21 of 28

LDA: algorithm

22 of 28

LDA using Python

  • Step-by-step LDA on the Wine dataset (code).
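
A compact sketch using scikit-learn's LinearDiscriminantAnalysis on the Wine dataset (the dataset source and parameter choices are assumptions; the slides' step-by-step code is not reproduced here):

from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

# With c = 3 classes, LDA yields at most c - 1 = 2 linear discriminants.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X_std, y)      # supervised: uses the class labels y
print(X_lda.shape)                       # (178, 2)
print(lda.explained_variance_ratio_)     # class-discriminatory variance per component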

23 of 28

Nonlinear case

24 of 28

Kernel Functions and PCA

  • We perform a nonlinear mapping via kernel PCA that transforms the data onto a higher-dimensional space.
  • We then use standard PCA in this higher-dimensional space to project the data back onto a lower-dimensional space where the samples can be separated by a linear classifier.
  • However, one downside of this approach is that it is computationally very expensive, and this is where we use the kernel trick.
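
The kernel trick in one line: a kernel function returns the inner product of two samples in the mapped space without ever computing the mapping \phi explicitly; the RBF (Gaussian) kernel is one standard choice (formula assumed, not copied from the slides):

  \kappa\left(x^{(i)}, x^{(j)}\right) = \phi\left(x^{(i)}\right)^{\top} \phi\left(x^{(j)}\right) = \exp\left(-\gamma \left\lVert x^{(i)} - x^{(j)} \right\rVert^{2}\right)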

25 of 28

From PCA to kernel PCA

  • Covariance matrix:
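
For standardized (zero-mean) data this is presumably the usual sample covariance, which kernel PCA rewrites in terms of the mapped samples \phi(x); its eigenproblem can then be solved entirely through the n×n kernel matrix K with entries K_{ij} = \kappa(x^{(i)}, x^{(j)}) (standard derivation, assumed here):

  \Sigma = \frac{1}{n} \sum_{i=1}^{n} x^{(i)} {x^{(i)}}^{\top} \;\;\longrightarrow\;\; \Sigma = \frac{1}{n} \sum_{i=1}^{n} \phi\left(x^{(i)}\right) \phi\left(x^{(i)}\right)^{\top}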

26 of 28

Kernel PCA

27 of 28

Kernel PCA using Python

  • Step-by-step kernel PCA (code).
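
A step-by-step RBF kernel PCA sketch under the assumptions above (the half-moons toy data, the gamma value, and the function name rbf_kernel_pca are illustrative; scikit-learn's KernelPCA gives the same result in one call):

import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.datasets import make_moons


def rbf_kernel_pca(X, gamma, n_components):
    """Project X onto the top principal components in RBF-kernel feature space."""
    # 1. Pairwise squared Euclidean distances and the RBF kernel (similarity) matrix.
    sq_dists = squareform(pdist(X, 'sqeuclidean'))
    K = np.exp(-gamma * sq_dists)

    # 2. Center the kernel matrix (the mapped data is not zero-mean by default).
    n = K.shape[0]
    one_n = np.ones((n, n)) / n
    K = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # 3. Eigendecomposition; np.linalg.eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(K)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

    # 4. The top eigenvectors of the centered kernel matrix are the projected samples.
    return eigvecs[:, :n_components]


X, y = make_moons(n_samples=100, random_state=123)
X_kpca = rbf_kernel_pca(X, gamma=15, n_components=2)
print(X_kpca.shape)   # (100, 2)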

28 of 28

Visualizing data with t-SNE
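
A minimal sketch for this last topic (the digits dataset and parameter choices are assumptions, chosen only for illustration):

from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)

# Embed the 64-dimensional digit images into 2-D for plotting; t-SNE preserves
# local neighborhood structure rather than global distances.
tsne = TSNE(n_components=2, init='pca', random_state=123)
X_tsne = tsne.fit_transform(X)
print(X_tsne.shape)   # (1797, 2)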