Dimensionality Reduction

Introduction

  • In many learning problems, the datasets have a large number of variables.
  • Sometimes the number of variables even exceeds the number of observations.
  • This is common in many application areas, for example:
        • image processing,
        • time series analysis,
        • Internet search engines, and
        • automatic text analysis.

  • Many statistical and machine learning methods have difficulty dealing with such high-dimensional data.

  • The number of input variables is therefore often reduced before a machine learning algorithm can be applied successfully.

In statistics and machine learning, dimensionality reduction (or dimension reduction) is the process of reducing the number of variables under consideration by obtaining a smaller set of principal variables.
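One concrete instance of this difficulty: when there are more variables (p) than observations (n), the matrix XᵀX used by ordinary least squares has rank at most n, so it is singular and the least-squares coefficients are not unique. The sketch below checks this numerically with NumPy on randomly generated data; the sizes (5 observations, 10 variables) are illustrative assumptions, not values from these slides.

```python
import numpy as np

# Illustrative (hypothetical) sizes: 5 observations, 10 variables, so p > n.
rng = np.random.default_rng(0)
n_obs, n_vars = 5, 10
X = rng.normal(size=(n_obs, n_vars))

# X^T X is n_vars x n_vars, but its rank cannot exceed n_obs.
gram = X.T @ X
rank = np.linalg.matrix_rank(gram)

# A rank below n_vars means gram is singular: the OLS normal equations
# (X^T X) b = X^T y then have infinitely many solutions for b.
print(gram.shape, rank)
```

Reducing the number of variables below the number of observations is one way to make such a matrix invertible again.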

Dimensionality Reduction - Types

Dimensionality reduction can be implemented in two ways:

  • Feature selection
  • Feature extraction


Feature Selection

In feature selection, we select the k features (out of the total n) that provide the most useful information and discard the remaining (n − k) features.
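A minimal feature-selection sketch in NumPy. The slides do not say how "most useful" is measured, so this example assumes one simple filter criterion: the absolute Pearson correlation of each feature with a regression target. The function name `select_k_features` and the synthetic data are hypothetical, chosen only for illustration.

```python
import numpy as np

def select_k_features(X, y, k):
    """Return indices of the k columns of X most correlated with y.

    Assumption: usefulness = |Pearson correlation| with the target.
    This is one simple filter criterion, not the only possible one.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Correlation of each column with y; zero-variance columns get 0.
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    # Indices of the k largest |corr| values, best first.
    return np.argsort(-np.abs(corr))[:k]

rng = np.random.default_rng(0)
n_samples, n_features, k = 100, 6, 2
X = rng.normal(size=(n_samples, n_features))
# Synthetic target that depends only on features 1 and 4, plus small noise.
y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + 0.1 * rng.normal(size=n_samples)

keep = select_k_features(X, y, k)
X_reduced = X[:, keep]  # keep k columns, discard the other n - k
print(sorted(keep.tolist()), X_reduced.shape)
```

Here features 1 and 4 are expected to be selected, since the target was built from them. The kept columns are original variables, unchanged; this is what distinguishes selection from extraction.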

 

Feature Extraction

  • In feature extraction, we aim to create a new set of k features that are formed by combining the original n features.

  • These techniques can be supervised or unsupervised, depending on whether they use output (label) information or not.

  • Two of the most well-known and widely used feature extraction methods are Principal Component Analysis (PCA), which is unsupervised, and Linear Discriminant Analysis (LDA), which is supervised.
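A minimal PCA sketch in NumPy, computing the principal directions from the singular value decomposition of the centered data. It uses no labels, which is what makes PCA unsupervised; LDA would additionally need class labels and is not shown. The function name `pca` and the synthetic data are illustrative assumptions.

```python
import numpy as np

def pca(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k]                      # k x n_features
    Z = X_centered @ components.T            # n_samples x k: the new features
    # Variance captured by each kept component (sample variance, ddof=1).
    explained_var = S[:k] ** 2 / (X.shape[0] - 1)
    return Z, components, explained_var

rng = np.random.default_rng(0)
# Synthetic data: 3 observed features driven by 1 underlying factor.
t = rng.normal(size=(200, 1))
X = np.hstack([t, 2 * t, -t]) + 0.05 * rng.normal(size=(200, 3))

Z, components, var = pca(X, k=1)
print(Z.shape, components.shape)
```

Each column of `Z` is a linear combination of the original n features, weighted by a row of `components`, which matches the description of feature extraction above. In this synthetic example, one component captures almost all of the variance of the three original features.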

Dimensionality Reduction - Measures of error

Both approaches require a measure of the error in the model. In regression problems, common choices are:

    • Mean Squared Error (MSE)

    • Root Mean Squared Error (RMSE)
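For m predictions ŷ_i of targets y_i, MSE = (1/m) Σ (y_i − ŷ_i)², and RMSE = √MSE, which is expressed in the same units as the target. A small NumPy sketch of both (the function names and sample values are illustrative):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the MSE, in the units of y."""
    return float(np.sqrt(mse(y_true, y_pred)))

y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]
# Residuals 0.5, 0, -1, -1 -> squared 0.25, 0, 1, 1 -> mean 0.5625
print(mse(y_true, y_pred), rmse(y_true, y_pred))  # 0.5625 0.75
```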
