DATA PREPROCESSING
Data Mining
Unit - II
AGGREGATION
Average yearly precipitation has less variability than average monthly precipitation.
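The effect of aggregation on variability can be illustrated with a minimal sketch. The data below is synthetic (a seasonal cycle plus random noise, in cm), not the precipitation data the slide refers to, and the names `years`, `monthly_values`, and `yearly_means` are assumptions made for illustration. Each year's twelve monthly values are aggregated into one yearly average so that both quantities are on the same scale.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic monthly precipitation (cm) for 20 years: seasonal cycle plus noise,
# clamped at zero because precipitation cannot be negative.
years = [
    [max(0.0, 5 + 3 * math.sin(2 * math.pi * m / 12) + rng.gauss(0, 1))
     for m in range(12)]
    for _ in range(20)
]

# Un-aggregated view: all 240 monthly values.
monthly_values = [v for year in years for v in year]

# Aggregated view: one average per year (12 values collapsed into 1).
yearly_means = [statistics.mean(year) for year in years]

monthly_sd = statistics.stdev(monthly_values)
yearly_sd = statistics.stdev(yearly_means)

print(f"std dev of monthly values:  {monthly_sd:.2f} cm")
print(f"std dev of yearly averages: {yearly_sd:.2f} cm")
```

Averaging over a year cancels the seasonal swing and most of the noise, which is why the aggregated values vary less.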
SAMPLING
Sampling and Loss of Information
Determining the Proper Sample Size
DIMENSIONALITY REDUCTION
FEATURE SUBSET SELECTION
FEATURE CREATION
Mapping the Data to a New Space
DISCRETIZATION AND BINARIZATION
Example: a categorical variable with 5 values
{awful, poor, OK, good, great}
requires three binary variables x1, x2, and x3.
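The binarization in this example can be sketched as follows. The sketch assumes the values are numbered 0 to 4 in the order listed and that each number is written in binary, which needs ceil(log2(5)) = 3 bits. The function names `binarize` and `one_hot` are hypothetical, and the one-binary-variable-per-value encoding is included only as a common alternative for comparison.

```python
import math

values = ["awful", "poor", "OK", "good", "great"]

# Number of binary variables needed to give each value a distinct code.
n_bits = math.ceil(math.log2(len(values)))


def binarize(value):
    """Return (x1, x2, x3): the value's integer code written in binary."""
    code = values.index(value)  # awful -> 0, poor -> 1, ..., great -> 4
    return tuple((code >> shift) & 1 for shift in range(n_bits - 1, -1, -1))


def one_hot(value):
    """Alternative: one binary variable per categorical value (5 variables)."""
    return tuple(int(v == value) for v in values)


for v in values:
    print(f"{v:>5}  binary={binarize(v)}  one-hot={one_hot(v)}")
```

The compact three-variable code is shorter, while the one-per-value code avoids implying relationships between values that happen to share a bit.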
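The result of discretization can be written as a set of intervals (x0, x1], (x1, x2], ..., (xn−1, xn),
where x0 and xn may be −∞ or +∞, respectively,
or, equivalently, as a series of inequalities x0 < x ≤ x1, ..., xn−1 < x < xn.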
Unsupervised Discretization
Original Data
Equal Width Discretization
Equal Frequency Discretization
K-means Clustering (better result)
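The three unsupervised approaches named above can be compared with a minimal sketch on a one-dimensional attribute. Everything here is an assumption for illustration: the toy data, the function names, the deterministic choice of initial k-means centers, and the convention that a value equal to a split point falls into the lower interval. It is not the data or procedure behind the original figures.

```python
import bisect


def equal_width_splits(data, k):
    """Split the range [min, max] into k intervals of equal width."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_splits(data, k):
    """Place splits so each interval holds roughly the same number of values."""
    ordered = sorted(data)
    n = len(ordered)
    splits = []
    for i in range(1, k):
        idx = i * n // k
        # Midpoint between the last value of one bin and the first of the next.
        splits.append((ordered[idx - 1] + ordered[idx]) / 2)
    return splits


def kmeans_splits(data, k, iterations=100):
    """Run 1-D k-means and put splits halfway between neighbouring centers."""
    ordered = sorted(data)
    n = len(ordered)
    # Deterministic starting centers (an arbitrary choice for this sketch).
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in ordered:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # Keep the old center if a cluster ends up empty.
        new_centers = [sum(c) / len(c) if c else centers[j]
                       for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    centers.sort()
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]


def assign_bins(data, splits):
    """Map each value to the index of its interval (x_{i-1}, x_i]."""
    return [bisect.bisect_left(splits, x) for x in data]


def bin_counts(data, splits):
    """Count how many values land in each interval."""
    counts = [0] * (len(splits) + 1)
    for b in assign_bins(data, splits):
        counts[b] += 1
    return counts


# Toy data with three natural groups of sizes 4, 3, and 5.
data = [1.0, 1.2, 1.4, 1.1, 5.0, 5.3, 5.1, 9.0, 9.5, 9.2, 9.8, 9.9]

for name, fn in [("equal width", equal_width_splits),
                 ("equal frequency", equal_frequency_splits),
                 ("k-means", kmeans_splits)]:
    splits = fn(data, 3)
    print(name, [round(s, 2) for s in splits], bin_counts(data, splits))
```

On this toy data, equal frequency forces four values into every bin and so pulls one value out of its natural group, while k-means places its splits in the gaps between groups. Equal width also recovers the groups here only because they happen to be evenly spaced.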
Variable Transformation