Data Mining Process
Recap: Exercise
Data Mining Process
1. Dataset (Data Understanding and Processing)
2. Data Mining Method (Choose a Method That Suits the Characteristics of the Data)
3. Knowledge (Pattern/Model/Formula/Tree/Rule/Cluster)
4. Evaluation (Accuracy, AUC, RMSE, Lift Ratio, …)
DATA PRE-PROCESSING
Data Cleaning
Data Integration
Data Reduction
Data Transformation
Estimation
Prediction
Classification
Clustering
Association
1. Dataset
Class/Label/Target
Attribute/Feature
Nominal
Numeric
Record/Object/Sample/Tuple
Data Preparation
Why Prepare Data?
Why Data Preprocessing?
Why Is Data Preprocessing Important?
Data Preprocessing Activities
Forms of data preprocessing
Data Cleaning
How to Handle Missing Data?
Age | Income | Team | Gender |
23 | 24,200 | Red Sox | M |
39 | ? | Yankees | F |
45 | 45,390 | ? | F |
Fill missing values using aggregate functions (e.g., average) or probabilistic estimates on global value distribution
E.g., put the average income here, or put the most probable income based on the fact that the person is 39 years old
E.g., put the most frequent team here
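The fill-in strategy above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the table is held as a list of dictionaries and that `?` is represented as `None`; the function name `fill_missing` and its parameters are hypothetical, not part of the original slides.

```python
from collections import Counter
from statistics import mean

# The example table from the slide; None stands for the "?" entries.
rows = [
    {"Age": 23, "Income": 24200, "Team": "Red Sox", "Gender": "M"},
    {"Age": 39, "Income": None, "Team": "Yankees", "Gender": "F"},
    {"Age": 45, "Income": 45390, "Team": None, "Gender": "F"},
]


def fill_missing(rows, numeric_cols, nominal_cols):
    """Return a copy of rows with missing values filled in.

    Numeric columns get the average of the known values;
    nominal columns get the most frequent known value.
    """
    filled = [dict(r) for r in rows]  # do not modify the input
    for col in numeric_cols:
        known = [r[col] for r in rows if r[col] is not None]
        average = mean(known)
        for r in filled:
            if r[col] is None:
                r[col] = average
    for col in nominal_cols:
        known = [r[col] for r in rows if r[col] is not None]
        most_frequent = Counter(known).most_common(1)[0][0]
        for r in filled:
            if r[col] is None:
                r[col] = most_frequent
    return filled


cleaned = fill_missing(rows, numeric_cols=["Income"], nominal_cols=["Team"])
```

Here the missing income becomes 34,795, the average of the two known incomes. With only two known teams, each seen once, "most frequent" is a tie, so the choice is arbitrary; a probabilistic estimate conditioned on other attributes (such as age) would need a model that this sketch does not attempt.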
Data Cleaning: Noisy Data
Binning Method
Partitioning in the Binning Method
Example
Smoothing on Binned Partitions
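As a rough sketch of how binning can smooth noisy values, the Python below partitions sorted data into equal-frequency bins and then smooths each bin, either by its mean or by its nearest boundary. The slides do not say which partitioning or smoothing variant is intended, so equal-frequency bins and these two smoothing rules are assumptions; the function names and the sample values are illustrative only.

```python
def equal_frequency_bins(values, n_bins):
    """Sort the values and split them into n_bins bins of equal size.

    Any leftover values (when the count is not divisible by n_bins)
    are placed in the last bin.
    """
    ordered = sorted(values)
    size = len(ordered) // n_bins
    bins = [ordered[i * size:(i + 1) * size] for i in range(n_bins)]
    bins[-1].extend(ordered[n_bins * size:])
    return bins


def smooth_by_means(bins):
    """Replace every value in a bin with the mean of that bin."""
    return [[sum(b) / len(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value with the closer of its bin's min or max."""
    smoothed = []
    for b in bins:
        low, high = b[0], b[-1]  # bins are already sorted
        smoothed.append([low if v - low <= high - v else high for v in b])
    return smoothed


# Illustrative values only, not taken from the slides.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bins = equal_frequency_bins(prices, n_bins=3)
by_means = smooth_by_means(bins)
by_boundaries = smooth_by_boundaries(bins)
```

For these values the bins are [4, 8, 15], [21, 21, 24], and [25, 28, 34]. Smoothing by means gives 9, 22, and 29 for the three bins; smoothing by boundaries gives [4, 4, 15], [21, 21, 24], and [25, 25, 34].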
Data Cleaning: Outliers
Data Integration
Data Transformation
Data Transformation: Normalization
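The slides do not state which normalization is meant, so the sketch below shows two common choices as an assumption: min-max scaling to the range [0, 1] and z-score standardization. The function names and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    if high == low:
        # All values are identical; there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def z_score_normalize(values):
    """Center values on their mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Illustrative values only.
values = [10, 20, 30]
scaled = min_max_normalize(values)
standardized = z_score_normalize(values)
```

Min-max scaling maps these values to 0.0, 0.5, and 1.0. The z-score version yields values centered on zero; it uses the population standard deviation, which is one of two common conventions.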
Data Transformation: Discretization