Classification (Basic Concepts)
1
Classification: Basic Concepts
2
Supervised vs. Unsupervised Learning
3
Prediction Problems: Classification vs. Numeric Prediction
4
Classification—A Two-Step Process
5
Process (1): Model Construction
6
Figure: model construction. Training data is fed to a classification algorithm, which produces a classifier (model), e.g. IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’.
Process (2): Using the Model in Prediction
7
Figure: using the model. The classifier is first applied to testing data to estimate accuracy, then to unseen data, e.g. (Jeff, Professor, 4) -> Tenured?
Chapter 8. Classification: Basic Concepts
8
Decision Tree Induction: An Example
9
Figure: decision tree for buys_computer.
age = <=30 -> student? (no -> no; yes -> yes)
age = 31..40 -> yes
age = >40 -> credit rating? (excellent -> no; fair -> yes)
Algorithm for Decision Tree Induction
10
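The basic algorithm is a greedy, top-down, recursive partitioning of the training tuples, splitting at each node on the attribute chosen by an attribute selection measure. A minimal ID3-style sketch (information gain as the measure, categorical attributes only, no pruning; all names are ours):

```python
import math
from collections import Counter

def entropy(labels):
    """Expected information (in bits) needed to classify a tuple."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    """Choose the attribute with the highest information gain."""
    base = entropy(labels)
    def gain(a):
        parts = {}
        for row, y in zip(rows, labels):
            parts.setdefault(row[a], []).append(y)
        return base - sum(len(p) / len(labels) * entropy(p) for p in parts.values())
    return max(attributes, key=gain)

def build_tree(rows, labels, attributes):
    if len(set(labels)) == 1:            # all tuples in one class -> leaf
        return labels[0]
    if not attributes:                   # no attributes left -> majority vote
        return Counter(labels).most_common(1)[0][0]
    a = best_attribute(rows, labels, attributes)
    branches = {}
    for v in {row[a] for row in rows}:   # one branch per attribute value
        idx = [i for i, row in enumerate(rows) if row[a] == v]
        branches[v] = build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx],
                                 [b for b in attributes if b != a])
    return {a: branches}

rows = [{"age": "<=30", "student": "no"},
        {"age": "<=30", "student": "yes"},
        {"age": "31..40", "student": "no"}]
tree = build_tree(rows, ["no", "yes", "yes"], ["age", "student"])
```

The recursion stops when a partition is pure or no attributes remain, mirroring the algorithm's terminating conditions.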
Brief Review of Entropy
11
Figure: entropy of a two-class (m = 2) distribution as a function of the class probability; it peaks at 1 bit when the two classes are equally likely.
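The m = 2 curve can be reproduced numerically; a minimal sketch (the function name is ours):

```python
import math

def entropy(probs):
    """H = -sum_i p_i log2 p_i, in bits; p = 0 terms contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))             # 1.0: two equally likely classes
print(f"{entropy([9/14, 5/14]):.3f}")  # 0.940: a 9-yes / 5-no training set
```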
12
Attribute Selection Measure: Information Gain (ID3/C4.5)
Attribute Selection: Information Gain
Info(D) = I(9,5) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
Info_age(D) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694
The term (5/14) I(2,3) means “age <=30” has 5 out of 14 samples, with 2 yes’es and 3 no’s. Hence
Gain(age) = Info(D) - Info_age(D) = 0.246
Similarly, Gain(income) = 0.029, Gain(student) = 0.151, Gain(credit_rating) = 0.048
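These numbers can be checked mechanically. A sketch that recomputes every gain from the 14-tuple AllElectronics training set (tuple order follows the textbook's table; helper names are ours):

```python
import math
from collections import Counter

# (age, income, student, credit_rating, buys_computer)
data = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]
attrs = {"age": 0, "income": 1, "student": 2, "credit_rating": 3}
labels = [row[-1] for row in data]

def info(ys):
    """Expected information I(...) of a label list, in bits."""
    n = len(ys)
    return -sum(c / n * math.log2(c / n) for c in Counter(ys).values())

def gain(attr):
    """Gain(attr) = Info(D) - Info_attr(D)."""
    col = attrs[attr]
    groups = {}
    for row in data:
        groups.setdefault(row[col], []).append(row[-1])
    info_attr = sum(len(g) / len(data) * info(g) for g in groups.values())
    return info(labels) - info_attr

# age has the largest gain (~0.246), so it becomes the root split
best = max(attrs, key=gain)
print(best)   # prints "age"
```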
13
Computing Information-Gain for Continuous-Valued Attributes
14
Gain Ratio for Attribute Selection (C4.5)
15
Gini Index (CART, IBM IntelligentMiner)
where p_j is the relative frequency of class j in D
16
Computation of Gini Index
Gini_{low,medium}(D) is 0.443, Gini_{low,high}(D) is 0.458, and Gini_{medium,high}(D) is 0.450. Thus, split on {low, medium} (vs. {high}) since it has the lowest Gini index
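These values can be re-derived from the (yes, no) counts in the income AVC-set; a short sketch (function names are ours):

```python
def gini(counts):
    """Gini impurity 1 - sum_j p_j^2, from raw class counts."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def gini_split(part1, part2):
    """Weighted Gini index of a binary split, given (yes, no) counts per side."""
    n1, n2 = sum(part1), sum(part2)
    n = n1 + n2
    return n1 / n * gini(part1) + n2 / n * gini(part2)

# income (yes, no) counts: low (3, 1), medium (4, 2), high (2, 2)
low, medium, high = (3, 1), (4, 2), (2, 2)
merge = lambda a, b: (a[0] + b[0], a[1] + b[1])

g_lm = gini_split(merge(low, medium), high)   # ~0.443  {low,medium} vs {high}
g_lh = gini_split(merge(low, high), medium)   # ~0.458  {low,high} vs {medium}
g_mh = gini_split(merge(medium, high), low)   # ~0.450  {medium,high} vs {low}
```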
17
Comparing Attribute Selection Measures
18
Other Attribute Selection Measures
19
Overfitting and Tree Pruning
20
Enhancements to Basic Decision Tree Induction
21
Classification in Large Databases
22
Scalability Framework for RainForest
23
Rainforest: Training Set and Its AVC Sets
24
Training Examples: AVC-sets built from the buys_computer training data.

AVC-set on Age:
Age | yes | no |
<=30 | 2 | 3 |
31..40 | 4 | 0 |
>40 | 3 | 2 |

AVC-set on income:
income | yes | no |
high | 2 | 2 |
medium | 4 | 2 |
low | 3 | 1 |

AVC-set on Student:
student | yes | no |
yes | 6 | 1 |
no | 3 | 4 |

AVC-set on credit_rating:
credit_rating | yes | no |
fair | 6 | 2 |
excellent | 3 | 3 |
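An AVC-set (Attribute-Value, Classlabel) is just a group-by count. A sketch that rebuilds the tables above from the 14 training tuples (assumed to be the textbook's AllElectronics data; helper names are ours):

```python
from collections import Counter

# (age, income, student, credit_rating, buys_computer)
data = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]

def avc_set(data, attr):
    """AVC-set: class-label counts per distinct value of one attribute."""
    idx = {"age": 0, "income": 1, "student": 2, "credit_rating": 3}[attr]
    table = {}
    for row in data:
        table.setdefault(row[idx], Counter())[row[-1]] += 1
    return table

student = avc_set(data, "student")
# matches the table above: yes -> 6 yes / 1 no, no -> 3 yes / 4 no
```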
BOAT (Bootstrapped Optimistic Algorithm for Tree Construction)
25
Presentation of Classification Results
Data Mining: Concepts and Techniques
26
Visualization of a Decision Tree in SGI/MineSet 3.0
27
Interactive Visual Mining by Perception-Based Classification (PBC)
28
Chapter 8. Classification: Basic Concepts
29
Bayesian Classification: Why?
30
Bayes’ Theorem: Basics
31
Prediction Based on Bayes’ Theorem
posterior = likelihood x prior / evidence, i.e., P(H|X) = P(X|H) x P(H) / P(X)
32
Classification Is to Derive the Maximum Posteriori
Since P(X) is constant for all classes, only P(X|Ci) x P(Ci) needs to be maximized
33
Naïve Bayes Classifier
and P(xk|Ci) is g(xk, μCi, σCi), the Gaussian density with mean μ and standard deviation σ estimated from the tuples of class Ci
34
Naïve Bayes Classifier: Training Dataset
35
Class:
C1:buys_computer = ‘yes’
C2:buys_computer = ‘no’
Data to be classified:
X = (age <= 30, Income = medium, Student = yes, Credit_rating = Fair)
Naïve Bayes Classifier: An Example
P(Ci): P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14 = 0.357
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<= 30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4
P(X|Ci) : P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci)*P(Ci) : P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028
P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007
Therefore, X belongs to class (“buys_computer = yes”)
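The whole calculation can be replayed in a few lines, using the priors and conditional probabilities listed above (variable names are ours):

```python
# Priors and conditional probabilities as computed on the slide
p = {"yes": 9 / 14, "no": 5 / 14}
cond = {
    "yes": [2 / 9, 4 / 9, 6 / 9, 6 / 9],  # age<=30, income=medium, student=yes, credit=fair
    "no":  [3 / 5, 2 / 5, 1 / 5, 2 / 5],
}

def score(c):
    """Unnormalized posterior P(X|Ci) * P(Ci), naive independence assumed."""
    s = p[c]
    for pk in cond[c]:
        s *= pk
    return s

s_yes, s_no = score("yes"), score("no")   # ~0.028 vs ~0.007
print("yes" if s_yes > s_no else "no")    # prints "yes"
```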
36
Avoiding the Zero-Probability Problem
Ex. A training set with 1000 tuples: income = low (0), income = medium (990), income = high (10). Using the Laplacian correction (adding 1 to each case):
Prob(income = low) = 1/1003
Prob(income = medium) = 991/1003
Prob(income = high) = 11/1003
The “corrected” estimates are close to the uncorrected ones, yet no probability is zero
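The Laplacian correction is easy to apply programmatically; a sketch reproducing the corrected estimates above (function name is ours):

```python
def laplace_smoothed(counts, k=1):
    """Laplacian correction: add k to the count of every attribute value."""
    total = sum(counts.values()) + k * len(counts)
    return {v: (c + k) / total for v, c in counts.items()}

income_counts = {"low": 0, "medium": 990, "high": 10}  # 1000 training tuples
probs = laplace_smoothed(income_counts)
# low -> 1/1003, medium -> 991/1003, high -> 11/1003
```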
37
Naïve Bayes Classifier: Comments
Ex. In medical diagnosis, dependencies exist among symptoms (fever, cough, etc.) and diseases (lung cancer, diabetes, etc.) that a naïve Bayes classifier cannot model
38
Chapter 8. Classification: Basic Concepts
39
Using IF-THEN Rules for Classification
R: IF age = youth AND student = yes THEN buys_computer = yes
coverage(R) = n_covers / |D| /* D: training data set */
accuracy(R) = n_correct / n_covers
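These two measures can be computed directly; a sketch on a tiny hand-made dataset (all names and data are ours):

```python
def coverage_and_accuracy(rule, label, D):
    """rule: predicate over a tuple; label: predicted class; D: list of (tuple, class)."""
    covered = [(x, y) for x, y in D if rule(x)]
    correct = sum(1 for x, y in covered if y == label)
    cov = len(covered) / len(D)
    acc = correct / len(covered) if covered else 0.0
    return cov, acc

D = [({"age": "youth", "student": "yes"}, "yes"),
     ({"age": "youth", "student": "yes"}, "no"),
     ({"age": "senior", "student": "no"}, "no"),
     ({"age": "youth", "student": "no"}, "no")]
rule = lambda x: x["age"] == "youth" and x["student"] == "yes"
cov, acc = coverage_and_accuracy(rule, "yes", D)   # covers 2/4, correct on 1/2
```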
40
Rule Extraction from a Decision Tree
IF age = young AND student = no THEN buys_computer = no
IF age = young AND student = yes THEN buys_computer = yes
IF age = mid-age THEN buys_computer = yes
IF age = old AND credit_rating = excellent THEN buys_computer = no
IF age = old AND credit_rating = fair THEN buys_computer = yes
41
Figure: the decision tree from which these rules are extracted.
age = <=30 -> student? (no -> no; yes -> yes)
age = 31..40 -> yes
age = >40 -> credit rating? (excellent -> no; fair -> yes)
Rule Induction: Sequential Covering Method
42
Sequential Covering Algorithm
while (enough target tuples left)
generate a rule
remove positive target tuples satisfying this rule
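A minimal sketch of the loop above, where the caller supplies learn_one_rule (the toy learner below just tests one attribute/value and is ours, not part of the algorithm):

```python
def sequential_covering(examples, target, learn_one_rule):
    """Learn rules for one class; remove positives covered by each new rule."""
    pos = [x for x, y in examples if y == target]
    neg = [x for x, y in examples if y != target]
    rules = []
    while pos:                       # "while enough target tuples left" (here: any)
        rule = learn_one_rule(pos, neg)
        covered = [x for x in pos if rule(x)]
        if not covered:              # no progress -> stop
            break
        rules.append(rule)
        pos = [x for x in pos if not rule(x)]
    return rules

def learn_one_rule(pos, neg):
    """Toy learner: test the first attribute/value of the first positive example."""
    attr, val = next(iter(pos[0].items()))
    return lambda x, a=attr, v=val: x.get(a) == v

examples = [({"a": 1}, "+"), ({"a": 1}, "+"), ({"a": 2}, "-")]
rules = sequential_covering(examples, "+", learn_one_rule)
```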
43
Figure: each learned rule covers a distinct subset of the positive examples (Rule 1, Rule 2, Rule 3); tuples covered by a learned rule are removed before the next rule is learned.
Rule Generation
while (true)
    find the best predicate p
    if foil-gain(p) > threshold then add p to current rule
    else break
44
Figure: general-to-specific rule growth over positive and negative examples: start with A3=1, specialize to A3=1 && A1=2, then to A3=1 && A1=2 && A8=5, excluding more negative examples at each step.
How to Learn-One-Rule?
FOIL_Prune(R) = (pos - neg) / (pos + neg), where pos/neg are # of positive/negative tuples covered by R.
If FOIL_Prune is higher for the pruned version of R, prune R
45
Classification: Basic Concepts
46
Model Evaluation and Selection
47
Classifier Evaluation Metrics: Confusion Matrix
Confusion Matrix:
Actual class\Predicted class | C1 | ¬ C1 |
C1 | True Positives (TP) | False Negatives (FN) |
¬ C1 | False Positives (FP) | True Negatives (TN) |

Example of Confusion Matrix:
Actual class\Predicted class | buy_computer = yes | buy_computer = no | Total |
buy_computer = yes | 6954 | 46 | 7000 |
buy_computer = no | 412 | 2588 | 3000 |
Total | 7366 | 2634 | 10000 |
48
Classifier Evaluation Metrics: Accuracy, Error Rate, Sensitivity and Specificity
Accuracy = (TP + TN)/All
Error rate = (FP + FN)/All
Sensitivity = TP/P (true positive rate)
Specificity = TN/N (true negative rate)
A\P | C | ¬C | |
C | TP | FN | P |
¬C | FP | TN | N |
| P’ | N’ | All |
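From the counts in the buy_computer example matrix, the standard metrics follow directly; a short sketch:

```python
# Counts from the buy_computer confusion matrix
TP, FN, FP, TN = 6954, 46, 412, 2588
P, N = TP + FN, FP + TN            # 7000 actual positives, 3000 actual negatives
ALL = P + N

accuracy = (TP + TN) / ALL         # 0.9542
error_rate = (FP + FN) / ALL       # 0.0458 (= 1 - accuracy)
sensitivity = TP / P               # true positive rate
specificity = TN / N               # true negative rate
```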
49
Classifier Evaluation Metrics: Precision and Recall, and F-measures
50
Classifier Evaluation Metrics: Example
51
Actual Class\Predicted class | cancer = yes | cancer = no | Total | Recognition(%) |
cancer = yes | 90 | 210 | 300 | 30.00 (sensitivity) |
cancer = no | 140 | 9560 | 9700 | 98.56 (specificity) |
Total | 230 | 9770 | 10000 | 96.40 (accuracy) |
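Precision, recall, and the F-measure for the cancer example can be checked in a few lines (counts taken from the matrix above):

```python
TP, FN, FP, TN = 90, 210, 140, 9560   # cancer = yes is the positive class

precision = TP / (TP + FP)            # 90/230 ~ 0.391
recall = TP / (TP + FN)               # 90/300 = 0.300 (= sensitivity)
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean, ~0.340
```

Despite 96.40% accuracy, recall on the positive (cancer) class is only 30%, which is why accuracy alone is a poor metric on imbalanced data.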
Evaluating Classifier Accuracy: Holdout & Cross-Validation Methods
52
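k-fold cross-validation partitions the data into k mutually exclusive folds and uses each fold in turn as the test set; a minimal index-splitting sketch (function name is ours):

```python
def k_fold_split(n, k):
    """Partition indices 0..n-1 into k folds; each index is tested exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for fold in folds:
        test = set(fold)
        train = [j for j in range(n) if j not in test]
        splits.append((fold, train))
    return splits

splits = k_fold_split(10, 5)   # 5 (test, train) pairs: 2 test / 8 train each
```

Accuracy is then the overall fraction of correct classifications accumulated over all k test folds.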
Evaluating Classifier Accuracy: Bootstrap
53
Estimating Confidence Intervals: Classifier Models M1 vs. M2
54
Estimating Confidence Intervals: Null Hypothesis
55
Estimating Confidence Intervals: t-test
t = [err(M1) - err(M2)] / sqrt(var(M1 - M2) / k)
where var(M1 - M2) = (1/k) Σ_{i=1..k} [err(M1)_i - err(M2)_i - (err(M1) - err(M2))]^2
and, if the two models use different numbers of samples, var(M1 - M2) = var(M1)/k1 + var(M2)/k2
where k1 & k2 are # of cross-validation samples used for M1 & M2, resp.
56
Estimating Confidence Intervals: Table for t-distribution
57
Estimating Confidence Intervals: Statistical Significance
58
Model Selection: ROC Curves
59
Issues Affecting Model Selection
60
Chapter 8. Classification: Basic Concepts
61
Ensemble Methods: Increasing the Accuracy
62
Bagging: Bootstrap Aggregation
63
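The scheme (train k classifiers on bootstrap samples drawn with replacement, then combine them by majority vote) can be sketched as follows; the base learner here is a deliberately trivial majority-class predictor of our own, standing in for any real classifier:

```python
import random
from collections import Counter

def train_bagged(data, learn, k, seed=0):
    """Train k models, each on a bootstrap sample of |data| tuples."""
    rng = random.Random(seed)
    models = []
    for _ in range(k):
        sample = [rng.choice(data) for _ in data]  # sample with replacement
        models.append(learn(sample))
    return models

def bagging_predict(models, x):
    """Classify x by majority vote over the k models."""
    return Counter(m(x) for m in models).most_common(1)[0][0]

# toy base learner: always predicts the majority class of its training sample
def learn(sample):
    majority = Counter(y for _, y in sample).most_common(1)[0][0]
    return lambda x, m=majority: m

data = [(i, "yes") for i in range(9)] + [(i, "no") for i in range(9, 14)]
models = train_bagged(data, learn, k=7)
label = bagging_predict(models, x=0)
```

Because each model sees a different bootstrap sample, the vote averages out the variance of the individual classifiers.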
Boosting
64
Adaboost (Freund and Schapire, 1997)
65
Random Forest (Breiman 2001)
66
Classification of Class-Imbalanced Data Sets
67
Chapter 8. Classification: Basic Concepts
68
Summary (I)
69
Summary (II)
70
References (1)
71
References (2)
72