Part 3: Decision Trees
Supplimentary material
www
ICS320
2
Decision Tree for PlayTennis
ICS320
3
Decision Tree for PlayTennis
ICS320
4
Outlook
Sunny
Overcast
Rain
Humidity
High
Normal
Wind
Strong
Weak
No
Yes
Yes
Yes
No
Decision Tree for PlayTennis
ICS320
5
Outlook
Sunny
Overcast
Rain
Humidity
High
Normal
No
Yes
Each internal node tests an attribute
Each branch corresponds to an
attribute value node
Each leaf node assigns a classification
Decision Tree for PlayTennis
ICS320
6
No
Outlook
Sunny
Overcast
Rain
Humidity
High
Normal
Wind
Strong
Weak
No
Yes
Yes
Yes
No
Outlook Temperature Humidity Wind PlayTennis
Sunny Hot High Weak ?
Decision Tree for Conjunction
ICS320
7
Outlook
Sunny
Overcast
Rain
Wind
Strong
Weak
No
Yes
No
Outlook=Sunny ∧ Wind=Weak
No
Decision Tree for Disjunction
ICS320
8
Outlook
Sunny
Overcast
Rain
Yes
Outlook=Sunny ∨ Wind=Weak
Wind
Strong
Weak
No
Yes
Wind
Strong
Weak
No
Yes
Decision Tree
ICS320
9
Outlook
Sunny
Overcast
Rain
Humidity
High
Normal
Wind
Strong
Weak
No
Yes
Yes
Yes
No
(Outlook=Sunny ∧ Humidity=Normal)
∨ (Outlook=Overcast)
∨ (Outlook=Rain ∧ Wind=Weak)
When to consider Decision Trees
ICS320
10
Top-Down Induction of Decision Trees ID3
ICS320
11
the attribute value of the branch
Which Attribute is ”best”?
ICS320
12
A1=?
True
False
[21+, 5-]
[8+, 30-]
[29+,35-]
A2=?
True
False
[18+, 33-]
[11+, 2-]
[29+,35-]
Entropy
Entropy(S) = -p+ log2 p+ - p- log2 p-
ICS320
13
Entropy
Why?
–log2 p bits to messages having probability p.
(+ or -) of random member of S:
-p+ log2 p+ - p- log2 p-
ICS320
14
Note that: 0Log20 =0
Information Gain
ICS320
15
A1=?
True
False
[21+, 5-]
[8+, 30-]
[29+,35-]
A2=?
True
False
[18+, 33-]
[11+, 2-]
[29+,35-]
Gain(S,A)=Entropy(S) - ∑v∈values(A) |Sv|/|S| Entropy(Sv)
Entropy([29+,35-]) = -29/64 log2 29/64 – 35/64 log2 35/64
= 0.99
Information Gain
Entropy([21+,5-]) = 0.71
Entropy([8+,30-]) = 0.74
Gain(S,A1)=Entropy(S)
-26/64*Entropy([21+,5-])
-38/64*Entropy([8+,30-])
=0.27
ICS320
16
A1=?
True
False
[21+, 5-]
[8+, 30-]
[29+,35-]
Entropy([18+,33-]) = 0.94
Entropy([11+,2-]) = 0.62
Gain(S,A2)=Entropy(S)
-51/64*Entropy([18+,33-])
-13/64*Entropy([11+,2-])
=0.12
A2=?
True
False
[18+, 33-]
[11+, 2-]
[29+,35-]
Training Examples
ICS320
17
No
Strong
High
Mild
Rain
D14
Yes
Weak
Normal
Hot
Overcast
D13
Yes
Strong
High
Mild
Overcast
D12
Yes
Strong
Normal
Mild
Sunny
D11
Yes
Strong
Normal
Mild
Rain
D10
Yes
Weak
Normal
Cool
Sunny
D9
No
Weak
High
Mild
Sunny
D8
Yes
Weak
Normal
Cool
Overcast
D7
No
Strong
Normal
Cool
Rain
D6
Yes
Weak
Normal
Cool
Rain
D5
Yes
Weak
High
Mild
Rain
D4
Yes
Weak
High
Hot
Overcast
D3
No
Strong
High
Hot
Sunny
D2
No
Weak
High
Hot
Sunny
D1
Play Tennis
Wind
Humidity
Temp.
Outlook
Day
Selecting the Next Attribute
ICS320
18
Humidity
High
Normal
[3+, 4-]
[6+, 1-]
S=[9+,5-]
E=0.940
Gain(S,Humidity)
=0.940-(7/14)*0.985
– (7/14)*0.592
=0.151
E=0.985
E=0.592
Wind
Weak
Strong
[6+, 2-]
[3+, 3-]
S=[9+,5-]
E=0.940
E=0.811
E=1.0
Gain(S,Wind)
=0.940-(8/14)*0.811
– (6/14)*1.0
=0.048
Humidity provides greater info. gain than Wind, w.r.t target classification.
Selecting the Next Attribute
ICS320
19
Outlook
Sunny
Rain
[2+, 3-]
[3+, 2-]
S=[9+,5-]
E=0.940
Gain(S,Outlook)
=0.940-(5/14)*0.971
-(4/14)*0.0 – (5/14)*0.0971
=0.247
E=0.971
E=0.971
Over
cast
[4+, 0]
E=0.0
Selecting the Next Attribute
ICS320
20
The information gain values for the 4 attributes are:
where S denotes the collection of training examples
Note: 0Log20 =0
ID3 Algorithm
ICS320
21
Outlook
Sunny
Overcast
Rain
Yes
[D1,D2,…,D14]
[9+,5-]
Ssunny=[D1,D2,D8,D9,D11]
[2+,3-]
?
?
[D3,D7,D12,D13]
[4+,0-]
[D4,D5,D6,D10,D14]
[3+,2-]
Gain(Ssunny , Humidity)=0.970-(3/5)0.0 – 2/5(0.0) = 0.970
Gain(Ssunny , Temp.)=0.970-(2/5)0.0 –2/5(1.0)-(1/5)0.0 = 0.570
Gain(Ssunny , Wind)=0.970= -(2/5)1.0 – 3/5(0.918) = 0.019
Test for this node
Note: 0Log20 =0
ID3 Algorithm
ICS320
22
Outlook
Sunny
Overcast
Rain
Humidity
High
Normal
Wind
Strong
Weak
No
Yes
Yes
Yes
No
[D3,D7,D12,D13]
[D8,D9,D11]
[D6,D14]
[D1,D2]
[D4,D5,D10]
Occam’s Razor
Why prefer short hypotheses?
Argument in favor:
Argument opposed:
ICS320
23
Overfitting
ICS320
24
Overfitting in Decision Tree Learning
ICS320
25
Avoid Overfitting
How can we avoid overfitting?
Minimize:
size(tree) + size(misclassifications(tree))
ICS320
26
Converting a Tree to Rules
ICS320
27
Outlook
Sunny
Overcast
Rain
Humidity
High
Normal
Wind
Strong
Weak
No
Yes
Yes
Yes
No
R1: If (Outlook=Sunny) ∧ (Humidity=High) Then PlayTennis=No
R2: If (Outlook=Sunny) ∧ (Humidity=Normal) Then PlayTennis=Yes
R3: If (Outlook=Overcast) Then PlayTennis=Yes
R4: If (Outlook=Rain) ∧ (Wind=Strong) Then PlayTennis=No
R5: If (Outlook=Rain) ∧ (Wind=Weak) Then PlayTennis=Yes
Continuous Valued Attributes
Create a discrete attribute to test continuous
Where to set the threshold?
ICS320
28
No
Yes
Yes
Yes
No
No
PlayTennis
270C
240C
220C
190C
180C
150C
Temperature
(see paper by [Fayyad, Irani 1993]
Unknown Attribute Values
What if some examples have missing values of A?
Use training example anyway sort through tree
Classify new examples in the same fashion
ICS320
29