Machine Learning in Real World: The C4.5 Algorithm
Numeric attributes
D | E | T | Class |
X | 1 | 46 | a |
Y | 3 | 65 | b |
X | 2 | 68 | b |
Y | 2 | 69 | b |
X | 1 | 70 | b |
Y | 3 | 71 | a |
X | 2 | 72 | a |
Y | 1 | 74 | a |
X | 3 | 75 | a |
Y | 3 | 75 | a |
X | 1 | 80 | a |
Y | 1 | 81 | b |
X | 2 | 83 | b |
Y | 2 | 85 | b |
IBS (information before split) = info(7,7) = entropy(1/2,1/2) = 1
InfoGain(D) = 0.02
InfoGain(E) = 0.05
InfoGain(T) = ???
InfoGain of a numeric attribute
46 | 65 | 68 | 69 | 70 | 71 | 72 | 74 | 75 | 75 | 80 | 81 | 83 | 85 |
a | b | b | b | b | a | a | a | a | a | a | b | b | b |
Candidate split points, taken midway between adjacent values where the class changes: 55.5, 70.5, 80.5
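These thresholds can be recovered programmatically: sort by the numeric attribute and take midpoints between adjacent values whose class differs. A minimal Python sketch (the helper names `entropy` and `candidates` are illustrative, not from the original slides):

```python
import math

def entropy(*probs):
    """Shannon entropy in bits; 0*log(0) is taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Sorted T values with their classes (from the table above)
values  = [46, 65, 68, 69, 70, 71, 72, 74, 75, 75, 80, 81, 83, 85]
classes = ["a","b","b","b","b","a","a","a","a","a","a","b","b","b"]

# Candidate thresholds: midpoints between adjacent values where the class changes
candidates = sorted({(values[i] + values[i + 1]) / 2
                     for i in range(len(values) - 1)
                     if classes[i] != classes[i + 1]})
print(candidates)  # [55.5, 70.5, 80.5]
```

Only three of the thirteen gaps separate differing classes, so only three thresholds need evaluating.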
entropy(1,0) = 0
entropy(6/13,7/13) = - 6/13 * log(6/13) - 7/13 * log(7/13) = 0.996
T | a | b | P(a) | P(b) |
< 55.5 | 1 | 0 | 1 | 0 |
> 55.5 | 6 | 7 | 6/13 | 7/13 |
T | Entropy | Probability |
< 55.5 | 0 | 1/14 |
> 55.5 | 0.996 | 13/14 |
Info(T, 55.5) = 0 * 1/14 + 0.996 * 13/14 = 0.925
entropy(1/5,4/5) = - 1/5 * log(1/5) - 4/5 * log(4/5) = 0.722
entropy(6/9,3/9) = - 6/9 * log(6/9) - 3/9 * log(3/9) = 0.918
T | a | b | P(a) | P(b) |
< 70.5 | 1 | 4 | 1/5 | 4/5 |
> 70.5 | 6 | 3 | 6/9 | 3/9 |
T | Entropy | Probability |
< 70.5 | 0.722 | 5/14 |
> 70.5 | 0.918 | 9/14 |
Info(T, 70.5) = 0.722 * 5/14 + 0.918 * 9/14 = 0.848
entropy(7/11,4/11) = - 7/11 * log(7/11) - 4/11 * log(4/11) = 0.946
entropy(0,1) = 0
T | a | b | P(a) | P(b) |
< 80.5 | 7 | 4 | 7/11 | 4/11 |
> 80.5 | 0 | 3 | 0 | 1 |
T | Entropy | Probability |
< 80.5 | 0.946 | 11/14 |
> 80.5 | 0 | 3/14 |
Info(T, 80.5) = 0.946 * 11/14 + 0* 3/14 = 0.743
IBS = 1
InfoGain(T, 55.5) = 1 – 0.925 = 0.075
InfoGain(T, 70.5) = 1 – 0.848 = 0.152
InfoGain(T, 80.5) = 1 – 0.743 = 0.257
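The three candidate gains can be checked numerically with the same data; a small sketch (the `info_after_split` helper is an illustrative name):

```python
import math

def entropy(probs):
    """Shannon entropy in bits over a probability list; 0*log(0) taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

values  = [46, 65, 68, 69, 70, 71, 72, 74, 75, 75, 80, 81, 83, 85]
classes = ["a","b","b","b","b","a","a","a","a","a","a","b","b","b"]
ibs = 1.0  # entropy of the 7:7 class split before any test

def info_after_split(threshold):
    """Weighted average entropy of the subsets T < threshold and T > threshold."""
    info = 0.0
    for keep in (lambda v: v < threshold, lambda v: v > threshold):
        subset = [c for v, c in zip(values, classes) if keep(v)]
        n = len(subset)
        info += n / len(values) * entropy([subset.count(c) / n for c in ("a", "b")])
    return info

for t in (55.5, 70.5, 80.5):
    print(t, round(ibs - info_after_split(t), 3))
```

The 80.5 threshold comes out best, matching the gains listed above.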
InfoGain of the T attribute
The best candidate split, 80.5, gives InfoGain(T) = 0.257, so the node tests T:
T
< 80.5 → …
> 80.5 → …
Missing values
Outlook | Temp | Humidity | Windy | Play |
Sunny | Hot | High | False | No |
Sunny | Hot | High | True | No |
Overcast | Hot | High | False | Yes |
Rainy | Mild | High | False | Yes |
Rainy | Cool | Normal | False | Yes |
Rainy | Cool | Normal | True | No |
Overcast | Cool | Normal | True | Yes |
Sunny | Mild | High | False | No |
Sunny | Cool | Normal | False | Yes |
Rainy | Mild | Normal | False | Yes |
Sunny | Mild | Normal | True | Yes |
? | Mild | High | True | Yes |
Overcast | Hot | Normal | False | Yes |
Rainy | Mild | High | True | No |
Missing values – calculation
Gain("outlook")
Outlook | Yes | No | P(yes) | P(no) | Entropy (bits) | Probability |
Sunny | 2 | 3 | 2/5 | 3/5 | 0.971 | 5/13 |
Overcast | 3 | 0 | 3/3 | 0/3 | 0 | 3/13 |
Rainy | 3 | 2 | 3/5 | 2/5 | 0.971 | 5/13 |
Entropy:
Sunny: Info([2,3]) = entropy(2/5,3/5) = -2/5 * log(2/5) – 3/5 * log(3/5) = 0.971
Overcast: Info([3,0]) = entropy(1,0) = -1 * log(1) – 0 * log(0) = 0
Rainy: Info([3,2]) = entropy(3/5,2/5) = -3/5 * log(3/5) – 2/5 * log(2/5) = 0.971
IBS = Info([8,5]) = entropy(8/13,5/13) = -8/13 * log(8/13) – 5/13 * log(5/13) = 0.961
Info([2,3],[3,0],[3,2]) = 0.971 * (5/13) + 0 * (3/13) + 0.971 * (5/13) = 0.747
Gain("outlook") = 13/14 * (information before split – information after split)
Gain("outlook") = 13/14 * (0.961 – 0.747) = 0.199
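The whole calculation fits in a few lines; a sketch assuming the binary yes/no counts from the table (variable names are illustrative):

```python
import math

def entropy(counts):
    """Shannon entropy in bits of a list of class counts; 0*log(0) taken as 0."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)

# (yes, no) counts per Outlook value among the 13 instances where Outlook is known
splits = {"Sunny": (2, 3), "Overcast": (3, 0), "Rainy": (3, 2)}
known, total = 13, 14

ibs = entropy((8, 5))  # yes/no distribution over the known instances
info_after = sum(sum(c) / known * entropy(c) for c in splits.values())

# C4.5 discounts the gain by the fraction of instances with a known value
gain = known / total * (ibs - info_after)
print(round(gain, 3))  # 0.199
```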
Choosing the "best" attribute
Outlook = Sunny:
Outlook | Temp | Humidity | Windy | Play | Weight |
Sunny | Hot | High | False | No | 1 |
Sunny | Hot | High | True | No | 1 |
Sunny | Mild | High | False | No | 1 |
Sunny | Cool | Normal | False | Yes | 1 |
Sunny | Mild | Normal | True | Yes | 1 |
? | Mild | High | True | Yes | 5/13 |
Outlook = Rainy:
Outlook | Temp | Humidity | Windy | Play | Weight |
Rainy | Mild | High | False | Yes | 1 |
Rainy | Cool | Normal | False | Yes | 1 |
Rainy | Cool | Normal | True | No | 1 |
Rainy | Mild | Normal | False | Yes | 1 |
Rainy | Mild | High | True | No | 1 |
? | Mild | High | True | Yes | 5/13 |
Outlook = Overcast:
Outlook | Temp | Humidity | Windy | Play | Weight |
Overcast | Hot | High | False | Yes | 1 |
Overcast | Cool | Normal | True | Yes | 1 |
Overcast | Hot | Normal | False | Yes | 1 |
? | Mild | High | True | Yes | 3/13 |
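The fractional weight carried by the unknown-Outlook instance into each branch is simply that branch's share of the 13 known instances; a small sketch:

```python
from fractions import Fraction

# Known-Outlook instance counts per branch (13 instances in total)
branch_sizes = {"Sunny": 5, "Rainy": 5, "Overcast": 3}
known = sum(branch_sizes.values())

# The instance with Outlook = ? is sent down every branch with a
# weight proportional to that branch's share of the known instances
weights = {b: Fraction(n, known) for b, n in branch_sizes.items()}
print(weights)  # Sunny -> 5/13, Rainy -> 5/13, Overcast -> 3/13
```

The weights sum to 1, so the instance still counts exactly once overall.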
Continue in the Outlook=Sunny branch
Outlook | Temp | Humidity | Windy | Play | Weight |
Sunny | Hot | High | False | No | 1 |
Sunny | Hot | High | True | No | 1 |
Sunny | Mild | High | False | No | 1 |
Sunny | Cool | Normal | False | Yes | 1 |
Sunny | Mild | Normal | True | Yes | 1 |
? | Mild | High | True | Yes | 5/13 ~ 0.4 |
Hum. | Yes | No | P(yes) | P(no) | Entropy (bits) | Probability |
High | 0.4 | 3 | 0.4/3.4 = 0.118 | 3/3.4 = 0.882 | 0.524 | 3.4/5.4 = 0.63 |
Normal | 2 | 0 | 2/2 = 1 | 0/2 = 0 | 0 | 2/5.4 = 0.37 |
High: info([0.4,3]) = entropy(0.118, 0.882) = -0.118*log(0.118) - 0.882*log(0.882) = 0.524
Normal: info([2,0]) = entropy(1,0) = 0
IBS = Info([2.4,3]) = entropy(0.44,0.56) = -0.44*log(0.44) - 0.56*log(0.56) = 0.99
Info([0.4,3],[2,0]) = 0.524*0.63 + 0*0.37 = 0.33
InfoGain("Humidity") = Info([2.4,3]) – Info([0.4,3],[2,0]) = 0.99 – 0.33 = 0.66
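This weighted gain can be checked numerically; a sketch using 0.4 for the 5/13 fractional weight, as the table does (variable names are illustrative):

```python
import math

def entropy(counts):
    """Shannon entropy in bits of a list of (possibly fractional) counts."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)

# (yes, no) weights per Humidity value in the Outlook = Sunny subset;
# the unknown-Outlook instance contributes ~0.4 (= 5/13) to the High row
high   = (0.4, 3.0)
normal = (2.0, 0.0)
total  = sum(high) + sum(normal)  # 5.4 total weight

ibs = entropy((high[0] + normal[0], high[1] + normal[1]))  # entropy of (2.4, 3)
info_after = (sum(high) / total * entropy(high)
              + sum(normal) / total * entropy(normal))
gain = ibs - info_after
print(round(gain, 2))  # 0.66
```

Entropy and the weighted average work unchanged with fractional counts, which is exactly why C4.5's weighting scheme slots into the standard gain formula.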
(Tree figure: leaves labeled Yes (100%), Yes (100%), No (~80%), Yes (100%), No (88%).)
Classifying examples with missing values
Outlook | Temp | Humidity | Windy | Play |
Sunny | Cool | ? | False | |
P(YES) = 3.4/5.4 * 12% + 2/5.4 * 100% = 44%
P(NO) = 3.4/5.4 * 88% + 2/5.4 * 0% = 56%
Prediction: No
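The final class probabilities combine the leaf distributions, weighted by how much of the training weight went down each branch; a sketch with illustrative names:

```python
# Share of the Sunny-subset training weight having each Humidity value
branch_weight = {"High": 3.4 / 5.4, "Normal": 2.0 / 5.4}
# Leaf class probabilities P(yes) observed in each branch (0.4/3.4, i.e. ~12%, for High)
p_yes_leaf = {"High": 0.4 / 3.4, "Normal": 1.0}

# An instance with unknown Humidity is sent down both branches;
# its class probabilities are the weighted mixture of the leaves
p_yes = sum(branch_weight[b] * p_yes_leaf[b] for b in branch_weight)
p_no = 1 - p_yes
print(round(p_yes, 2), round(p_no, 2))  # 0.44 0.56
```

Since P(NO) > P(YES), the instance is classified as No, as above.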
Class distribution in leaves:
4:3 means: 4 examples with class yes, 3 examples with class no
Making rules from a decision tree