1 of 14

Machine Learning in the Real World: The C4.5 Algorithm

2 of 14

Numeric attributes

D   E   T    Class
X   1   46   a
Y   3   65   b
X   2   68   b
Y   2   69   b
X   1   70   b
Y   3   71   a
X   2   72   a
Y   1   74   a
X   3   75   a
Y   3   75   a
X   1   80   a
Y   1   81   b
X   2   83   b
Y   2   85   b

IBS (information before split) = info([7,7]) = entropy(1/2, 1/2) = 1 (all logs are base 2, so entropies are in bits)

InfoGain(D) = 0.02

InfoGain(E) = 0.05

InfoGain(T) = ???
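
Every number on these slides reduces to the base-2 entropy of a class-count vector. A minimal Python sketch of that computation (the helper name `entropy` is ours, not part of C4.5 itself):

```python
from math import log2

def entropy(counts):
    """Entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

print(entropy([7, 7]))  # 1.0 -- the IBS above: 7 a's and 7 b's, one full bit
```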

3 of 14

InfoGain of a numeric attribute

  • Similar to class-dependent discretization:
    • putting a split point between two examples of the same class can never be optimal
  • So put candidate split points only between consecutive examples of different classes (a minimal sketch follows below).
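
A minimal sketch of this rule, assuming plain Python lists for the attribute values and class labels (all names are ours):

```python
def candidate_splits(values, classes):
    """Midpoints between consecutive sorted values whose classes differ."""
    pairs = sorted(zip(values, classes))
    splits = []
    for (v1, c1), (v2, c2) in zip(pairs, pairs[1:]):
        if c1 != c2 and v1 != v2:        # class boundary between distinct values
            splits.append((v1 + v2) / 2)
    return splits

T = [46, 65, 68, 69, 70, 71, 72, 74, 75, 75, 80, 81, 83, 85]
cls = list("abbbbaaaaaabbb")
print(candidate_splits(T, cls))  # [55.5, 70.5, 80.5] -- see the next slide
```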

4 of 14

Sort the examples by T and note where the class changes:

T       46   65   68   69   70   71   72   74   75   75   80   81   83   85
Class   a    b    b    b    b    a    a    a    a    a    a    b    b    b

Candidate split points: 55.5, 70.5, 80.5

Split at 55.5:

entropy(1,0) = 0
entropy(6/13,7/13) = -6/13 * log(6/13) - 7/13 * log(7/13) = 0.996

T        a   b   P(a)   P(b)
< 55.5   1   0   1      0
> 55.5   6   7   6/13   7/13

T        Entropy   Probability
< 55.5   0         1/14
> 55.5   0.996     13/14

Info(T, 55.5) = 0 * 1/14 + 0.996 * 13/14 = 0.925

Split at 70.5:

entropy(1/5,4/5) = -1/5 * log(1/5) - 4/5 * log(4/5) = 0.722
entropy(6/9,3/9) = -6/9 * log(6/9) - 3/9 * log(3/9) = 0.918

T        a   b   P(a)   P(b)
< 70.5   1   4   1/5    4/5
> 70.5   6   3   6/9    3/9

T        Entropy   Probability
< 70.5   0.722     5/14
> 70.5   0.918     9/14

Info(T, 70.5) = 0.722 * 5/14 + 0.918 * 9/14 = 0.848

Split at 80.5:

entropy(7/11,4/11) = -7/11 * log(7/11) - 4/11 * log(4/11) = 0.946
entropy(0,1) = 0

T        a   b   P(a)   P(b)
< 80.5   7   4   7/11   4/11
> 80.5   0   3   0      1

T        Entropy   Probability
< 80.5   0.946     11/14
> 80.5   0         3/14

Info(T, 80.5) = 0.946 * 11/14 + 0 * 3/14 = 0.743

IBS = 1

InfoGain(T, 55.5) = 1 - 0.925 = 0.075
InfoGain(T, 70.5) = 1 - 0.848 = 0.152
InfoGain(T, 80.5) = 1 - 0.743 = 0.257
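
Putting the pieces together, a self-contained sketch (names ours) that reproduces the three gains above:

```python
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def info_gain(values, classes, split):
    """InfoGain of the binary split 'value < split' vs. 'value > split'."""
    n = len(values)
    labels = sorted(set(classes))
    left  = [sum(1 for v, c in zip(values, classes) if v < split and c == lab)
             for lab in labels]
    right = [sum(1 for v, c in zip(values, classes) if v > split and c == lab)
             for lab in labels]
    before = entropy([classes.count(lab) for lab in labels])
    after = sum(left) / n * entropy(left) + sum(right) / n * entropy(right)
    return before - after

T = [46, 65, 68, 69, 70, 71, 72, 74, 75, 75, 80, 81, 83, 85]
cls = list("abbbbaaaaaabbb")
for s in (55.5, 70.5, 80.5):
    print(s, round(info_gain(T, cls, s), 3))  # 0.075, 0.152, 0.257
```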

5 of 14

InfoGain of the T attribute

  • 3 possible InfoGains for the 3 possible break points:
    • InfoGain(T, 55.5) = 0.075
    • InfoGain(T, 70.5) = 0.152
    • InfoGain(T, 80.5) = 0.257
  • Choose the highest one, at break point 80.5
  • The InfoGain of the T attribute is therefore 0.257
  • Attribute T has the highest InfoGain 🡺 it gets chosen (InfoGain(D) = 0.02, InfoGain(E) = 0.05)

[Figure: root node splits on T, with branches T < 80.5 and T > 80.5]

6 of 14

Missing values

Outlook    Temp   Humidity   Windy   Play
Sunny      Hot    High       False   No
Sunny      Hot    High       True    No
Overcast   Hot    High       False   Yes
Rainy      Mild   High       False   Yes
Rainy      Cool   Normal     False   Yes
Rainy      Cool   Normal     True    No
Overcast   Cool   Normal     True    Yes
Sunny      Mild   High       False   No
Sunny      Cool   Normal     False   Yes
Rainy      Mild   Normal     False   Yes
Sunny      Mild   Normal     True    Yes
?          Mild   High       True    Yes
Overcast   Hot    Normal     False   Yes
Rainy      Mild   High       True    No

7 of 14

Missing values – calculation

  • n examples in the learning set
  • Attributes with no missing values: no change – take all n examples into account
  • If attribute A has k missing values:
    • calculate the InfoGain on the n - k examples for which the value is known
    • multiply this InfoGain by the factor (n - k)/n (a sketch follows below)
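
A sketch of this correction for a nominal attribute, with our own helper names; `'?'` marks a missing value:

```python
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def info_gain_missing(attr, classes, missing='?'):
    """InfoGain computed on the known examples, scaled by (n - k) / n."""
    n = len(attr)
    known = [(a, c) for a, c in zip(attr, classes) if a != missing]
    labels = sorted(set(classes))
    def dist(rows):
        return [sum(1 for c in rows if c == lab) for lab in labels]
    before = entropy(dist([c for _, c in known]))
    after = sum(len([c for a, c in known if a == v]) / len(known)
                * entropy(dist([c for a, c in known if a == v]))
                for v in set(a for a, _ in known))
    return len(known) / n * (before - after)
```

Applied to the Outlook column of the weather table, this reproduces the 0.199 worked out on the next slide.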

8 of 14

Gain("outlook")

Outlook

Yes

No

P(yes)

P(no)

Entropy�(bits)

Probability

Sunny

2

3

2/5

3/5

Overcast

3

0

3/3

0/3

Rainy

3

2

3/5

2/5

Gain("outlook") = 13/14 * (information before split – information after split)

Entropy:

Sunny: Info([2,3]) = entropy(2/5,3/5) = -2/5 * log(2/5) – 3/5 * log(3/5) = 0.971

0.971

0

0.971

5/13

3/13

5/13

Overcast: Info([3,0]) = entropy(1,0) = -1 * log(1) – 0 * log(0) = 0

Rainy: Info([3,2]) = entropy(3/5,2/5) = -3/5 * log(3/5) – 2/5 * log(2/5) = 0.971

Info([2,3],[3,0],[3,2]) = 0.971 * (5/13) + 0 * (3/13) + 0.971 * (5/13) = 0.747

Gain("outlook") = 13/14 * (0.961– 0.747) = 0.199

IBS = Info([8,5]) = entropy(8/13,5/13) = -5/13 * log(5/13) – 8/13 * log(8/13) = 0.961
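
A numeric check of this slide's arithmetic, with counts taken from the table above (the helper name is ours):

```python
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

branches = {'Sunny': [2, 3], 'Overcast': [3, 0], 'Rainy': [3, 2]}  # [Yes, No]
known = 13                                    # examples with a known Outlook
before = entropy([8, 5])                                         # 0.961 bits
after = sum(sum(c) / known * entropy(c) for c in branches.values())  # 0.747
print(round(known / 14 * (before - after), 3))                   # 0.199
```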

9 of 14

Choosing the "best" attribute

  • Information Gain:
    • Outlook: 0.199 bits
    • Temperature: 0.029 bits
    • Humidity: 0.152 bits
    • Windy: 0.048 bits

10 of 14

Splitting on Outlook: the example with the missing Outlook value is sent down all three branches, carrying a fractional weight proportional to each branch's size (5/13, 5/13, 3/13).

Outlook = Sunny:

Outlook   Temp   Humidity   Windy   Play   Weight
Sunny     Hot    High       False   No     1
Sunny     Hot    High       True    No     1
Sunny     Mild   High       False   No     1
Sunny     Cool   Normal     False   Yes    1
Sunny     Mild   Normal     True    Yes    1
?         Mild   High       True    Yes    5/13

Outlook = Rainy:

Outlook   Temp   Humidity   Windy   Play   Weight
Rainy     Mild   High       False   Yes    1
Rainy     Cool   Normal     False   Yes    1
Rainy     Cool   Normal     True    No     1
Rainy     Mild   Normal     False   Yes    1
Rainy     Mild   High       True    No     1
?         Mild   High       True    Yes    5/13

Outlook = Overcast:

Outlook    Temp   Humidity   Windy   Play   Weight
Overcast   Hot    High       False   Yes    1
Overcast   Cool   Normal     True    Yes    1
Overcast   Hot    Normal     False   Yes    1
?          Mild   High       True    Yes    3/13
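
A sketch of how these fractional weights could be assigned, assuming examples are stored as (row-dict, weight) pairs; all names are ours:

```python
from collections import defaultdict

def split_with_missing(rows, attr, missing='?'):
    """Partition weighted rows on attr; a row with a missing value goes down
    every branch, its weight scaled by that branch's share of known weight."""
    branches, unknown = defaultdict(list), []
    for row, w in rows:
        (unknown if row[attr] == missing else branches[row[attr]]).append((row, w))
    total = sum(w for branch in branches.values() for _, w in branch)
    for branch in branches.values():
        share = sum(w for _, w in branch) / total
        branch.extend((row, w * share) for row, w in unknown)
    return branches
```

For the weather data this sends the "?" example down the Sunny, Rainy and Overcast branches with weights 5/13, 5/13 and 3/13, matching the tables above.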

11 of 14

Continue in the Outlook=Sunny branch

Outlook   Temp   Humidity   Windy   Play   Weight
Sunny     Hot    High       False   No     1
Sunny     Hot    High       True    No     1
Sunny     Mild   High       False   No     1
Sunny     Cool   Normal     False   Yes    1
Sunny     Mild   Normal     True    Yes    1
?         Mild   High       True    Yes    5/13 ~ 0.4

Hum.     Yes   No   P(yes)            P(no)           Entropy (bits)   Probability
High     0.4   3    0.4/3.4 = 0.118   3/3.4 = 0.882   0.524            3.4/5.4 = 0.63
Normal   2     0    2/2 = 1           0/2 = 0         0                2/5.4 = 0.37

High: info([0.4,3]) = entropy(0.118, 0.882) = -0.118 * log(0.118) - 0.882 * log(0.882) = 0.524
Normal: info([2,0]) = entropy(1,0) = 0

IBS = Info([2.4,3]) = entropy(2.4/5.4, 3/5.4) = -0.44 * log(0.44) - 0.56 * log(0.56) = 0.99

Info([0.4,3],[2,0]) = 0.524 * 0.63 + 0 * 0.37 = 0.33

InfoGain("Humidity") = Info([2.4,3]) - Info([0.4,3],[2,0]) = 0.99 - 0.33 = 0.66
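
The same arithmetic with fractional counts, as a check (the helper name is ours):

```python
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

# Weighted [Yes, No] counts in the Sunny branch; 0.4 is the fractional example
high, normal = [0.4, 3.0], [2.0, 0.0]
before = entropy([2.4, 3.0])                                     # 0.99 bits
after = 3.4 / 5.4 * entropy(high) + 2.0 / 5.4 * entropy(normal)  # 0.33
print(round(before - after, 2))                                  # 0.66
```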

12 of 14

[Figure: the finished decision tree; its five leaves predict Yes (100%), Yes (100%), No (~80%), Yes (100%), and No (88%)]

13 of 14

Classifying examples with missing values

Outlook   Temp   Humidity   Windy   Play
Sunny     Cool   ?          False   ?

The example reaches the Outlook = Sunny subtree, but its Humidity is unknown, so both Humidity branches contribute, weighted by their training-set probabilities:

P(YES) = 3.4/5.4 * 12% + 2/5.4 * 100% = 44%

P(NO) = 3.4/5.4 * 88% + 2/5.4 * 0% = 56%

🡺 classify the example as No
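
A check of this calculation, blending the two Humidity leaves by their training-set weights (all names are ours):

```python
# Class distributions at the two Humidity leaves of the Sunny subtree
leaves = {'High':   {'Yes': 0.118, 'No': 0.882},   # the No (88%) leaf
          'Normal': {'Yes': 1.0,   'No': 0.0}}     # the Yes (100%) leaf
weights = {'High': 3.4 / 5.4, 'Normal': 2.0 / 5.4} # training-set shares

# Humidity is missing, so average the leaves, weighted by those shares
p_yes = sum(weights[b] * leaves[b]['Yes'] for b in leaves)
p_no  = sum(weights[b] * leaves[b]['No']  for b in leaves)
print(round(p_yes, 2), round(p_no, 2))  # 0.44 0.56 -> classify as No
```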

Class distribution in leaves: "4:3" means 4 examples with class yes and 3 examples with class no.

14 of 14

Making rules from a decision tree

  • Each path from the root to a leaf = one rule 🡺 number of rules = number of leaves in the tree (a sketch follows below)
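
A sketch of rule extraction from a nested-dict encoding of the tree; the tree structure below is our reconstruction of the deck's final tree, and all names are ours:

```python
tree = {'Outlook': {
    'Sunny':    {'Humidity': {'High': 'No', 'Normal': 'Yes'}},
    'Overcast': 'Yes',
    'Rainy':    {'Windy': {'False': 'Yes', 'True': 'No'}},
}}

def rules(node, conditions=()):
    """Yield one (condition, class) rule per root-to-leaf path."""
    if isinstance(node, str):                  # a leaf: the path is complete
        yield ' AND '.join(conditions), node
        return
    (attr, branches), = node.items()           # one attribute test per inner node
    for value, child in branches.items():
        yield from rules(child, conditions + (f'{attr} = {value}',))

for cond, label in rules(tree):
    print(f'IF {cond} THEN Play = {label}')    # 5 rules, one per leaf
```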