1 of 29

Classification

Decision trees

2 of 29

What is a decision tree?

  • "First" node is the root of the tree
  • "Last" nodes are called leaves
  • The edges between nodes represent branches

  • How do we "read" a decision tree?

3 of 29

How to "build" a decision tree?

  • Procedure (recursive):
    • Choose the "best" attribute
    • Split the data into subsets according to the values of the "best" attribute
    • Repeat the procedure in each of the subsets
  • Stop when (the stopping criterion):
    • we run out of attributes,
    • all leaves are "pure" (all examples in a leaf belong to the same class),
    • the (sub)set is empty.
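The recursive procedure above can be sketched in Python. This is a minimal ID3-style sketch, not a production implementation: rows are assumed to be tuples with the class label last, and `info_gain` stands in for the "best attribute" heuristic introduced on the following slides.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c/n * math.log2(c/n) for c in Counter(labels).values())

def info_gain(rows, attr):
    """Entropy before the split minus weighted entropy after splitting on `attr`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[-1])
    after = sum(len(g)/len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - after

def build_tree(rows, attrs):
    labels = [row[-1] for row in rows]
    if len(set(labels)) == 1:        # stop: the node is "pure"
        return labels[0]
    if not attrs:                    # stop: out of attributes -> majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, a))  # choose "best" attribute
    groups = defaultdict(list)
    for row in rows:                 # split the data by the values of `best`
        groups[row[best]].append(row)
    rest = [a for a in attrs if a != best]
    return (best, {v: build_tree(g, rest) for v, g in groups.items()})
```

Leaves are class labels; inner nodes are `(attribute_index, {value: subtree})` pairs, mirroring the recursion described above.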

4 of 29

Choosing the "best" attribute

  • Which attribute is "the best"?
    • The one yielding the smallest tree
    • The one yielding the highest classification accuracy
    • Heuristic: choose the attribute yielding the "purest" nodes
  • Let's look at some "(im)purity" measures:
    • Information gain
    • Information gain ratio
    • Gini index

5 of 29


Which attribute to choose? – example

6 of 29

Information gain

  • Measuring the information as entropy:
    • given the probability distribution of some events, entropy measures the information that is needed to encode every possible outcome of these events
    • entropy is measured in bits
  • Formula for computing the entropy:

    entropy(p1, p2, …, pn) = – p1 * log2(p1) – p2 * log2(p2) – … – pn * log2(pn)

  • where p1, p2, …, pn are the probabilities of the given events
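In Python (an illustrative helper, not part of the slides), the entropy of a probability distribution can be computed as:

```python
import math

def entropy(probs):
    """Entropy in bits; the term for p = 0 is taken as 0 (the limit of p*log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

`entropy([0.5, 0.5])` is exactly 1 bit, while `entropy([1.0])` is 0: a certain outcome carries no information.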

 

7 of 29

Information gain – definition

  • Information gain = information "before split" – information "after split"

  • InfoGain = IBS – IAS

 

8 of 29

Computing information – "before"

  • If a set T contains examples with n classes, info(T) is defined as:

    info(T) = – p1 * log2(p1) – p2 * log2(p2) – … – pn * log2(pn)

  • where pj is the relative frequency (probability) of class j in T.

9 of 29

Computing information – "after"

  • We split the set T containing N examples into subsets T1, T2, …, Tk containing N1, N2, …, Nk examples, respectively. The information of this split is defined as:

    info(T1, T2, …, Tk) = (N1/N) * info(T1) + (N2/N) * info(T2) + … + (Nk/N) * info(Tk)
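As a sketch (the helper names are my own), info "after the split" is just the size-weighted average of the subset entropies:

```python
import math

def info(counts):
    """info(T) for a list of class counts, e.g. [9, 5]."""
    n = sum(counts)
    return -sum(c/n * math.log2(c/n) for c in counts if c > 0)

def info_split(subsets):
    """Weighted info of a split, given class counts per subset, e.g. [[2,3],[4,0],[3,2]]."""
    total = sum(sum(s) for s in subsets)
    return sum(sum(s)/total * info(s) for s in subsets)
```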

10 of 29

InfoGain – the "weather" example

Outlook  | Temperature | Humidity | Windy | Play
---------|-------------|----------|-------|-----
Sunny    | Hot         | High     | False | No
Sunny    | Hot         | High     | True  | No
Overcast | Hot         | High     | False | Yes
Rainy    | Mild        | High     | False | Yes
Rainy    | Cool        | Normal   | False | Yes
Rainy    | Cool        | Normal   | True  | No
Overcast | Cool        | Normal   | True  | Yes
Sunny    | Mild        | High     | False | No
Sunny    | Cool        | Normal   | False | Yes
Rainy    | Mild        | Normal   | False | Yes
Sunny    | Mild        | Normal   | True  | Yes
Overcast | Mild        | High     | True  | Yes
Overcast | Hot         | Normal   | False | Yes
Rainy    | Mild        | High     | True  | No

11 of 29

Computing the IBS

  • Information "before the split":
    • Yes: 9 examples
    • No: 5 examples
    • Info([5,9]) = entropy(5/14, 9/14) = – 5/14 * log2(5/14) – 9/14 * log2(9/14) = 0.940
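The same arithmetic, checked in Python (a sanity check, not part of the slides):

```python
import math

# 5 "No" and 9 "Yes" examples out of 14
p_no, p_yes = 5/14, 9/14
ibs = -p_no * math.log2(p_no) - p_yes * math.log2(p_yes)  # ≈ 0.940 bits
```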

12 of 29

InfoGain("outlook")

Outlook  | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability
---------|-----|----|--------|-------|----------------|------------
Sunny    | 2   | 3  | 2/5    | 3/5   | 0.971          | 5/14
Overcast | 4   | 0  | 4/4    | 0/4   | 0              | 4/14
Rainy    | 3   | 2  | 3/5    | 2/5   | 0.971          | 5/14

Entropy:

Sunny: Info([2,3]) = entropy(2/5,3/5) = – 2/5 * log2(2/5) – 3/5 * log2(3/5) = 0.971

Overcast: Info([4,0]) = entropy(1,0) = – 1 * log2(1) – 0 * log2(0) = 0

Rainy: Info([3,2]) = entropy(3/5,2/5) = – 3/5 * log2(3/5) – 2/5 * log2(2/5) = 0.971

Info("outlook") = Info([2,3],[4,0],[3,2]) = (5/14) * 0.971 + (4/14) * 0 + (5/14) * 0.971 = 0.693

InfoGain("outlook") = IBS – Info("outlook") = 0.940 – 0.693 = 0.247
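Working from exact counts rather than rounded intermediates (the helper `H` is my own shorthand), the same gain comes out:

```python
import math

def H(counts):
    """Entropy (bits) of a class-count vector."""
    n = sum(counts)
    return -sum(c/n * math.log2(c/n) for c in counts if c > 0)

ibs = H([9, 5])                                              # info before the split
ias = sum(sum(s)/14 * H(s) for s in ([2, 3], [4, 0], [3, 2]))  # info after splitting on outlook
gain = ibs - ias                                             # ≈ 0.247
```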

13 of 29

InfoGain("temperature")

Temperature | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability
------------|-----|----|--------|-------|----------------|------------
Hot         | 2   | 2  | 2/4    | 2/4   | 1              | 4/14
Mild        | 4   | 2  | 4/6    | 2/6   | 0.918          | 6/14
Cool        | 3   | 1  | 3/4    | 1/4   | 0.811          | 4/14

Entropy:

Hot: Info([2,2]) = entropy(2/4,2/4) = – 2/4 * log2(2/4) – 2/4 * log2(2/4) = 1

Mild: Info([4,2]) = entropy(4/6,2/6) = – 4/6 * log2(4/6) – 2/6 * log2(2/6) = 0.918

Cool: Info([3,1]) = entropy(3/4,1/4) = – 3/4 * log2(3/4) – 1/4 * log2(1/4) = 0.811

Info("temperature") = Info([2,2],[4,2],[3,1]) = (4/14) * 1 + (6/14) * 0.918 + (4/14) * 0.811 = 0.911

InfoGain("temperature") = 0.940 – 0.911 = 0.029

14 of 29

InfoGain("humidity")

Humidity | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability
---------|-----|----|--------|-------|----------------|------------
High     | 3   | 4  | 3/7    | 4/7   | 0.985          | 7/14
Normal   | 6   | 1  | 6/7    | 1/7   | 0.592          | 7/14

Entropy:

High: Info([3,4]) = entropy(3/7,4/7) = – 3/7 * log2(3/7) – 4/7 * log2(4/7) = 0.985

Normal: Info([6,1]) = entropy(6/7,1/7) = – 6/7 * log2(6/7) – 1/7 * log2(1/7) = 0.592

Info("humidity") = Info([3,4],[6,1]) = (7/14) * 0.985 + (7/14) * 0.592 = 0.789

InfoGain("humidity") = 0.940 – 0.789 = 0.151 (0.152 without intermediate rounding)

15 of 29

InfoGain("windy")

Windy | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability
------|-----|----|--------|-------|----------------|------------
False | 6   | 2  | 6/8    | 2/8   | 0.811          | 8/14
True  | 3   | 3  | 3/6    | 3/6   | 1              | 6/14

Entropy:

False: Info([6,2]) = entropy(6/8,2/8) = – 6/8 * log2(6/8) – 2/8 * log2(2/8) = 0.811

True: Info([3,3]) = entropy(3/6,3/6) = – 3/6 * log2(3/6) – 3/6 * log2(3/6) = 1

Info("windy") = Info([6,2],[3,3]) = (8/14) * 0.811 + (6/14) * 1 = 0.892

InfoGain("windy") = 0.940 – 0.892 = 0.048

16 of 29

Which attribute is "best"?

  • The one with the highest information gain (InfoGain):
    • Outlook: 0.247 bits
    • Temperature: 0.029 bits
    • Humidity: 0.152 bits
    • Windy: 0.048 bits
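The whole comparison can be reproduced from the dataset itself. This sketch (my own code, not the slides') computes the InfoGain of each attribute:

```python
import math
from collections import Counter, defaultdict

# The "weather" dataset: (outlook, temperature, humidity, windy, play)
data = [
    ("Sunny", "Hot", "High", False, "No"),      ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),  ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),  ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),  ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),   ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),("Rainy", "Mild", "High", True, "No"),
]

def entropy(labels):
    n = len(labels)
    return -sum(c/n * math.log2(c/n) for c in Counter(labels).values())

def info_gain(rows, attr):
    """IBS minus the size-weighted entropy of the subsets induced by `attr`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[-1])
    after = sum(len(g)/len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - after

for i, name in enumerate(["Outlook", "Temperature", "Humidity", "Windy"]):
    print(name, round(info_gain(data, i), 3))
# Outlook 0.247, Temperature 0.029, Humidity 0.152, Windy 0.048
```

Outlook wins, so it becomes the root of the tree.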

17 of 29

Outlook = Sunny

Outlook  | Temperature | Humidity | Windy | Play
---------|-------------|----------|-------|-----
Sunny    | Hot         | High     | False | No
Sunny    | Hot         | High     | True  | No
Overcast | Hot         | High     | False | Yes
Rainy    | Mild        | High     | False | Yes
Rainy    | Cool        | Normal   | False | Yes
Rainy    | Cool        | Normal   | True  | No
Overcast | Cool        | Normal   | True  | Yes
Sunny    | Mild        | High     | False | No
Sunny    | Cool        | Normal   | False | Yes
Rainy    | Mild        | Normal   | False | Yes
Sunny    | Mild        | Normal   | True  | Yes
Overcast | Mild        | High     | True  | Yes
Overcast | Hot         | Normal   | False | Yes
Rainy    | Mild        | High     | True  | No

Step 2 = recursion

18 of 29

Next level in the tree …

Outlook | Temperature | Humidity | Windy | Play
--------|-------------|----------|-------|-----
Sunny   | Hot         | High     | False | No
Sunny   | Hot         | High     | True  | No
Sunny   | Mild        | High     | False | No
Sunny   | Cool        | Normal   | False | Yes
Sunny   | Mild        | Normal   | True  | Yes

Outlook = Sunny

Attribute Temperature

Temperature | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability
------------|-----|----|--------|-------|----------------|------------
Hot         | 0   | 2  | 0/2    | 2/2   | 0              | 2/5
Mild        | 1   | 1  | 1/2    | 1/2   | 1              | 2/5
Cool        | 1   | 0  | 1/1    | 0/1   | 0              | 1/5

Hot: info([0,2]) = entropy(0,1) = 0

Mild: info([1,1]) = entropy(1/2,1/2) = 1

Cool: info([1,0]) = entropy(1,0) = 0

Info("Temperature") = info([0,2],[1,1],[1,0]) = 2/5*0 + 2/5*1 + 1/5*0 = 0.4

InfoGain("Temperature") = info([2,3]) – info([0,2],[1,1],[1,0]) = 0.971 – 0.4 = 0.571

19 of 29

Next level in the tree …

Outlook | Temperature | Humidity | Windy | Play
--------|-------------|----------|-------|-----
Sunny   | Hot         | High     | False | No
Sunny   | Hot         | High     | True  | No
Sunny   | Mild        | High     | False | No
Sunny   | Cool        | Normal   | False | Yes
Sunny   | Mild        | Normal   | True  | Yes

Outlook = Sunny

Attribute Humidity

Humidity | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability
---------|-----|----|--------|-------|----------------|------------
High     | 0   | 3  | 0/3    | 3/3   | 0              | 3/5
Normal   | 2   | 0  | 2/2    | 0/2   | 0              | 2/5

High: info([0,3]) = entropy(0,1) = 0

Normal: info([2,0]) = entropy(1,0) = 0

Info("Humidity") = info([0,3],[2,0]) = 3/5*0 + 2/5*0 = 0

InfoGain("Humidity") = info([2,3]) – info([0,3],[2,0]) = 0.971 – 0 = 0.971

20 of 29

Next level in the tree …

Outlook | Temp | Humidity | Windy | Play
--------|------|----------|-------|-----
Sunny   | Hot  | High     | False | No
Sunny   | Hot  | High     | True  | No
Sunny   | Mild | High     | False | No
Sunny   | Cool | Normal   | False | Yes
Sunny   | Mild | Normal   | True  | Yes

Outlook = Sunny

Attribute Windy

Windy | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability
------|-----|----|--------|-------|----------------|------------
True  | 1   | 1  | 1/2    | 1/2   | 1              | 2/5
False | 1   | 2  | 1/3    | 2/3   | 0.918          | 3/5

True: info([1,1]) = entropy(1/2,1/2) = – 1/2*log2(1/2) – 1/2*log2(1/2) = 1

False: info([1,2]) = entropy(1/3,2/3) = – 1/3*log2(1/3) – 2/3*log2(2/3) = 0.918

Info("Windy") = info([1,1],[1,2]) = 2/5*1 + 3/5*0.918 = 0.951

InfoGain("Windy") = info([2,3]) – info([1,1],[1,2]) = 0.971 – 0.951 = 0.02

21 of 29

Choosing the "best" attribute

  • Information gain (InfoGain):
    • Temperature: 0.571
    • Humidity: 0.971
    • Windy: 0.02
  • Choose the attribute Humidity
    • The leaves are pure → no need to continue the procedure

22 of 29

Outlook = Overcast

Outlook  | Temp | Humidity | Windy | Play
---------|------|----------|-------|-----
Sunny    | Hot  | High     | False | No
Sunny    | Hot  | High     | True  | No
Overcast | Hot  | High     | False | Yes
Rainy    | Mild | High     | False | Yes
Rainy    | Cool | Normal   | False | Yes
Rainy    | Cool | Normal   | True  | No
Overcast | Cool | Normal   | True  | Yes
Sunny    | Mild | High     | False | No
Sunny    | Cool | Normal   | False | Yes
Rainy    | Mild | Normal   | False | Yes
Sunny    | Mild | Normal   | True  | Yes
Overcast | Mild | High     | True  | Yes
Overcast | Hot  | Normal   | False | Yes
Rainy    | Mild | High     | True  | No

Step 2 = recursion

23 of 29

Outlook = Overcast

  • No need to split further ← the node is "pure" (contains only examples of class "Yes")

Outlook  | Temp | Humidity | Windy | Play
---------|------|----------|-------|-----
Overcast | Hot  | High     | False | Yes
Overcast | Cool | Normal   | True  | Yes
Overcast | Mild | High     | True  | Yes
Overcast | Hot  | Normal   | False | Yes

Next level in the tree …

24 of 29

Outlook = Rainy

Outlook  | Temp | Humidity | Windy | Play
---------|------|----------|-------|-----
Sunny    | Hot  | High     | False | No
Sunny    | Hot  | High     | True  | No
Overcast | Hot  | High     | False | Yes
Rainy    | Mild | High     | False | Yes
Rainy    | Cool | Normal   | False | Yes
Rainy    | Cool | Normal   | True  | No
Overcast | Cool | Normal   | True  | Yes
Sunny    | Mild | High     | False | No
Sunny    | Cool | Normal   | False | Yes
Rainy    | Mild | Normal   | False | Yes
Sunny    | Mild | Normal   | True  | Yes
Overcast | Mild | High     | True  | Yes
Overcast | Hot  | Normal   | False | Yes
Rainy    | Mild | High     | True  | No

Step 2 = recursion

25 of 29

Outlook = Rainy

Outlook | Temperature | Humidity | Windy | Play
--------|-------------|----------|-------|-----
Rainy   | Mild        | High     | False | Yes
Rainy   | Cool        | Normal   | False | Yes
Rainy   | Cool        | Normal   | True  | No
Rainy   | Mild        | Normal   | False | Yes
Rainy   | Mild        | High     | True  | No

Temperature | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability
------------|-----|----|--------|-------|----------------|------------
Mild        | 2   | 1  | 2/3    | 1/3   | 0.918          | 3/5
Cool        | 1   | 1  | 1/2    | 1/2   | 1              | 2/5

Mild: info([2,1]) = entropy(2/3,1/3) = – 2/3*log2(2/3) – 1/3*log2(1/3) = 0.918

Cool: info([1,1]) = entropy(1/2,1/2) = – 1/2*log2(1/2) – 1/2*log2(1/2) = 1

Info("Temperature") = info([2,1],[1,1]) = 3/5*0.918 + 2/5*1 = 0.951

InfoGain("Temperature") = info([3,2]) – info([2,1],[1,1]) = 0.971 – 0.951 = 0.02

Next level in the tree …

26 of 29

Outlook = Rainy

Outlook | Temp | Humidity | Windy | Play
--------|------|----------|-------|-----
Rainy   | Mild | High     | False | Yes
Rainy   | Cool | Normal   | False | Yes
Rainy   | Cool | Normal   | True  | No
Rainy   | Mild | Normal   | False | Yes
Rainy   | Mild | High     | True  | No

Humidity | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability
---------|-----|----|--------|-------|----------------|------------
High     | 1   | 1  | 1/2    | 1/2   | 1              | 2/5
Normal   | 2   | 1  | 2/3    | 1/3   | 0.918          | 3/5

High: info([1,1]) = entropy(1/2,1/2) = – 1/2*log2(1/2) – 1/2*log2(1/2) = 1

Normal: info([2,1]) = entropy(2/3,1/3) = – 2/3*log2(2/3) – 1/3*log2(1/3) = 0.918

Info("Humidity") = info([1,1],[2,1]) = 2/5*1 + 3/5*0.918 = 0.951

InfoGain("Humidity") = info([3,2]) – info([1,1],[2,1]) = 0.971 – 0.951 = 0.02

Next level in the tree …

27 of 29

Outlook = Rainy

Outlook | Temp | Humidity | Windy | Play
--------|------|----------|-------|-----
Rainy   | Mild | High     | False | Yes
Rainy   | Cool | Normal   | False | Yes
Rainy   | Cool | Normal   | True  | No
Rainy   | Mild | Normal   | False | Yes
Rainy   | Mild | High     | True  | No

Windy | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability
------|-----|----|--------|-------|----------------|------------
True  | 0   | 2  | 0/2    | 2/2   | 0              | 2/5
False | 3   | 0  | 3/3    | 0/3   | 0              | 3/5

True: info([0,2]) = entropy(0,1) = 0

False: info([3,0]) = entropy(1,0) = 0

Info("Windy") = info([0,2],[3,0]) = 2/5*0 + 3/5*0 = 0

InfoGain("Windy") = info([3,2]) – info([0,2],[3,0]) = 0.971 – 0 = 0.971

Next level in the tree …

28 of 29

Choosing the "best" attribute

  • Information gain (InfoGain):
    • Temperature: 0.02
    • Humidity: 0.02
    • Windy: 0.971
  • Choose the attribute Windy
    • The leaves are pure → no need to continue the procedure

29 of 29

The "final" decision tree
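The slide's drawing is not reproduced here, but the computations above pin the tree down: Outlook at the root, Humidity under Sunny, Windy under Rainy, and Overcast a pure "Yes" leaf. As Python (an illustrative sketch):

```python
def predict(outlook, humidity, windy):
    """Classify one example with the final tree (temperature is never tested)."""
    if outlook == "Overcast":
        return "Yes"                                  # pure node
    if outlook == "Sunny":
        return "No" if humidity == "High" else "Yes"  # split on Humidity
    return "No" if windy else "Yes"                   # Rainy: split on Windy
```

This function classifies all 14 training examples correctly.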