1 of 29

Classification

Decision trees

2 of 29

What is a decision tree?

  • "First" node is the root of the tree
  • "Last" nodes are called leaves
  • The edges between nodes represent branches

  • How do we "read" a decision tree?

3 of 29

How to "build" a decision tree?

  • Procedure (recursive):
    • Choose the "best" attribute
    • Split the data into subsets according to the values of the "best" attribute
    • Repeat the procedure in each of the subsets
  • Stop when (the stopping criterion):
    • we run out of attributes,
    • all leaves are "pure" (all examples in a leaf belong to the same class),
    • the (sub)set is empty.
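The recursive procedure above can be sketched in Python. This is a minimal ID3-style sketch, not a production implementation: rows are assumed to be tuples with the class label last, and `info_gain` stands in for the "best attribute" heuristic introduced on the following slides.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c/n * math.log2(c/n) for c in Counter(labels).values())

def info_gain(rows, attr):
    """Entropy before the split minus weighted entropy after splitting on `attr`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[-1])
    after = sum(len(g)/len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - after

def build_tree(rows, attrs):
    labels = [row[-1] for row in rows]
    if len(set(labels)) == 1:        # stop: the node is "pure"
        return labels[0]
    if not attrs:                    # stop: out of attributes -> majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, a))  # choose "best" attribute
    groups = defaultdict(list)
    for row in rows:                 # split the data by the values of `best`
        groups[row[best]].append(row)
    rest = [a for a in attrs if a != best]
    return (best, {v: build_tree(g, rest) for v, g in groups.items()})
```

Leaves are class labels; inner nodes are `(attribute_index, {value: subtree})` pairs, mirroring the recursion described above.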

4 of 29

Choosing the "best" attribute

  • Which attribute is "the best"?
    • The one yielding the smallest tree
    • The one yielding the highest classification accuracy
    • Heuristic: choose the attribute yielding the "purest" nodes
  • Let's look at some "(im)purity" measures:
    • Information gain
    • Information gain ratio
    • Gini index

5 of 29


Which attribute to choose? – example

6 of 29

Information gain

  • Measuring the information as entropy:
    • given the probability distribution of some events, entropy measures the information that is needed to encode every possible outcome of these events
    • entropy is measured in bits
  • Formula for computing the entropy:

    entropy(p1, p2, …, pn) = – p1 * log2(p1) – p2 * log2(p2) – … – pn * log2(pn)

  • where p1, p2, …, pn are the probabilities of the given events
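In Python (an illustrative helper, not part of the slides), the entropy of a probability distribution can be computed as:

```python
import math

def entropy(probs):
    """Entropy in bits; the term for p = 0 is taken as 0 (the limit of p*log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

`entropy([0.5, 0.5])` is exactly 1 bit, while `entropy([1.0])` is 0: a certain outcome carries no information.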

 

7 of 29

Information gain – definition

  • Information gain = information "before split" – information "after split"

  • InfoGain = IBS – IAS

 

8 of 29

Computing information – "before"

  • If a set T contains examples with n classes, info(T) is defined as:

    info(T) = – p1 * log2(p1) – p2 * log2(p2) – … – pn * log2(pn)

  • where pj is the relative frequency (probability) of class j in T.

9 of 29

Computing information – "after"

  • We split the set T containing N examples into subsets T1, T2, …, Tk containing N1, N2, …, Nk examples, respectively. The information of this split is defined as:

    info(T1, T2, …, Tk) = (N1/N) * info(T1) + (N2/N) * info(T2) + … + (Nk/N) * info(Tk)
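As a sketch (the helper names are my own), info "after the split" is just the size-weighted average of the subset entropies:

```python
import math

def info(counts):
    """info(T) for a list of class counts, e.g. [9, 5]."""
    n = sum(counts)
    return -sum(c/n * math.log2(c/n) for c in counts if c > 0)

def info_split(subsets):
    """Weighted info of a split, given class counts per subset, e.g. [[2,3],[4,0],[3,2]]."""
    total = sum(sum(s) for s in subsets)
    return sum(sum(s)/total * info(s) for s in subsets)
```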

10 of 29

InfoGain – the "weather" example

Outlook  | Temperature | Humidity | Windy | Play
---------|-------------|----------|-------|-----
Sunny    | Hot         | High     | False | No
Sunny    | Hot         | High     | True  | No
Overcast | Hot         | High     | False | Yes
Rainy    | Mild        | High     | False | Yes
Rainy    | Cool        | Normal   | False | Yes
Rainy    | Cool        | Normal   | True  | No
Overcast | Cool        | Normal   | True  | Yes
Sunny    | Mild        | High     | False | No
Sunny    | Cool        | Normal   | False | Yes
Rainy    | Mild        | Normal   | False | Yes
Sunny    | Mild        | Normal   | True  | Yes
Overcast | Mild        | High     | True  | Yes
Overcast | Hot         | Normal   | False | Yes
Rainy    | Mild        | High     | True  | No

11 of 29

Computing the IBS

  • Information "before the split":
    • Yes: 9 examples
    • No: 5 examples
    • Info([5,9]) = entropy(5/14, 9/14) = – 5/14 * log2(5/14) – 9/14 * log2(9/14) = 0.940
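The same arithmetic, checked in Python (a sanity check, not part of the slides):

```python
import math

# 5 "No" and 9 "Yes" examples out of 14
p_no, p_yes = 5/14, 9/14
ibs = -p_no * math.log2(p_no) - p_yes * math.log2(p_yes)  # ≈ 0.940 bits
```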

12 of 29

InfoGain("outlook")

Outlook  | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability
---------|-----|----|--------|-------|----------------|------------
Sunny    | 2   | 3  | 2/5    | 3/5   | 0.971          | 5/14
Overcast | 4   | 0  | 4/4    | 0/4   | 0              | 4/14
Rainy    | 3   | 2  | 3/5    | 2/5   | 0.971          | 5/14

Entropy:

Sunny: Info([2,3]) = entropy(2/5,3/5) = – 2/5 * log2(2/5) – 3/5 * log2(3/5) = 0.971

Overcast: Info([4,0]) = entropy(1,0) = – 1 * log2(1) – 0 * log2(0) = 0

Rainy: Info([3,2]) = entropy(3/5,2/5) = – 3/5 * log2(3/5) – 2/5 * log2(2/5) = 0.971

Info("outlook") = Info([2,3],[4,0],[3,2]) = (5/14) * 0.971 + (4/14) * 0 + (5/14) * 0.971 = 0.693

InfoGain("outlook") = IBS – Info("outlook") = 0.940 – 0.693 = 0.247
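Working from exact counts rather than rounded intermediates (the helper `H` is my own shorthand), the same gain comes out:

```python
import math

def H(counts):
    """Entropy (bits) of a class-count vector."""
    n = sum(counts)
    return -sum(c/n * math.log2(c/n) for c in counts if c > 0)

ibs = H([9, 5])                                              # info before the split
ias = sum(sum(s)/14 * H(s) for s in ([2, 3], [4, 0], [3, 2]))  # info after splitting on outlook
gain = ibs - ias                                             # ≈ 0.247
```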

13 of 29

InfoGain("temperature")

Temperature | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability
------------|-----|----|--------|-------|----------------|------------
Hot         | 2   | 2  | 2/4    | 2/4   | 1              | 4/14
Mild        | 4   | 2  | 4/6    | 2/6   | 0.918          | 6/14
Cool        | 3   | 1  | 3/4    | 1/4   | 0.811          | 4/14

Entropy:

Hot: Info([2,2]) = entropy(2/4,2/4) = – 2/4 * log2(2/4) – 2/4 * log2(2/4) = 1

Mild: Info([4,2]) = entropy(4/6,2/6) = – 4/6 * log2(4/6) – 2/6 * log2(2/6) = 0.918

Cool: Info([3,1]) = entropy(3/4,1/4) = – 3/4 * log2(3/4) – 1/4 * log2(1/4) = 0.811

Info("temperature") = Info([2,2],[4,2],[3,1]) = (4/14) * 1 + (6/14) * 0.918 + (4/14) * 0.811 = 0.911

InfoGain("temperature") = 0.940 – 0.911 = 0.029

14 of 29

InfoGain("humidity")

Humidity | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability
---------|-----|----|--------|-------|----------------|------------
High     | 3   | 4  | 3/7    | 4/7   | 0.985          | 7/14
Normal   | 6   | 1  | 6/7    | 1/7   | 0.592          | 7/14

Entropy:

High: Info([3,4]) = entropy(3/7,4/7) = – 3/7 * log2(3/7) – 4/7 * log2(4/7) = 0.985

Normal: Info([6,1]) = entropy(6/7,1/7) = – 6/7 * log2(6/7) – 1/7 * log2(1/7) = 0.592

Info("humidity") = Info([3,4],[6,1]) = (7/14) * 0.985 + (7/14) * 0.592 = 0.789

InfoGain("humidity") = 0.940 – 0.789 = 0.151 (0.152 without intermediate rounding)

15 of 29

InfoGain("windy")

Windy | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability
------|-----|----|--------|-------|----------------|------------
False | 6   | 2  | 6/8    | 2/8   | 0.811          | 8/14
True  | 3   | 3  | 3/6    | 3/6   | 1              | 6/14

Entropy:

False: Info([6,2]) = entropy(6/8,2/8) = – 6/8 * log2(6/8) – 2/8 * log2(2/8) = 0.811

True: Info([3,3]) = entropy(3/6,3/6) = – 3/6 * log2(3/6) – 3/6 * log2(3/6) = 1

Info("windy") = Info([6,2],[3,3]) = (8/14) * 0.811 + (6/14) * 1 = 0.892

InfoGain("windy") = 0.940 – 0.892 = 0.048

16 of 29

Which attribute is "best"?

  • The one with the highest information gain (InfoGain):
    • Outlook: 0.247 bits
    • Temperature: 0.029 bits
    • Humidity: 0.152 bits
    • Windy: 0.048 bits
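The whole comparison can be reproduced from the dataset itself. This sketch (my own code, not the slides') computes the InfoGain of each attribute:

```python
import math
from collections import Counter, defaultdict

# The "weather" dataset: (outlook, temperature, humidity, windy, play)
data = [
    ("Sunny", "Hot", "High", False, "No"),      ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),  ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),  ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),  ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),   ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),("Rainy", "Mild", "High", True, "No"),
]

def entropy(labels):
    n = len(labels)
    return -sum(c/n * math.log2(c/n) for c in Counter(labels).values())

def info_gain(rows, attr):
    """IBS minus the size-weighted entropy of the subsets induced by `attr`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[-1])
    after = sum(len(g)/len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - after

for i, name in enumerate(["Outlook", "Temperature", "Humidity", "Windy"]):
    print(name, round(info_gain(data, i), 3))
# Outlook 0.247, Temperature 0.029, Humidity 0.152, Windy 0.048
```

Outlook wins, so it becomes the root of the tree.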

17 of 29

Outlook = Sunny

Outlook  | Temperature | Humidity | Windy | Play
---------|-------------|----------|-------|-----
Sunny    | Hot         | High     | False | No
Sunny    | Hot         | High     | True  | No
Overcast | Hot         | High     | False | Yes
Rainy    | Mild        | High     | False | Yes
Rainy    | Cool        | Normal   | False | Yes
Rainy    | Cool        | Normal   | True  | No
Overcast | Cool        | Normal   | True  | Yes
Sunny    | Mild        | High     | False | No
Sunny    | Cool        | Normal   | False | Yes
Rainy    | Mild        | Normal   | False | Yes
Sunny    | Mild        | Normal   | True  | Yes
Overcast | Mild        | High     | True  | Yes
Overcast | Hot         | Normal   | False | Yes
Rainy    | Mild        | High     | True  | No

Step 2 = recursion

18 of 29

Next level in the tree …

Outlook | Temperature | Humidity | Windy | Play
--------|-------------|----------|-------|-----
Sunny   | Hot         | High     | False | No
Sunny   | Hot         | High     | True  | No
Sunny   | Mild        | High     | False | No
Sunny   | Cool        | Normal   | False | Yes
Sunny   | Mild        | Normal   | True  | Yes

Outlook = Sunny

Attribute Temperature

Temperature | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability
------------|-----|----|--------|-------|----------------|------------
Hot         | 0   | 2  | 0/2    | 2/2   | 0              | 2/5
Mild        | 1   | 1  | 1/2    | 1/2   | 1              | 2/5
Cool        | 1   | 0  | 1/1    | 0/1   | 0              | 1/5

Hot: info([0,2]) = entropy(0,1) = 0

Mild: info([1,1]) = entropy(1/2,1/2) = 1

Cool: info([1,0]) = entropy(1,0) = 0

Info("Temperature") = info([0,2],[1,1],[1,0]) = 2/5*0 + 2/5*1 + 1/5*0 = 0.4

InfoGain("Temperature") = info([2,3]) – info([0,2],[1,1],[1,0]) = 0.971 – 0.4 = 0.571

19 of 29

Next level in the tree …

Outlook | Temperature | Humidity | Windy | Play
--------|-------------|----------|-------|-----
Sunny   | Hot         | High     | False | No
Sunny   | Hot         | High     | True  | No
Sunny   | Mild        | High     | False | No
Sunny   | Cool        | Normal   | False | Yes
Sunny   | Mild        | Normal   | True  | Yes

Outlook = Sunny

Attribute Humidity

Humidity | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability
---------|-----|----|--------|-------|----------------|------------
High     | 0   | 3  | 0/3    | 3/3   | 0              | 3/5
Normal   | 2   | 0  | 2/2    | 0/2   | 0              | 2/5

High: info([0,3]) = entropy(0,1) = 0

Normal: info([2,0]) = entropy(1,0) = 0

Info("Humidity") = info([0,3],[2,0]) = 3/5*0 + 2/5*0 = 0

InfoGain("Humidity") = info([2,3]) – info([0,3],[2,0]) = 0.971 – 0 = 0.971

20 of 29

Next level in the tree …

Outlook | Temp | Humidity | Windy | Play
--------|------|----------|-------|-----
Sunny   | Hot  | High     | False | No
Sunny   | Hot  | High     | True  | No
Sunny   | Mild | High     | False | No
Sunny   | Cool | Normal   | False | Yes
Sunny   | Mild | Normal   | True  | Yes

Outlook = Sunny

Attribute Windy

Windy | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability
------|-----|----|--------|-------|----------------|------------
True  | 1   | 1  | 1/2    | 1/2   | 1              | 2/5
False | 1   | 2  | 1/3    | 2/3   | 0.918          | 3/5

True: info([1,1]) = entropy(1/2,1/2) = – 1/2*log2(1/2) – 1/2*log2(1/2) = 1

False: info([1,2]) = entropy(1/3,2/3) = – 1/3*log2(1/3) – 2/3*log2(2/3) = 0.918

Info("Windy") = info([1,1],[1,2]) = 2/5*1 + 3/5*0.918 = 0.951

InfoGain("Windy") = info([2,3]) – info([1,1],[1,2]) = 0.971 – 0.951 = 0.02

21 of 29

Choosing the "best" attribute

  • Information gain (InfoGain):
    • Temperature: 0.571
    • Humidity: 0.971
    • Windy: 0.02
  • Choose the attribute Humidity
    • The leaves are pure → no need to continue the procedure

22 of 29

Outlook = Overcast

Outlook  | Temp | Humidity | Windy | Play
---------|------|----------|-------|-----
Sunny    | Hot  | High     | False | No
Sunny    | Hot  | High     | True  | No
Overcast | Hot  | High     | False | Yes
Rainy    | Mild | High     | False | Yes
Rainy    | Cool | Normal   | False | Yes
Rainy    | Cool | Normal   | True  | No
Overcast | Cool | Normal   | True  | Yes
Sunny    | Mild | High     | False | No
Sunny    | Cool | Normal   | False | Yes
Rainy    | Mild | Normal   | False | Yes
Sunny    | Mild | Normal   | True  | Yes
Overcast | Mild | High     | True  | Yes
Overcast | Hot  | Normal   | False | Yes
Rainy    | Mild | High     | True  | No

Step 2 = recursion

23 of 29

Outlook = Overcast

  • No need to split further ← the node is "pure" (contains only examples of class "Yes")

Outlook  | Temp | Humidity | Windy | Play
---------|------|----------|-------|-----
Overcast | Hot  | High     | False | Yes
Overcast | Cool | Normal   | True  | Yes
Overcast | Mild | High     | True  | Yes
Overcast | Hot  | Normal   | False | Yes

Next level in the tree …

24 of 29

Outlook = Rainy

Outlook  | Temp | Humidity | Windy | Play
---------|------|----------|-------|-----
Sunny    | Hot  | High     | False | No
Sunny    | Hot  | High     | True  | No
Overcast | Hot  | High     | False | Yes
Rainy    | Mild | High     | False | Yes
Rainy    | Cool | Normal   | False | Yes
Rainy    | Cool | Normal   | True  | No
Overcast | Cool | Normal   | True  | Yes
Sunny    | Mild | High     | False | No
Sunny    | Cool | Normal   | False | Yes
Rainy    | Mild | Normal   | False | Yes
Sunny    | Mild | Normal   | True  | Yes
Overcast | Mild | High     | True  | Yes
Overcast | Hot  | Normal   | False | Yes
Rainy    | Mild | High     | True  | No

Step 2 = recursion

25 of 29

Outlook = Rainy

Outlook | Temperature | Humidity | Windy | Play
--------|-------------|----------|-------|-----
Rainy   | Mild        | High     | False | Yes
Rainy   | Cool        | Normal   | False | Yes
Rainy   | Cool        | Normal   | True  | No
Rainy   | Mild        | Normal   | False | Yes
Rainy   | Mild        | High     | True  | No

Temperature | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability
------------|-----|----|--------|-------|----------------|------------
Mild        | 2   | 1  | 2/3    | 1/3   | 0.918          | 3/5
Cool        | 1   | 1  | 1/2    | 1/2   | 1              | 2/5

Mild: info([2,1]) = entropy(2/3,1/3) = – 2/3*log2(2/3) – 1/3*log2(1/3) = 0.918

Cool: info([1,1]) = entropy(1/2,1/2) = – 1/2*log2(1/2) – 1/2*log2(1/2) = 1

Info("Temperature") = info([2,1],[1,1]) = 3/5*0.918 + 2/5*1 = 0.951

InfoGain("Temperature") = info([3,2]) – info([2,1],[1,1]) = 0.971 – 0.951 = 0.02

Next level in the tree …

26 of 29

Outlook = Rainy

Outlook | Temp | Humidity | Windy | Play
--------|------|----------|-------|-----
Rainy   | Mild | High     | False | Yes
Rainy   | Cool | Normal   | False | Yes
Rainy   | Cool | Normal   | True  | No
Rainy   | Mild | Normal   | False | Yes
Rainy   | Mild | High     | True  | No

Humidity | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability
---------|-----|----|--------|-------|----------------|------------
High     | 1   | 1  | 1/2    | 1/2   | 1              | 2/5
Normal   | 2   | 1  | 2/3    | 1/3   | 0.918          | 3/5

High: info([1,1]) = entropy(1/2,1/2) = – 1/2*log2(1/2) – 1/2*log2(1/2) = 1

Normal: info([2,1]) = entropy(2/3,1/3) = – 2/3*log2(2/3) – 1/3*log2(1/3) = 0.918

Info("Humidity") = info([1,1],[2,1]) = 2/5*1 + 3/5*0.918 = 0.951

InfoGain("Humidity") = info([3,2]) – info([1,1],[2,1]) = 0.971 – 0.951 = 0.02

Next level in the tree …

27 of 29

Outlook = Rainy

Outlook | Temp | Humidity | Windy | Play
--------|------|----------|-------|-----
Rainy   | Mild | High     | False | Yes
Rainy   | Cool | Normal   | False | Yes
Rainy   | Cool | Normal   | True  | No
Rainy   | Mild | Normal   | False | Yes
Rainy   | Mild | High     | True  | No

Windy | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability
------|-----|----|--------|-------|----------------|------------
True  | 0   | 2  | 0/2    | 2/2   | 0              | 2/5
False | 3   | 0  | 3/3    | 0/3   | 0              | 3/5

True: info([0,2]) = entropy(0,1) = 0

False: info([3,0]) = entropy(1,0) = 0

Info("Windy") = info([0,2],[3,0]) = 2/5*0 + 3/5*0 = 0

InfoGain("Windy") = info([3,2]) – info([0,2],[3,0]) = 0.971 – 0 = 0.971

Next level in the tree …

28 of 29

Choosing the "best" attribute

  • Information gain (InfoGain):
    • Temperature: 0.02
    • Humidity: 0.02
    • Windy: 0.971
  • Choose the attribute Windy
    • The leaves are pure → no need to continue the procedure

29 of 29

The "final" decision tree
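The slide's drawing is not reproduced here, but the computations above pin the tree down: Outlook at the root, Humidity under Sunny, Windy under Rainy, and Overcast a pure "Yes" leaf. As Python (an illustrative sketch):

```python
def predict(outlook, humidity, windy):
    """Classify one example with the final tree (temperature is never tested)."""
    if outlook == "Overcast":
        return "Yes"                                  # pure node
    if outlook == "Sunny":
        return "No" if humidity == "High" else "Yes"  # split on Humidity
    return "No" if windy else "Yes"                   # Rainy: split on Windy
```

This function classifies all 14 training examples correctly.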