Classification
Decision trees
What is a decision tree?
How to "build" a decision tree?
Choosing the "best" attribute
Which attribute to choose? – example
Information gain
Information gain – definition
Computing information – "before"
Computing information – "after"
InfoGain – the "weather" example
Outlook | Temperature | Humidity | Windy | Play |
Sunny | Hot | High | False | No |
Sunny | Hot | High | True | No |
Overcast | Hot | High | False | Yes |
Rainy | Mild | High | False | Yes |
Rainy | Cool | Normal | False | Yes |
Rainy | Cool | Normal | True | No |
Overcast | Cool | Normal | True | Yes |
Sunny | Mild | High | False | No |
Sunny | Cool | Normal | False | Yes |
Rainy | Mild | Normal | False | Yes |
Sunny | Mild | Normal | True | Yes |
Overcast | Mild | High | True | Yes |
Overcast | Hot | Normal | False | Yes |
Rainy | Mild | High | True | No |
Computing the IBS (information before split)
The full dataset has 9 Yes and 5 No examples, so
IBS = Info([9,5]) = entropy(9/14,5/14) = – 9/14 * log2(9/14) – 5/14 * log2(5/14) = 0.940 bits
InfoGain("outlook")
Outlook | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability |
Sunny | 2 | 3 | 2/5 | 3/5 | 0.971 | 5/14 |
Overcast | 4 | 0 | 4/4 | 0/4 | 0 | 4/14 |
Rainy | 3 | 2 | 3/5 | 2/5 | 0.971 | 5/14 |
InfoGain("outlook") = IBS – Info("outlook")
Entropy:
Sunny: Info([2,3]) = entropy(2/5,3/5) = – 2/5 * log2(2/5) – 3/5 * log2(3/5) = 0.971
Overcast: Info([4,0]) = entropy(1,0) = – 1 * log2(1) – 0 * log2(0) = 0 (using the convention 0 * log2(0) = 0)
Rainy: Info([3,2]) = entropy(3/5,2/5) = – 3/5 * log2(3/5) – 2/5 * log2(2/5) = 0.971
Info("outlook") =
Info([2,3],[4,0],[3,2]) = (5/14) * 0.971 + (4/14) * 0 + (5/14) * 0.971 = 0.693
InfoGain("outlook") = 0.940 – 0.693 = 0.247
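Plugging the counts above into a short Python sketch reproduces these numbers (the function and variable names here are my own, not part of the slides):

```python
from math import log2

def entropy(counts):
    """Shannon entropy in bits of a class-count list, using 0*log2(0) = 0."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

# Whole dataset: 9 Yes, 5 No -> IBS (information before split)
ibs = entropy([9, 5])                                   # ~0.940 bits

# Outlook splits the 14 rows into Sunny [2,3], Overcast [4,0], Rainy [3,2]
branches = [[2, 3], [4, 0], [3, 2]]
n = sum(sum(b) for b in branches)                       # 14 rows in total
# Weighted entropy after the split (the slides round this to 0.693)
info_outlook = sum(sum(b) / n * entropy(b) for b in branches)

gain_outlook = ibs - info_outlook                       # ~0.247 bits
```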
InfoGain("temperature")
Temperature | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability |
Hot | 2 | 2 | 2/4 | 2/4 | 1 | 4/14 |
Mild | 4 | 2 | 4/6 | 2/6 | 0.918 | 6/14 |
Cool | 3 | 1 | 3/4 | 1/4 | 0.811 | 4/14 |
Entropy:
Hot: Info([2,2]) = entropy(2/4,2/4) = – 2/4 * log2(2/4) – 2/4 * log2(2/4) = 1
Mild: Info([4,2]) = entropy(4/6,2/6) = – 4/6 * log2(4/6) – 2/6 * log2(2/6) = 0.918
Cool: Info([3,1]) = entropy(3/4,1/4) = – 3/4 * log2(3/4) – 1/4 * log2(1/4) = 0.811
Info("temperature") =
Info([2,2],[4,2],[3,1]) = (4/14) * 1 + (6/14) * 0.918 + (4/14) * 0.811 = 0.911
InfoGain("temperature") = 0.940 – 0.911 = 0.029
InfoGain("humidity")
Humidity | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability |
High | 3 | 4 | 3/7 | 4/7 | 0.985 | 7/14 |
Normal | 6 | 1 | 6/7 | 1/7 | 0.592 | 7/14 |
Entropy:
High: Info([3,4]) = entropy(3/7,4/7) = – 3/7 * log2(3/7) – 4/7 * log2(4/7) = 0.985
Normal: Info([6,1]) = entropy(6/7,1/7) = – 6/7 * log2(6/7) – 1/7 * log2(1/7) = 0.592
Info("humidity") =
Info([3,4],[6,1]) = (7/14) * 0.985 + (7/14) * 0.592 = 0.789
InfoGain("humidity") = 0.940 – 0.789 = 0.151
InfoGain("windy")
Windy | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability |
False | 6 | 2 | 6/8 | 2/8 | 0.811 | 8/14 |
True | 3 | 3 | 3/6 | 3/6 | 1 | 6/14 |
Entropy:
False: Info([6,2]) = entropy(6/8,2/8) = – 6/8 * log2(6/8) – 2/8 * log2(2/8) = 0.811
True: Info([3,3]) = entropy(3/6,3/6) = – 3/6 * log2(3/6) – 3/6 * log2(3/6) = 1
Info("windy") =
Info([6,2],[3,3]) = (8/14) * 0.811 + (6/14) * 1 = 0.892
InfoGain("windy") = 0.940 – 0.892 = 0.048
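The whole comparison can also be computed directly from the dataset rather than from pre-tabulated counts. A minimal Python sketch (helper names are my own choices):

```python
from collections import Counter
from math import log2

# The 14-row "weather" dataset: (outlook, temperature, humidity, windy, play)
data = [
    ("Sunny", "Hot", "High", False, "No"),   ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"), ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"), ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"), ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"), ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"), ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"), ("Rainy", "Mild", "High", True, "No"),
]
ATTRS = {"outlook": 0, "temperature": 1, "humidity": 2, "windy": 3}

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr):
    """InfoGain(attr) = entropy before the split minus weighted entropy after."""
    i = ATTRS[attr]
    before = entropy([r[-1] for r in rows])
    after = 0.0
    for value in set(r[i] for r in rows):
        subset = [r[-1] for r in rows if r[i] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after

gains = {a: info_gain(data, a) for a in ATTRS}
best = max(gains, key=gains.get)   # "outlook", with gain ~0.247
```

As the slides show, Outlook has by far the largest gain, so it becomes the root of the tree.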
Which attribute is "best"?
Outlook = Sunny
Step 2 = recursion
Next level in the tree …
Outlook | Temperature | Humidity | Windy | Play |
Sunny | Hot | High | False | No |
Sunny | Hot | High | True | No |
Sunny | Mild | High | False | No |
Sunny | Cool | Normal | False | Yes |
Sunny | Mild | Normal | True | Yes |
Outlook = Sunny
Attribute Temperature
Temperature | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability |
Hot | 0 | 2 | 0/2 | 2/2 | 0 | 2/5 |
Mild | 1 | 1 | 1/2 | 1/2 | 1 | 2/5 |
Cool | 1 | 0 | 1/1 | 0/1 | 0 | 1/5 |
Info("Temperature") = info([0,2],[1,1],[1,0]) = 2/5*0 + 2/5*1 + 1/5*0 = 0.4
InfoGain("Temperature") = info([2,3]) – info([0,2],[1,1],[1,0]) = 0.971 – 0.4 = 0.571
Hot: info([0,2]) = entropy(0,1) = 0
Mild: info([1,1]) = entropy(1/2,1/2) = 1
Cool: info([1,0]) = entropy(1,0) = 0
Next level in the tree …
Outlook = Sunny
Attribute Humidity
Humidity | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability |
High | 0 | 3 | 0/3 | 3/3 | 0 | 3/5 |
Normal | 2 | 0 | 2/2 | 0/2 | 0 | 2/5 |
Info("Humidity") = info([0,3],[2,0]) = 3/5*0 + 2/5*0 = 0
InfoGain("Humidity") = info([2,3]) – info([0,3],[2,0]) = 0.971 – 0 = 0.971
High: info([0,3]) = entropy(0,1) = 0
Normal: info([2,0]) = entropy(1,0) = 0
Next level in the tree …
Outlook = Sunny
Attribute Windy
Windy | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability |
False | 1 | 2 | 1/3 | 2/3 | 0.918 | 3/5 |
True | 1 | 1 | 1/2 | 1/2 | 1 | 2/5 |
Info("Windy") = info([1,2],[1,1]) = 3/5*0.918 + 2/5*1 = 0.951
InfoGain("Windy") = info([2,3]) – info([1,2],[1,1]) = 0.971 – 0.951 = 0.02
False: info([1,2]) = entropy(1/3,2/3) = – 1/3*log2(1/3) – 2/3*log2(2/3) = 0.918
True: info([1,1]) = entropy(1/2,1/2) = – 1/2*log2(1/2) – 1/2*log2(1/2) = 1
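The three computations on the Sunny branch can be checked with a few lines of Python (a sketch using the counts from the tables above; names are my own):

```python
from math import log2

def entropy(p_yes, p_no):
    """Binary entropy in bits, with the 0*log2(0) = 0 convention."""
    return -sum(p * log2(p) for p in (p_yes, p_no) if p > 0)

# Outlook = Sunny subset: 2 Yes, 3 No -> entropy before splitting ~0.971 bits
before = entropy(2/5, 3/5)

# Weighted info after splitting the 5 Sunny rows on each remaining attribute
info_temp  = 2/5*entropy(0/2, 2/2) + 2/5*entropy(1/2, 1/2) + 1/5*entropy(1/1, 0/1)
info_humid = 3/5*entropy(0/3, 3/3) + 2/5*entropy(2/2, 0/2)
info_windy = 3/5*entropy(1/3, 2/3) + 2/5*entropy(1/2, 1/2)

gains = {"temperature": before - info_temp,    # ~0.571
         "humidity":    before - info_humid,   # ~0.971
         "windy":       before - info_windy}   # ~0.02
# Humidity separates the Sunny rows perfectly, so it becomes the next node.
```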
Choosing the "best" attribute
Outlook = Overcast
Step 2 = recursion
Outlook = Overcast
Outlook | Temp | Humidity | Windy | Play |
Overcast | Hot | High | False | Yes |
Overcast | Cool | Normal | True | Yes |
Overcast | Mild | High | True | Yes |
Overcast | Hot | Normal | False | Yes |
Next level in the tree …
Outlook = Rainy
Step 2 = recursion
Outlook = Rainy
Outlook | Temperature | Humidity | Windy | Play |
Rainy | Mild | High | False | Yes |
Rainy | Cool | Normal | False | Yes |
Rainy | Cool | Normal | True | No |
Rainy | Mild | Normal | False | Yes |
Rainy | Mild | High | True | No |
Temperature | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability |
Mild | 2 | 1 | 2/3 | 1/3 | 0.918 | 3/5 |
Cool | 1 | 1 | 1/2 | 1/2 | 1 | 2/5 |
Info("Temperature") = info([2,1],[1,1]) = 3/5*0.918 + 2/5*1 = 0.951
InfoGain("Temperature") = info([3,2]) – info([2,1],[1,1]) = 0.971 – 0.951 = 0.02
Mild: info([2,1]) = entropy(2/3,1/3) = – 2/3*log2(2/3) – 1/3*log2(1/3) = 0.918
Cool: info([1,1]) = entropy(1/2,1/2) = – 1/2*log2(1/2) – 1/2*log2(1/2) = 1
Next level in the tree …
Outlook = Rainy
Humidity | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability |
High | 1 | 1 | 1/2 | 1/2 | 1 | 2/5 |
Normal | 2 | 1 | 2/3 | 1/3 | 0.918 | 3/5 |
Info("Humidity") = info([1,1],[2,1]) = 2/5*1 + 3/5*0.918 = 0.951
InfoGain("Humidity") = info([3,2]) – info([1,1],[2,1]) = 0.971 – 0.951 = 0.02
High: info([1,1]) = entropy(1/2,1/2) = – 1/2*log2(1/2) – 1/2*log2(1/2) = 1
Normal: info([2,1]) = entropy(2/3,1/3) = – 2/3*log2(2/3) – 1/3*log2(1/3) = 0.918
Next level in the tree …
Outlook = Rainy
Windy | Yes | No | P(Yes) | P(No) | Entropy (bits) | Probability |
True | 0 | 2 | 0/2 | 2/2 | 0 | 2/5 |
False | 3 | 0 | 3/3 | 0/3 | 0 | 3/5 |
Info("Windy") = info([0,2],[3,0]) = 2/5*0 + 3/5*0 = 0
InfoGain("Windy") = info([3,2]) – info([0,2],[3,0]) = 0.971 – 0 = 0.971
True: info([0,2]) = entropy(0,1) = 0
False: info([3,0]) = entropy(1,0) = 0
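The same check for the Rainy branch, again as a short sketch from the tabulated counts (names are my own):

```python
from math import log2

def entropy(p_yes, p_no):
    """Binary entropy in bits, with the 0*log2(0) = 0 convention."""
    return -sum(p * log2(p) for p in (p_yes, p_no) if p > 0)

# Outlook = Rainy subset: 3 Yes, 2 No -> entropy before splitting ~0.971 bits
before = entropy(3/5, 2/5)

# Weighted info after splitting the 5 Rainy rows on each remaining attribute
info_temp  = 3/5*entropy(2/3, 1/3) + 2/5*entropy(1/2, 1/2)   # Mild / Cool
info_humid = 2/5*entropy(1/2, 1/2) + 3/5*entropy(2/3, 1/3)   # High / Normal
info_windy = 2/5*entropy(0/2, 2/2) + 3/5*entropy(3/3, 0/3)   # True / False

gains = {"temperature": before - info_temp,    # ~0.02
         "humidity":    before - info_humid,   # ~0.02
         "windy":       before - info_windy}   # ~0.971
# Windy separates the Rainy rows perfectly, so it is chosen for this branch.
```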
Next level in the tree …
Choosing the "best" attribute
The "final" decision tree
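The tree implied by the computations above (root on Outlook, then Humidity under Sunny and Windy under Rainy) can be written down and used for classification. This is a sketch, with the nested-dict representation and function names being my own choices:

```python
# The final tree: Overcast is a pure leaf; the other two branches split once more.
tree = {"attribute": "outlook",
        "branches": {
            "Sunny":    {"attribute": "humidity",
                         "branches": {"High": "No", "Normal": "Yes"}},
            "Overcast": "Yes",
            "Rainy":    {"attribute": "windy",
                         "branches": {True: "No", False: "Yes"}},
        }}

def classify(node, example):
    """Walk the tree, following the branch matching the example's
    attribute value, until a leaf label ('Yes'/'No') is reached."""
    while isinstance(node, dict):
        node = node["branches"][example[node["attribute"]]]
    return node

# A sunny, high-humidity day:
label = classify(tree, {"outlook": "Sunny", "humidity": "High", "windy": False})
# -> "No"
```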