Classification
Basic algorithms
Basic classification algorithms
Naïve Bayes
The Bayes rule
H = class
E = attributes
Pr[H|E] = probability of class, given the attributes
…
Pr[E|H] = probability of attributes, given the class
Pr[H] = "a priori" probability of the class (without knowing the attributes)
Pr[E] = probability of the attributes (without knowing the class)
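For reference, with the four quantities defined above, the Bayes rule is usually written as follows (standard textbook form, using the slide's own symbols):

```latex
\Pr[H \mid E] = \frac{\Pr[E \mid H]\,\Pr[H]}{\Pr[E]}
```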
Naïveness …
The Bayes rule again …
… assuming the attributes are independent of each other given the class (a "naïve" assumption)
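Under this assumption the term Pr[E|H] splits into one factor per attribute. A sketch of the resulting formula, where E_1 … E_n are names assumed here for the individual attribute values that make up E:

```latex
\Pr[H \mid E] = \frac{\Pr[E_1 \mid H]\,\Pr[E_2 \mid H]\cdots\Pr[E_n \mid H]\,\Pr[H]}{\Pr[E]}
```

Pr[E] is the same for every class, so it can be dropped while comparing classes and recovered at the end by normalizing the results so that they sum to 1.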
Naïve Bayes – the "weather" data
Outlook | Temp | Humidity | Windy | Play |
Sunny | Hot | High | False | No |
Sunny | Hot | High | True | No |
Overcast | Hot | High | False | Yes |
Rainy | Mild | High | False | Yes |
Rainy | Cool | Normal | False | Yes |
Rainy | Cool | Normal | True | No |
Overcast | Cool | Normal | True | Yes |
Sunny | Mild | High | False | No |
Sunny | Cool | Normal | False | Yes |
Rainy | Mild | Normal | False | Yes |
Sunny | Mild | Normal | True | Yes |
Overcast | Mild | High | True | Yes |
Overcast | Hot | Normal | False | Yes |
Rainy | Mild | High | True | No |
… build the frequency/probability table
Counts:
Outlook | Yes | No | Temperature | Yes | No | Humidity | Yes | No | Windy | Yes | No | Play = Yes | Play = No |
Sunny | 2 | 3 | Hot | 2 | 2 | High | 3 | 4 | False | 6 | 2 | 9 | 5 |
Overcast | 4 | 0 | Mild | 4 | 2 | Normal | 6 | 1 | True | 3 | 3 | | |
Rainy | 3 | 2 | Cool | 3 | 1 | | | | | | | | |
Probabilities:
Outlook | Yes | No | Temperature | Yes | No | Humidity | Yes | No | Windy | Yes | No | Play = Yes | Play = No |
Sunny | 2/9 | 3/5 | Hot | 2/9 | 2/5 | High | 3/9 | 4/5 | False | 6/9 | 2/5 | 9/14 | 5/14 |
Overcast | 4/9 | 0/5 | Mild | 4/9 | 2/5 | Normal | 6/9 | 1/5 | True | 3/9 | 3/5 | | |
Rainy | 3/9 | 2/5 | Cool | 3/9 | 1/5 | | | | | | | | |
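The counting step can be sketched in a few lines of Python. This is only an illustration: the names ATTRIBUTES, DATA and build_counts are assumptions made here, and Windy is stored as a string for simplicity.

```python
from collections import Counter, defaultdict

ATTRIBUTES = ["Outlook", "Temp", "Humidity", "Windy"]

# The 14 instances from the "weather" table; the last field is the class (Play).
DATA = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]


def build_counts(rows):
    """Count the classes and, for each attribute value, how often it occurs per class."""
    class_counts = Counter(row[-1] for row in rows)
    value_counts = defaultdict(Counter)  # (attribute, value) -> Counter over classes
    for *values, label in rows:
        for attribute, value in zip(ATTRIBUTES, values):
            value_counts[(attribute, value)][label] += 1
    return class_counts, value_counts


class_counts, value_counts = build_counts(DATA)
for (attribute, value), per_class in value_counts.items():
    # A Counter returns 0 for a class that never occurs with this value.
    print(attribute, value, per_class["Yes"], per_class["No"])
print("Play", class_counts["Yes"], class_counts["No"])
```

Dividing each count by the total of its class (9 for "Yes", 5 for "No") gives the probabilities in the lower half of the table.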
Classify a new day:
Sunny | Hot | High | False | |
Likelihoods:
P("Yes") = 2/9 x 2/9 x 3/9 x 6/9 x 9/14 ≈ 0.007
P("No") = 3/5 x 2/5 x 4/5 x 2/5 x 5/14 ≈ 0.027
(Normalized) probabilities:
P("Yes") = 0.007 / (0.007 + 0.027) ≈ 20.5%
P("No") = 0.027 / (0.007 + 0.027) ≈ 79.5% 🡪 Play = "No"
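The arithmetic above can be checked with exact fractions. A minimal sketch, assuming the probabilities are typed in by hand from the table (the variable names are illustrative only):

```python
from fractions import Fraction as F
from math import prod

# Factors for the day (Sunny, Hot, High, False), read off the probability table.
# The last factor in each list is the class prior Pr[H].
factors = {
    "Yes": [F(2, 9), F(2, 9), F(3, 9), F(6, 9), F(9, 14)],
    "No": [F(3, 5), F(2, 5), F(4, 5), F(2, 5), F(5, 14)],
}

# Likelihood of each class: the product of its factors.
likelihood = {label: prod(fs) for label, fs in factors.items()}

# Normalize so that the two values sum to 1.
total = sum(likelihood.values())
posterior = {label: value / total for label, value in likelihood.items()}

prediction = max(posterior, key=posterior.get)
for label in factors:
    print(label, round(float(likelihood[label]), 3), f"{float(posterior[label]):.1%}")
print("Play =", prediction)
```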
… what about this day?
Overcast | Hot | High | False | |
Likelihoods:
P("Yes") = 4/9 x 2/9 x 3/9 x 6/9 x 9/14 ≈ 0.014
P("No") = 0/5 x 2/5 x 4/5 x 2/5 x 5/14 = 0
(Normalized) probabilities:
P("Yes") = 0.014 / (0.014 + 0.0) = 100% 🡪 Play = "Yes"
P("No") = 0.0 / (0.014 + 0.0) = 0%
A single zero count (Overcast never occurs with Play = "No") forces the whole product for "No" to 0, whatever the other attributes say.
… with the Laplace estimate (add 1 to every count, so that no probability can be 0)
Counts:
Outlook | Yes | No | Temperature | Yes | No | Humidity | Yes | No | Windy | Yes | No | Play = Yes | Play = No |
Sunny | 3 | 4 | Hot | 3 | 3 | High | 4 | 5 | False | 7 | 3 | 10 | 6 |
Overcast | 5 | 1 | Mild | 5 | 3 | Normal | 7 | 2 | True | 4 | 4 | | |
Rainy | 4 | 3 | Cool | 4 | 2 | | | | | | | | |
Probabilities:
Outlook | Yes | No | Temperature | Yes | No | Humidity | Yes | No | Windy | Yes | No | Play = Yes | Play = No |
Sunny | 3/12 | 4/8 | Hot | 3/12 | 3/8 | High | 4/11 | 5/7 | False | 7/11 | 3/7 | 10/16 | 6/16 |
Overcast | 5/12 | 1/8 | Mild | 5/12 | 3/8 | Normal | 7/11 | 2/7 | True | 4/11 | 4/7 | | |
Rainy | 4/12 | 3/8 | Cool | 4/12 | 2/8 | | | | | | | | |
Classify a new day:
Overcast | Hot | High | False | |
Likelihoods:
P("Yes") = 5/12 x 3/12 x 4/11 x 7/11 x 10/16 ≈ 0.015
P("No") = 1/8 x 3/8 x 5/7 x 3/7 x 6/16 ≈ 0.005
(Normalized) probabilities:
P("Yes") = 0.015 / (0.015 + 0.005) ≈ 75% 🡪 Play = "Yes"
P("No") = 0.005 / (0.015 + 0.005) ≈ 25%
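The effect of the Laplace estimate can be reproduced directly from the data. The sketch below is one possible implementation, not the only one; the function names and the parameter k are assumptions made here (k=0 uses the plain counts, k=1 adds 1 to every count).

```python
from collections import Counter
from fractions import Fraction

# The 14 "weather" instances: (Outlook, Temp, Humidity, Windy, Play).
DATA = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]


def likelihoods(rows, day, k=0):
    """Return {class: likelihood} for `day`; k is the amount added to every count."""
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for label, n_label in class_counts.items():
        # Class prior: 9/14 and 5/14 for k=0, 10/16 and 6/16 for k=1.
        score = Fraction(n_label + k, len(rows) + k * len(class_counts))
        for i, value in enumerate(day):
            n_values = len({row[i] for row in rows})  # distinct values of attribute i
            matches = sum(1 for row in rows if row[-1] == label and row[i] == value)
            score *= Fraction(matches + k, n_label + k * n_values)
        scores[label] = score
    return scores


def normalize(scores):
    """Scale the likelihoods so that they sum to 1."""
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


day = ("Overcast", "Hot", "High", "False")
raw = likelihoods(DATA, day, k=0)
smoothed = likelihoods(DATA, day, k=1)
print({label: float(p) for label, p in normalize(raw).items()})
print({label: float(p) for label, p in normalize(smoothed).items()})
```

With exact fractions the smoothed probability of "Yes" comes out near 74%, slightly below the 75% obtained from the rounded likelihoods 0.015 and 0.005.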
A slightly different data set again …
A | F | C |
5 | good | y |
3 | bad | z |
5 | bad | y |
1 | good | y |
5 | bad | y |
3 | bad | w |
5 | bad | w |
3 | bad | x |
2 | good | y |
4 | bad | z |
2 | good | z |
… build the frequency/probability tables (again with the Laplace estimate: every count below already includes the added 1)
A \ C | w | x | y | z |
1 | 1 | 1 | 2 | 1 |
2 | 1 | 1 | 2 | 2 |
3 | 2 | 2 | 1 | 2 |
4 | 1 | 1 | 1 | 2 |
5 | 2 | 1 | 4 | 1 |
F \ C | w | x | y | z |
good | 1 | 1 | 4 | 2 |
bad | 3 | 2 | 3 | 3 |
A \ C | w | x | y | z |
1 | 1/7 | 1/6 | 2/10 | 1/8 |
2 | 1/7 | 1/6 | 2/10 | 2/8 |
3 | 2/7 | 2/6 | 1/10 | 2/8 |
4 | 1/7 | 1/6 | 1/10 | 2/8 |
5 | 2/7 | 1/6 | 4/10 | 1/8 |
F \ C | w | x | y | z |
good | 1/4 | 1/3 | 4/7 | 2/5 |
bad | 3/4 | 2/3 | 3/7 | 3/5 |
C | w | x | y | z |
| 3 | 2 | 6 | 4 |
C | w | x | y | z |
| 3/15 | 2/15 | 6/15 | 4/15 |
… classify the following example
A | F | C |
2 | bad | ? |
Compute the likelihoods:
P("w") = 1/7 x 3/4 x 3/15 ≈ 0.021
P("x") = 1/6 x 2/3 x 2/15 ≈ 0.015
P("y") = 2/10 x 3/7 x 6/15 ≈ 0.034
P("z") = 2/8 x 3/5 x 4/15 ≈ 0.040
Derive the (normalized) probabilities:
w | x | y | z |
0.021 | 0.015 | 0.034 | 0.040 |
19.1% | 13.6% | 30.9% | 36.4% |
Choose the highest probability and classify the example as class z.
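The same calculation extends unchanged to more than two classes. A minimal sketch for this data set, assuming Laplace smoothing with 1 added to every count (ROWS and laplace_scores are names chosen here for illustration):

```python
from collections import Counter
from fractions import Fraction

# The 11 instances (A, F, C); C is the class.
ROWS = [
    (5, "good", "y"), (3, "bad", "z"), (5, "bad", "y"), (1, "good", "y"),
    (5, "bad", "y"), (3, "bad", "w"), (5, "bad", "w"), (3, "bad", "x"),
    (2, "good", "y"), (4, "bad", "z"), (2, "good", "z"),
]


def laplace_scores(rows, example):
    """Smoothed likelihood per class: class prior times one factor per attribute."""
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for label in sorted(class_counts):
        n_label = class_counts[label]
        # Smoothed prior, e.g. (2 + 1) / (11 + 4) = 3/15 for class w.
        score = Fraction(n_label + 1, len(rows) + len(class_counts))
        for i, value in enumerate(example):
            n_values = len({row[i] for row in rows})  # 5 values for A, 2 for F
            matches = sum(1 for row in rows if row[-1] == label and row[i] == value)
            score *= Fraction(matches + 1, n_label + n_values)
        scores[label] = score
    return scores


scores = laplace_scores(ROWS, (2, "bad"))
total = sum(scores.values())
for label, score in scores.items():
    print(label, round(float(score), 3), f"{float(score / total):.1%}")
best = max(scores, key=scores.get)
print("predicted class:", best)
```

The percentages printed from exact fractions can differ in the last digit from those derived from the rounded likelihoods; the predicted class is the same.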
Missing values
Classify the new day:
? | Hot | High | False | |
The attribute with the missing value (Outlook) is simply left out of the product.
Likelihoods:
P("Yes") = 3/12 x 4/11 x 7/11 x 10/16 ≈ 0.036
P("No") = 3/8 x 5/7 x 3/7 x 6/16 ≈ 0.043
(Normalized) probabilities:
P("Yes") = 0.036 / (0.036 + 0.043) ≈ 46%
P("No") = 0.043 / (0.036 + 0.043) ≈ 54% 🡪 Play = "No"
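Handling a missing value amounts to skipping that attribute's factor. A small sketch, assuming the Laplace-smoothed probabilities are copied by hand from the table and that None marks a missing value (TABLE, PRIOR and classify are illustrative names):

```python
from fractions import Fraction as F

# Laplace-smoothed probabilities from the table:
# attribute -> value -> (Pr[value | Yes], Pr[value | No])
TABLE = {
    "Outlook": {
        "Sunny": (F(3, 12), F(4, 8)),
        "Overcast": (F(5, 12), F(1, 8)),
        "Rainy": (F(4, 12), F(3, 8)),
    },
    "Temp": {
        "Hot": (F(3, 12), F(3, 8)),
        "Mild": (F(5, 12), F(3, 8)),
        "Cool": (F(4, 12), F(2, 8)),
    },
    "Humidity": {
        "High": (F(4, 11), F(5, 7)),
        "Normal": (F(7, 11), F(2, 7)),
    },
    "Windy": {
        "False": (F(7, 11), F(3, 7)),
        "True": (F(4, 11), F(4, 7)),
    },
}
PRIOR = (F(10, 16), F(6, 16))  # (Pr[Yes], Pr[No])


def classify(day):
    """`day` maps attribute name -> value; None marks a missing value."""
    yes, no = PRIOR
    for attribute, value in day.items():
        if value is None:
            continue  # missing value: leave this attribute out of the product
        p_yes, p_no = TABLE[attribute][value]
        yes *= p_yes
        no *= p_no
    total = yes + no
    return {"Yes": yes / total, "No": no / total}


posterior = classify({"Outlook": None, "Temp": "Hot", "Humidity": "High", "Windy": "False"})
print({label: f"{float(p):.1%}" for label, p in posterior.items()})
```

Because both classes lose the same attribute, the remaining products are still comparable and can be normalized as before.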
What have you learned?