1 of 15

Classification

Basic algorithms

2 of 15

Basic classification algorithms

  • Task:
    • Build a model by using known data (a classifier for classifying new "unseen" examples)
    • The data that we used for building our model is called the TRAINING SET
  • Supervised learning:
    • the class for the training set examples is known
  • You will learn about the following classifiers:
    • Naïve Bayes

3 of 15

Naïve Bayes

  • Uses all the attributes
    • That is not always a good choice …
      • Example: 1,000,000 attributes

  • Naïve, because of its over-simplified "looking at things". It assumes that:
    • All attributes are "equally important"
    • All attributes are pairwise independent

4 of 15

The Bayes rule

Pr[H|E] = Pr[E|H] × Pr[H] / Pr[E]

H = class

E = attributes

Pr[H|E] = probability of the class, given the attributes

Pr[E|H] = probability of the attributes, given the class

Pr[H] = "a priori" probability of the class (without knowing the attributes)

Pr[E] = probability of the attributes (without knowing the class)
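As a quick sanity check, the rule can be evaluated directly on counts from the "weather" data introduced on the following slides (a minimal sketch, with the counts hard-coded):

```python
from fractions import Fraction as F

# Bayes rule with H = "Play = Yes" and E = "Outlook = Sunny"
# (counts taken from the weather data on the following slides):
pr_E_given_H = F(2, 9)    # 2 of the 9 "Yes" days are Sunny
pr_H         = F(9, 14)   # 9 of the 14 days are "Yes"
pr_E         = F(5, 14)   # 5 of the 14 days are Sunny

pr_H_given_E = pr_E_given_H * pr_H / pr_E
print(pr_H_given_E)       # 2/5 -- and indeed 2 of the 5 Sunny days are "Yes"
```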

5 of 15

Naïveness …

  • Pr[E|H] can be written as a product over the individual attributes E1 … En:
    • Pr[E|H] = Pr[E1|H] × Pr[E2|H] × … × Pr[En|H]
  • It follows that:
    • Pr[H|E] ∝ Pr[E1|H] × Pr[E2|H] × … × Pr[En|H] × Pr[H]
  • Thus, we can compute:
    • Pr[sunny|yes] … probability of sunny, while we are playing
      • 9 times we played, 2 times it was sunny 🡪 2/9
    • Pr[cool|yes] … probability of cool, while we are playing
      • 9 times we played, 3 times it was cool 🡪 3/9
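These conditional probabilities are just relative frequencies within a class. A short Python sketch (the lists transcribe the "weather" table shown on the next slides):

```python
from fractions import Fraction

# "Play" outcomes and the matching Outlook/Temp values from the weather data
play    = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
           "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]
outlook = ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Rainy", "Overcast",
           "Sunny", "Sunny", "Rainy", "Sunny", "Overcast", "Overcast", "Rainy"]
temp    = ["Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool",
           "Mild", "Cool", "Mild", "Mild", "Mild", "Hot", "Mild"]

def cond_prob(attr_values, value, cls):
    """Pr[value | cls]: fraction of class-`cls` days with the given attribute value."""
    cls_rows = [i for i, c in enumerate(play) if c == cls]
    hits = sum(1 for i in cls_rows if attr_values[i] == value)
    return Fraction(hits, len(cls_rows))

print(cond_prob(outlook, "Sunny", "Yes"))  # 2/9
print(cond_prob(temp, "Cool", "Yes"))      # 3/9, printed reduced as 1/3
```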

6 of 15

The Bayes rule again …

Pr[H|E] = Pr[E1|H] × Pr[E2|H] × … × Pr[En|H] × Pr[H] / Pr[E]

assuming the attributes are pairwise independent (a "naïve" assumption)

7 of 15

Naïve Bayes – the "weather" data

Outlook    Temp   Humidity   Windy   Play
Sunny      Hot    High       False   No
Sunny      Hot    High       True    No
Overcast   Hot    High       False   Yes
Rainy      Mild   High       False   Yes
Rainy      Cool   Normal     False   Yes
Rainy      Cool   Normal     True    No
Overcast   Cool   Normal     True    Yes
Sunny      Mild   High       False   No
Sunny      Cool   Normal     False   Yes
Rainy      Mild   Normal     False   Yes
Sunny      Mild   Normal     True    Yes
Overcast   Mild   High       True    Yes
Overcast   Hot    Normal     False   Yes
Rainy      Mild   High       True    No

8 of 15

… build the frequency/probability table

Counts and (probabilities), per attribute value and class:

Outlook        Yes        No
  Sunny        2 (2/9)    3 (3/5)
  Overcast     4 (4/9)    0 (0/5)
  Rainy        3 (3/9)    2 (2/5)

Temperature    Yes        No
  Hot          2 (2/9)    2 (2/5)
  Mild         4 (4/9)    2 (2/5)
  Cool         3 (3/9)    1 (1/5)

Humidity       Yes        No
  High         3 (3/9)    4 (4/5)
  Normal       6 (6/9)    1 (1/5)

Windy          Yes        No
  False        6 (6/9)    2 (2/5)
  True         3 (3/9)    3 (3/5)

Play           Yes        No
               9 (9/14)   5 (5/14)

Classify a new day:  Outlook = Sunny, Temp = Hot, Humidity = High, Windy = False

Likelihoods:

P("Yes") = 2/9 x 2/9 x 3/9 x 6/9 x 9/14 ≈ 0.007

P("No") = 3/5 x 2/5 x 4/5 x 2/5 x 5/14 ≈ 0.027

(Normalized) probabilities:

P("Yes") = 0.007 / (0.007 + 0.027) ≈ 20.5%

P("No") = 0.027 / (0.007 + 0.027) ≈ 79.5% 🡪 Play = "No"
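The computation above can be reproduced with a minimal sketch (probabilities hard-coded from the table):

```python
# Naive Bayes scores for the day (Sunny, Hot, High, False),
# using the probabilities read off the table above.
def score(cond_probs, prior):
    p = prior
    for f in cond_probs:
        p *= f
    return p

p_yes = score([2/9, 2/9, 3/9, 6/9], 9/14)  # Outlook, Temp, Humidity, Windy | Yes
p_no  = score([3/5, 2/5, 4/5, 2/5], 5/14)  # the same attributes | No

print(round(p_yes, 3), round(p_no, 3))     # 0.007 0.027
print(round(p_no / (p_yes + p_no), 3))     # 0.795 -> Play = "No"
```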

9 of 15

… what about this day?

Classify this day:  Outlook = Overcast, Temp = Hot, Humidity = High, Windy = False

Likelihoods:

P("Yes") = 4/9 x 2/9 x 3/9 x 6/9 x 9/14 ≈ 0.014

P("No") = 0/5 x 2/5 x 4/5 x 2/5 x 5/14 = 0

(Normalized) probabilities:

P("Yes") = 0.014 / (0.014 + 0.0) = 100% 🡪 Play = "Yes"

P("No") = 0.0 / (0.014 + 0.0) = 0%

  • Does this make sense?
    • one attribute (Pr[Overcast|No] = 0) "overrules" all the others …
    • we can handle this with the Laplace estimate
  • Laplace estimate:
    • Add 1 to each frequency count
    • Recompute the probabilities

10 of 15

… with the Laplace estimate

Counts and (probabilities) after adding 1 to each frequency count:

Outlook        Yes          No
  Sunny        3 (3/12)     4 (4/8)
  Overcast     5 (5/12)     1 (1/8)
  Rainy        4 (4/12)     3 (3/8)

Temperature    Yes          No
  Hot          3 (3/12)     3 (3/8)
  Mild         5 (5/12)     3 (3/8)
  Cool         4 (4/12)     2 (2/8)

Humidity       Yes          No
  High         4 (4/11)     5 (5/7)
  Normal       7 (7/11)     2 (2/7)

Windy          Yes          No
  False        7 (7/11)     3 (3/7)
  True         4 (4/11)     4 (4/7)

Play           Yes          No
               10 (10/16)   6 (6/16)

Classify the new day:  Outlook = Overcast, Temp = Hot, Humidity = High, Windy = False

Likelihoods:

P("Yes") = 5/12 x 3/12 x 4/11 x 7/11 x 10/16 ≈ 0.015

P("No") = 1/8 x 3/8 x 5/7 x 3/7 x 6/16 ≈ 0.005

(Normalized) probabilities:

P("Yes") = 0.015 / (0.015 + 0.005) ≈ 75% 🡪 Play = "Yes"

P("No") = 0.005 / (0.015 + 0.005) ≈ 25%
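The Laplace-smoothed computation can be checked with a short sketch (a minimal illustration, with the table's fractions hard-coded):

```python
from fractions import Fraction

def laplace(count, class_total, n_values):
    # Add 1 to each frequency count; the class total grows by the number
    # of distinct values the attribute can take.
    return Fraction(count + 1, class_total + n_values)

# Pr[Overcast|No]: raw count 0 of 5, and Outlook has 3 values -> 1/8 instead of 0
p_overcast_no = laplace(0, 5, 3)

p_yes = Fraction(5, 12) * Fraction(3, 12) * Fraction(4, 11) * Fraction(7, 11) * Fraction(10, 16)
p_no  = p_overcast_no * Fraction(3, 8) * Fraction(5, 7) * Fraction(3, 7) * Fraction(6, 16)

print(round(float(p_yes), 3), round(float(p_no), 3))   # 0.015 0.005
print(p_yes > p_no)                                    # True -> Play = "Yes"
```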

11 of 15

A slightly different data set again …

A   F      C
5   good   y
3   bad    z
5   bad    y
1   good   y
5   bad    y
3   bad    w
5   bad    w
3   bad    x
2   good   y
4   bad    z
2   good   z

12 of 15

… build the frequency/probability tables

Frequency counts (note: these already include the Laplace estimate — 1 was added to every cell):

A \ C    w   x   y   z
1        1   1   2   1
2        1   1   2   2
3        2   2   1   2
4        1   1   1   2
5        2   1   4   1

F \ C    w   x   y   z
good     1   1   4   2
bad      3   2   3   3

C        w   x   y   z
         3   2   6   4

Probabilities:

A \ C    w     x     y      z
1        1/7   1/6   2/10   1/8
2        1/7   1/6   2/10   2/8
3        2/7   2/6   1/10   2/8
4        1/7   1/6   1/10   2/8
5        2/7   1/6   4/10   1/8

F \ C    w     x     y     z
good     1/4   1/3   4/7   2/5
bad      3/4   2/3   3/7   3/5

C        w      x      y      z
         3/15   2/15   6/15   4/15

13 of 15

… classify the following example

A   F     C
2   bad   ?

Compute the likelihoods:

P("w") = 1/7 x 3/4 x 3/15 ≈ 0.021

P("x") = 1/6 x 2/3 x 2/15 ≈ 0.015

P("y")= 2/10 x 3/7 x 6/15 ≈ 0.034

P("z") = 2/8 x 3/5 x 4/15 ≈ 0.04

Derive the (normalized) probabilities:

         w       x       y       z
         0.021   0.015   0.034   0.04
         19%     13.6%   30.9%   36.4%

Choose the highest probability and classify the example in class z.
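The same computation as a short sketch (probabilities hard-coded from the tables on the previous slide):

```python
# Likelihood of each class for the example (A=2, F=bad):
# Pr[A=2 | c] * Pr[F=bad | c] * Pr[c]
likelihood = {
    "w": (1/7) * (3/4) * (3/15),
    "x": (1/6) * (2/3) * (2/15),
    "y": (2/10) * (3/7) * (6/15),
    "z": (2/8) * (3/5) * (4/15),
}
total = sum(likelihood.values())
probs = {c: p / total for c, p in likelihood.items()}

best = max(probs, key=probs.get)
print(best)   # z
```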

14 of 15

Missing values

  • Naïve Bayes is not affected by missing values – it simply "leaves them out" of the calculations

Classify the new day:  Outlook = ?, Temp = Hot, Humidity = High, Windy = False

Likelihoods (the missing Outlook term is simply dropped from the product):

P("Yes") = 3/12 x 4/11 x 7/11 x 10/16 ≈ 0.036

P("No") = 3/8 x 5/7 x 3/7 x 6/16 ≈ 0.043

(Normalized) probabilities:

P("Yes") = 0.036 / (0.036 + 0.043) ≈ 46%

P("No") = 0.043 / (0.036 + 0.043) ≈ 54% 🡪 Play = "No"
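A sketch of the missing-value case, assuming the Laplace-smoothed table from the earlier slide — the unknown Outlook factor is simply left out:

```python
# Likelihoods for the day (?, Hot, High, False); the Outlook factor
# is omitted because its value is missing.
p_yes = (3/12) * (4/11) * (7/11) * (10/16)   # Temp, Humidity, Windy, prior | Yes
p_no  = (3/8) * (5/7) * (3/7) * (6/16)       # the same | No

print(round(p_yes, 3), round(p_no, 3))   # 0.036 0.043
print(round(p_no / (p_yes + p_no), 2))   # 0.54 -> Play = "No"
```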

15 of 15

What have you learned?

  • Naïve Bayes