
CSC-343

Artificial Intelligence

Lecture 5.1.


Probability vs. Logic

Language              What exists in the world?    What does an agent believe about facts?
Propositional logic   Facts                        True / False / Unknown
First-order logic     Facts, objects, relations    True / False / Unknown
Probability theory    Facts                        Degree of belief in [0, 1] *

* i.e., a number between 0 and 1


Sample Space Ω

Coin Flip 1   Coin Flip 2
H             H
H             T
T             H
T             T

  • Sample Space Ω (uppercase omega) is the set of all possible worlds

Ω = {HH, HT, TH, TT}


Sample Space Ω and ω

Possible world   Coin Flip 1   Coin Flip 2
ω1               H             H
ω2               H             T
ω3               T             H
ω4               T             T

  • Sample Space Ω (uppercase omega) is the set of all possible worlds

Ω = {HH, HT, TH, TT}

Ω = {ω1, ω2, ω3, ω4}

  • ω (lowercase omega) refers to a particular possible world
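As a quick aside (my own sketch, not from the slides), the sample space can be enumerated in a couple of lines of Python:

```python
from itertools import product

# All possible worlds for two coin flips
omega = [''.join(w) for w in product('HT', repeat=2)]
print(omega)  # ['HH', 'HT', 'TH', 'TT']
```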


Probability Model P(ω)

Possible world   Coin Flip 1   Coin Flip 2   P(ωi)
ω1               H             H             0.25
ω2               H             T             0.25
ω3               T             H             0.25
ω4               T             T             0.25

  • A Probability Model associates a numerical probability P(ω) with each possible world

  • Basic axioms of probability theory:

0 ≤ P(ωi) ≤ 1 for every ωi

Σω∈Ω P(ω) = 1
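A minimal sketch (illustration only, assuming we represent the model as a Python dictionary) that encodes this probability model and checks both axioms:

```python
# Probability model: each possible world maps to its probability
P = {'HH': 0.25, 'HT': 0.25, 'TH': 0.25, 'TT': 0.25}

# Axiom 1: 0 <= P(w) <= 1 for every possible world w
assert all(0 <= p <= 1 for p in P.values())

# Axiom 2: the probabilities over the whole sample space sum to 1
assert abs(sum(P.values()) - 1.0) < 1e-9
```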


Probability Distribution as a Pie Chart

[Pie chart: the whole circle divided into slices P(ω1) through P(ω5), one slice per possible world]

  • Basic axioms of probability theory:

  • For every ωi, 0 ≤ P(ωi) ≤ 1

0 = 0%: a nonexistent slice / impossible

1 = 100%: the whole pie / certain

  • Σω∈Ω P(ω) = 1

The probabilities of all possible worlds add up to 1, i.e. 100%


Probability Distribution as a Histogram

[Histogram: bars of equal height P(ωi) = 0.25 over ω1 (HH), ω2 (HT), ω3 (TH), ω4 (TT); vertical axis from 0 to 1 in steps of 0.25]


Events ɸ

  • An Event ɸ is a set of possible worlds {ωi, ωj, ... ωn}

  • An event ɸ is a subset of Ω

  • For example, Coin Flip 1 == Coin Flip 2 is an event ɸ = {ω1, ω4}

Possible world   Coin Flip 1   Coin Flip 2   P(ωi)
ω1               H             H             0.25
ω2               H             T             0.25
ω3               T             H             0.25
ω4               T             T             0.25


Events ɸ

  • An Event ɸ is a set of possible worlds {ωi, ωj, ... ωn}

  • An event ɸ is a subset of Ω

  • Another example of an event is at least one Heads ɸ = {ω1 , ω2 , ω3}

Possible world   Coin Flip 1   Coin Flip 2   P(ωi)
ω1               H             H             0.25
ω2               H             T             0.25
ω3               T             H             0.25
ω4               T             T             0.25


Probability of an Event P(ɸ)

  • P(ɸ) = Σω∈ɸ P(ω): the probability of an event is the sum of the probabilities of the possible worlds defining ɸ (see the sketch below the table)

  • P(ɸ1) = P(at least one Heads) = P(ω1) + P(ω2) + P(ω3) = 0.25 + 0.25 + 0.25 = 0.75

  • P(ɸ2) = P(Coin Flip 1 == Coin Flip 2) = P(ω1) + P(ω4) = 0.25 + 0.25 = 0.5

Possible world   Coin Flip 1   Coin Flip 2   P(ωi)
ω1               H             H             0.25
ω2               H             T             0.25
ω3               T             H             0.25
ω4               T             T             0.25
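A minimal sketch of this sum, reusing the dictionary model from earlier (my own illustration; the event names are made up for readability):

```python
P = {'HH': 0.25, 'HT': 0.25, 'TH': 0.25, 'TT': 0.25}

def prob(event):
    """P(event): sum the probabilities of the worlds in the event."""
    return sum(P[w] for w in event)

at_least_one_heads = {'HH', 'HT', 'TH'}   # ɸ1
flips_match = {'HH', 'TT'}                # ɸ2
print(prob(at_least_one_heads))  # 0.75
print(prob(flips_match))         # 0.5
```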


Random Variables

Possible world   Coin Flip 1   Coin Flip 2   P(ωi)
ω1               H             H             0.25
ω2               H             T             0.25
ω3               T             H             0.25
ω4               T             T             0.25

  • CoinFlip1 and CoinFlip2 here are Random Variables

  • The range of a random variable is the set of values it can take on, e.g. {H, T}

  • Random Variables can be Discrete (e.g. a coin flip or the roll of a die) or Continuous (e.g. temperature, weight)
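One concrete way to picture this (my own framing, not from the slides): a random variable is just a function from a possible world to a value in its range:

```python
# Each random variable maps a possible world (e.g. 'HT') to a value
def coin_flip_1(world):
    return world[0]   # 'H' or 'T'

def coin_flip_2(world):
    return world[1]

print(coin_flip_1('HT'))  # 'H'
print(coin_flip_2('HT'))  # 'T'
```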


Conditional Probability P(a|b)

Possible world   Coin Flip 1   Coin Flip 2   P(ωi)
ω1               H             H             0.25
ω2               H             T             0.25
ω3               T             H             0.25
ω4               T             T             0.25

  • Probability of CoinFlip2 = H, given CoinFlip1 = T:

  • P(CoinFlip2=H | CoinFlip1=T) = P(CoinFlip2=H ∧ CoinFlip1=T) / P(CoinFlip1=T) = 0.25 / 0.5 = 0.5
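The same computation as a sketch, with events as sets of worlds so that "and" becomes set intersection (illustration only):

```python
P = {'HH': 0.25, 'HT': 0.25, 'TH': 0.25, 'TT': 0.25}

def prob(event):
    return sum(P[w] for w in event)

flip2_heads = {'HH', 'TH'}   # CoinFlip2 = H
flip1_tails = {'TH', 'TT'}   # CoinFlip1 = T

# P(a | b) = P(a ∧ b) / P(b)
print(prob(flip2_heads & flip1_tails) / prob(flip1_tails))  # 0.5
```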


Conditional Probability and Product Rule

  • General formula for conditional probability:

P(X=x1 | Y=y1) = P(X=x1 ∧ Y=y1) / P(Y=y1)

  • Rearranging the equation, we get the Product Rule:

P(X=x1 ∧ Y=y1) = P(X=x1 | Y=y1) · P(Y=y1)
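For instance, with the coin-flip numbers from the previous slide: P(CoinFlip2=H ∧ CoinFlip1=T) = P(CoinFlip2=H | CoinFlip1=T) · P(CoinFlip1=T) = 0.5 × 0.5 = 0.25, which matches P(ω3) in the table.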


Conditional Probability P(a|b)

[Venn diagram: circles for P(X=x1) and P(Y=y1) overlap; P(X=x1 | Y=y1) is the share of the P(Y=y1) circle covered by the overlap]

  • P(X=x1 | Y=y1) = P(X=x1 ∧ Y=y1) / P(Y=y1)


Inclusion-Exclusion Principle P(a v b)

Possible world   Coin Flip 1   Coin Flip 2   P(ωi)
ω1               H             H             0.25
ω2               H             T             0.25
ω3               T             H             0.25
ω4               T             T             0.25

  • P(a ∨ b) = P(a) + P(b) − P(a ∧ b)

  • For example (checked in the sketch below),

P(CoinFlip1=H ∨ CoinFlip2=T) = P(CoinFlip1=H) + P(CoinFlip2=T) − P(CoinFlip1=H ∧ CoinFlip2=T) = 0.5 + 0.5 − 0.25 = 0.75
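A quick numerical check of this identity on the coin-flip model (set union on the left, the sum-minus-intersection form on the right; illustration only):

```python
P = {'HH': 0.25, 'HT': 0.25, 'TH': 0.25, 'TT': 0.25}

def prob(event):
    return sum(P[w] for w in event)

a = {'HH', 'HT'}   # CoinFlip1 = H
b = {'HT', 'TT'}   # CoinFlip2 = T

# P(a ∨ b) = P(a) + P(b) - P(a ∧ b)
assert abs(prob(a | b) - (prob(a) + prob(b) - prob(a & b))) < 1e-9
print(prob(a | b))  # 0.75
```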


Inclusion-Exclusion Principle P(a v b)

[Venn diagram: circles for P(X=x1) and P(Y=y1); the union's area equals the two areas added, minus the double-counted overlap]

P(X=x1 ∨ Y=y1) = P(X=x1) + P(Y=y1) − P(X=x1 ∧ Y=y1)


Librarian or Farmer?

Steve is very shy and withdrawn, invariably helpful but with little interest in people or in the world of reality. A meek and tidy soul, he has a need for order and structure, and a passion for detail.

Is Steve more likely to be a librarian or a farmer?

Librarian = 3, Farmer = 13


Librarian or Farmer?

  • What percentage of the general population are Librarians?

  • What percentage of the general population are Farmers?


20 : 1 ratio

  • In the US, the ratio is ~ 20 : 1

  • For every 1 librarian, there are about 20 farmers in the general population


  • To simplify the math, we take a sample of 10 librarians and 200 farmers here, preserving the 20:1 ratio


“Steve is very shy and withdrawn, invariably helpful but with little interest in people or in the world of reality. A meek and tidy soul, he has a need for order and structure, and a passion for detail.”

  • Let’s say 40% of librarians resemble Steve’s description

  • Let’s say only 10% of farmers meet the description provided for Steve


  • Let’s say 40% of librarians resemble Steve’s description

  • Let’s say only 10% of farmers meet the description provided for Steve

  • 40% of the 10 librarians is 4 people; 10% of the 200 farmers is 20 people

  • P(Librarian | Description) = 4 / (4 + 20) = 1/6 ≈ 0.167

  • P(Farmer | Description) = 20 / (4 + 20) = 5/6 ≈ 0.833
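The same count-based reasoning as a short sketch (numbers taken from the slides above):

```python
librarians, farmers = 10, 200        # sample preserving the 20:1 ratio
match_lib = 0.40 * librarians        # librarians fitting the description: 4
match_farm = 0.10 * farmers          # farmers fitting the description: 20

print(match_lib / (match_lib + match_farm))   # 0.1666... ≈ 1/6
print(match_farm / (match_lib + match_farm))  # 0.8333... ≈ 5/6
```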


Bayes Theorem

  • You have some Hypothesis:

    • Steve is a librarian

  • You have some Evidence:

    • Steve is very shy and withdrawn, invariably helpful but with little interest in people or in the world of reality. A meek and tidy soul, he has a need for order and structure, and a passion for detail.

  • Probability ( Hypothesis | Evidence )

e.g.

P(Librarian | Description)


P(Librarian | Description)

  • P(Librarian) = 10/210 = 1/21

This is called the Prior

  • P(Description | Librarian) = 4/10 = 0.4

This is called the Likelihood

  • P(Description | ¬Librarian) = 20/200 = 0.1

  • P(Librarian | Description) = [P(Librarian) · P(Description | Librarian)] / ([P(Librarian) · P(Description | Librarian)] + [P(¬Librarian) · P(Description | ¬Librarian)])


P(Librarian | Description)

  • P(Librarian | Description) = [P(Librarian) · P(Description | Librarian)] / ([P(Librarian) · P(Description | Librarian)] + [P(¬Librarian) · P(Description | ¬Librarian)])

  • Plugging in the numbers: = ((1/21) · 0.4) / ((1/21) · 0.4 + (20/21) · 0.1) = 0.4 / (0.4 + 2.0) = 1/6 ≈ 0.167, matching the counting argument 4 / (4 + 20)


Bayes Theorem

  • P(Hypothesis | Evidence) = (Prior × Likelihood) / Evidence

  • P(Hypothesis | Evidence) = P(Hypothesis) · P(Evidence | Hypothesis) / P(Evidence)

  • P(A | B) = P(A) · P(B | A) / P(B)
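A closing sketch (my own, not from the slides): Bayes Theorem as a small function, with P(Evidence) expanded using the two-hypothesis denominator from the previous slide, checked against the librarian numbers:

```python
def bayes(prior, likelihood, likelihood_not):
    """P(Hypothesis | Evidence), expanding P(Evidence) as
    P(H)·P(E|H) + P(¬H)·P(E|¬H)."""
    evidence = prior * likelihood + (1 - prior) * likelihood_not
    return prior * likelihood / evidence

# Librarian example: prior 1/21, likelihood 0.4, P(Description | ¬Librarian) = 0.1
print(bayes(1/21, 0.4, 0.1))  # 0.1666... ≈ 1/6
```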