
CSC-343

Artificial Intelligence

Lecture 5.1.


Probability vs. Logic

Language              What exists in the world?    What does an agent believe about facts?
Propositional logic   Facts                        True / False / Unknown
First-order logic     Facts, objects, relations    True / False / Unknown
Probability theory    Facts                        Degree of belief in [0, 1] *

* i.e., a number between 0 and 1


Sample Space Ω

Coin Flip 1   Coin Flip 2
H             H
H             T
T             H
T             T

  • Sample Space Ω (uppercase omega) is the set of all possible worlds

Ω = {HH, HT, TH, TT}


Sample Space Ω and ω

Possible world   Coin Flip 1   Coin Flip 2
ω1               H             H
ω2               H             T
ω3               T             H
ω4               T             T

  • Sample Space Ω (uppercase omega) is the set of all possible worlds

Ω = {HH, HT, TH, TT}

Ω = {ω1, ω2, ω3, ω4}

  • ω (lowercase omega) refers to a particular possible world
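As a quick aside (my own sketch, not from the slides), the sample space can be enumerated in a couple of lines of Python:

```python
from itertools import product

# All possible worlds for two coin flips
omega = [''.join(w) for w in product('HT', repeat=2)]
print(omega)  # ['HH', 'HT', 'TH', 'TT']
```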


Probability Model P(ω)

Possible world   Coin Flip 1   Coin Flip 2   P(ωi)
ω1               H             H             0.25
ω2               H             T             0.25
ω3               T             H             0.25
ω4               T             T             0.25

  • A Probability Model associates a numerical probability P(ω) with each possible world

  • Basic axioms of probability theory:

0 ≤ P(ωi) ≤ 1 for every ωi

Σω∈Ω P(ω) = 1
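A minimal sketch (illustration only, assuming we represent the model as a Python dictionary) that encodes this probability model and checks both axioms:

```python
# Probability model: each possible world maps to its probability
P = {'HH': 0.25, 'HT': 0.25, 'TH': 0.25, 'TT': 0.25}

# Axiom 1: 0 <= P(w) <= 1 for every possible world w
assert all(0 <= p <= 1 for p in P.values())

# Axiom 2: the probabilities over the whole sample space sum to 1
assert abs(sum(P.values()) - 1.0) < 1e-9
```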


Probability Distribution as a Pie Chart

[Pie chart: the whole circle divided into slices P(ω1) through P(ω5), one slice per possible world]

  • Basic axioms of probability theory:

  • For every ωi, 0 ≤ P(ωi) ≤ 1

0 = 0%: a nonexistent slice / impossible

1 = 100%: the whole pie / certain

  • Σω∈Ω P(ω) = 1

The probabilities of all possible worlds add up to 1, i.e. 100%


Probability Distribution as a Histogram

[Histogram: bars of equal height P(ωi) = 0.25 over ω1 (HH), ω2 (HT), ω3 (TH), ω4 (TT); vertical axis from 0 to 1 in steps of 0.25]


Events ɸ

  • An Event ɸ is a set of possible worlds {ωi, ωj, ... ωn}

  • An event ɸ is a subset of Ω

  • For example, Coin Flip 1 == Coin Flip 2 is an event ɸ = {ω1, ω4}

Possible world   Coin Flip 1   Coin Flip 2   P(ωi)
ω1               H             H             0.25
ω2               H             T             0.25
ω3               T             H             0.25
ω4               T             T             0.25


Events ɸ

  • An Event ɸ is a set of possible worlds {ωi, ωj, ... ωn}

  • An event ɸ is a subset of Ω

  • Another example of an event is at least one Heads ɸ = {ω1 , ω2 , ω3}

Possible world   Coin Flip 1   Coin Flip 2   P(ωi)
ω1               H             H             0.25
ω2               H             T             0.25
ω3               T             H             0.25
ω4               T             T             0.25


Probability of an Event P(ɸ)

  • P(ɸ) = Σω∈ɸ P(ω): the probability of an event is the sum of the probabilities of the possible worlds defining ɸ (see the sketch below the table)

  • P(ɸ1) = P(at least one Heads) = P(ω1) + P(ω2) + P(ω3) = 0.25 + 0.25 + 0.25 = 0.75

  • P(ɸ2) = P(Coin Flip 1 == Coin Flip 2) = P(ω1) + P(ω4) = 0.25 + 0.25 = 0.5

Possible world   Coin Flip 1   Coin Flip 2   P(ωi)
ω1               H             H             0.25
ω2               H             T             0.25
ω3               T             H             0.25
ω4               T             T             0.25
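A minimal sketch of this sum, reusing the dictionary model from earlier (my own illustration; the event names are made up for readability):

```python
P = {'HH': 0.25, 'HT': 0.25, 'TH': 0.25, 'TT': 0.25}

def prob(event):
    """P(event): sum the probabilities of the worlds in the event."""
    return sum(P[w] for w in event)

at_least_one_heads = {'HH', 'HT', 'TH'}   # ɸ1
flips_match = {'HH', 'TT'}                # ɸ2
print(prob(at_least_one_heads))  # 0.75
print(prob(flips_match))         # 0.5
```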


Random Variables

Possible world   Coin Flip 1   Coin Flip 2   P(ωi)
ω1               H             H             0.25
ω2               H             T             0.25
ω3               T             H             0.25
ω4               T             T             0.25

  • CoinFlip1 and CoinFlip2 here are Random Variables

  • The range of a random variable is the set of values it can take on, e.g. {H, T}

  • Random Variables can be Discrete (e.g. a coin flip or the roll of a die) or Continuous (e.g. temperature, weight)
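One concrete way to picture this (my own framing, not from the slides): a random variable is just a function from a possible world to a value in its range:

```python
# Each random variable maps a possible world (e.g. 'HT') to a value
def coin_flip_1(world):
    return world[0]   # 'H' or 'T'

def coin_flip_2(world):
    return world[1]

print(coin_flip_1('HT'))  # 'H'
print(coin_flip_2('HT'))  # 'T'
```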


Conditional Probability P(a|b)

Possible world   Coin Flip 1   Coin Flip 2   P(ωi)
ω1               H             H             0.25
ω2               H             T             0.25
ω3               T             H             0.25
ω4               T             T             0.25

  • Probability of CoinFlip2 = H, given CoinFlip1 = T:

  • P(CoinFlip2=H | CoinFlip1=T) = P(CoinFlip2=H ∧ CoinFlip1=T) / P(CoinFlip1=T) = 0.25 / 0.5 = 0.5
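The same computation as a sketch, with events as sets of worlds so that "and" becomes set intersection (illustration only):

```python
P = {'HH': 0.25, 'HT': 0.25, 'TH': 0.25, 'TT': 0.25}

def prob(event):
    return sum(P[w] for w in event)

flip2_heads = {'HH', 'TH'}   # CoinFlip2 = H
flip1_tails = {'TH', 'TT'}   # CoinFlip1 = T

# P(a | b) = P(a ∧ b) / P(b)
print(prob(flip2_heads & flip1_tails) / prob(flip1_tails))  # 0.5
```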


Conditional Probability and Product Rule

  • General formula for conditional probability:

P(X=x1 | Y=y1) = P(X=x1 ∧ Y=y1) / P(Y=y1)

  • Rearranging the equation, we get the Product Rule:

P(X=x1 ∧ Y=y1) = P(X=x1 | Y=y1) · P(Y=y1)
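For instance, with the coin-flip numbers from the previous slide: P(CoinFlip2=H ∧ CoinFlip1=T) = P(CoinFlip2=H | CoinFlip1=T) · P(CoinFlip1=T) = 0.5 × 0.5 = 0.25, which matches P(ω3) in the table.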


Conditional Probability P(a|b)

[Venn diagram: circles for P(X=x1) and P(Y=y1) overlap; P(X=x1 | Y=y1) is the share of the P(Y=y1) circle covered by the overlap]

  • P(X=x1 | Y=y1) = P(X=x1 ∧ Y=y1) / P(Y=y1)


Inclusion-Exclusion Principle P(a v b)

Possible world   Coin Flip 1   Coin Flip 2   P(ωi)
ω1               H             H             0.25
ω2               H             T             0.25
ω3               T             H             0.25
ω4               T             T             0.25

  • P(a ∨ b) = P(a) + P(b) − P(a ∧ b)

  • For example (checked in the sketch below),

P(CoinFlip1=H ∨ CoinFlip2=T) = P(CoinFlip1=H) + P(CoinFlip2=T) − P(CoinFlip1=H ∧ CoinFlip2=T) = 0.5 + 0.5 − 0.25 = 0.75
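A quick numerical check of this identity on the coin-flip model (set union on the left, the sum-minus-intersection form on the right; illustration only):

```python
P = {'HH': 0.25, 'HT': 0.25, 'TH': 0.25, 'TT': 0.25}

def prob(event):
    return sum(P[w] for w in event)

a = {'HH', 'HT'}   # CoinFlip1 = H
b = {'HT', 'TT'}   # CoinFlip2 = T

# P(a ∨ b) = P(a) + P(b) - P(a ∧ b)
assert abs(prob(a | b) - (prob(a) + prob(b) - prob(a & b))) < 1e-9
print(prob(a | b))  # 0.75
```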


Inclusion-Exclusion Principle P(a v b)

[Venn diagram: circles for P(X=x1) and P(Y=y1); the union's area equals the two areas added, minus the double-counted overlap]

P(X=x1 ∨ Y=y1) = P(X=x1) + P(Y=y1) − P(X=x1 ∧ Y=y1)


Librarian or Farmer?

Steve is very shy and withdrawn, invariably helpful but with little interest in people or in the world of reality. A meek and tidy soul, he has a need for order and structure, and a passion for detail.

Is Steve more likely to be a librarian or a farmer?

Librarian = 3, Farmer = 13


Librarian or Farmer?

  • What percentage of the general population are Librarians?

  • What percentage of the general population are Farmers?


20 : 1 ratio

  • In the US, the ratio is ~ 20 : 1

  • For every 1 librarian, there are about 20 farmers in the general population


  • To simplify the math, we take a sample of 10 librarians and 200 farmers here, preserving the 20:1 ratio


“Steve is very shy and withdrawn, invariably helpful but with little interest in people or in the world of reality. A meek and tidy soul, he has a need for order and structure, and a passion for detail.”

  • Let’s say 40% of librarians resemble Steve’s description

  • Let’s say only 10% of farmers meet the description provided for Steve


  • Let’s say 40% of librarians resemble Steve’s description

  • Let’s say only 10% of farmers meet the description provided for Steve

  • 40% of the 10 librarians is 4 people; 10% of the 200 farmers is 20 people

  • P(Librarian | Description) = 4 / (4 + 20) = 1/6 ≈ 0.167

  • P(Farmer | Description) = 20 / (4 + 20) = 5/6 ≈ 0.833
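The same count-based reasoning as a short sketch (numbers taken from the slides above):

```python
librarians, farmers = 10, 200        # sample preserving the 20:1 ratio
match_lib = 0.40 * librarians        # librarians fitting the description: 4
match_farm = 0.10 * farmers          # farmers fitting the description: 20

print(match_lib / (match_lib + match_farm))   # 0.1666... ≈ 1/6
print(match_farm / (match_lib + match_farm))  # 0.8333... ≈ 5/6
```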


Bayes Theorem

  • You have some Hypothesis:

    • Steve is a librarian

  • You have some Evidence:

    • Steve is very shy and withdrawn, invariably helpful but with little interest in people or in the world of reality. A meek and tidy soul, he has a need for order and structure, and a passion for detail.

  • Probability ( Hypothesis | Evidence )

e.g.

P(Librarian | Description)


P(Librarian | Description)

  • P(Librarian) = 10/210 = 1/21

This is called the Prior

  • P(Description | Librarian) = 4/10 = 0.4

This is called the Likelihood

  • P(Description | ¬Librarian) = 20/200 = 0.1

  • P(Librarian | Description) = [P(Librarian) · P(Description | Librarian)] / ([P(Librarian) · P(Description | Librarian)] + [P(¬Librarian) · P(Description | ¬Librarian)])


P(Librarian | Description)

  • P(Librarian | Description) = [P(Librarian) · P(Description | Librarian)] / ([P(Librarian) · P(Description | Librarian)] + [P(¬Librarian) · P(Description | ¬Librarian)])

  • Plugging in the numbers: = ((1/21) · 0.4) / ((1/21) · 0.4 + (20/21) · 0.1) = 0.4 / (0.4 + 2.0) = 1/6 ≈ 0.167, matching the counting argument 4 / (4 + 20)


Bayes Theorem

  • P(Hypothesis | Evidence) = (Prior × Likelihood) / Evidence

  • P(Hypothesis | Evidence) = P(Hypothesis) · P(Evidence | Hypothesis) / P(Evidence)

  • P(A | B) = P(A) · P(B | A) / P(B)
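A closing sketch (my own, not from the slides): Bayes Theorem as a small function, with P(Evidence) expanded using the two-hypothesis denominator from the previous slide, checked against the librarian numbers:

```python
def bayes(prior, likelihood, likelihood_not):
    """P(Hypothesis | Evidence), expanding P(Evidence) as
    P(H)·P(E|H) + P(¬H)·P(E|¬H)."""
    evidence = prior * likelihood + (1 - prior) * likelihood_not
    return prior * likelihood / evidence

# Librarian example: prior 1/21, likelihood 0.4, P(Description | ¬Librarian) = 0.1
print(bayes(1/21, 0.4, 0.1))  # 0.1666... ≈ 1/6
```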