Basic Probability and Distributions (Part 1)
CMSC 320 - Introduction to Data Science
Fardina Alam
Topics we will cover:
Chapter 6 in: https://ffalam.github.io/CMSC320TextBook/
Probability Theory:
Probability Distribution
Central Limit Theorem (CLT)
Why Probability Matters in Data Science
A method for decision making in the presence of uncertainty.
Probability is a mindset, not just math
In Data Science, we often make predictions amidst uncertainty.
Probability helps us make clear decisions under uncertainty.
Basic Probability formula: The probability P of an event A happening:
P(A) = (number of outcomes favorable to A) / (total number of possible outcomes)
Classic Example: What is the chance of rolling a one (the number 1) on a die?
There are six equally likely outcomes (the sample space), so P(rolling a 1) = 1/6.
Probabilities are always between 0 and 1.
Probability Theory and Data Science
Applications of Probability in Data Science
Key Concepts in Probability
Example: Coin flip → Ω = {H, T}, Event = Heads
From Events to Data Science
Focus: patterns, not single outcomes
Probability Distributions: Functions that describe the likelihood of different outcomes. Helps in understanding the spread of data.
Example: Random Variable and Probability Distributions
A random variable assigns a number to each outcome of a random event.
E.g., imagine rolling two dice.
Let X = sum of two dice
Possible values: 2–12
Each value has a probability
P(X=2)= 1/36
P(X=3)= 2/36 (rolling a 1 and a 2 or rolling a 2 and a 1)
P(X=4)= 3/36
P(X=5)= 4/36
P(X=6)= 5/36
P(X=7)= 6/36
P(X=8)= 5/36
P(X=9)= 4/36
P(X=10)= 3/36
P(X=11)= 2/36
P(X=12)= 1/36
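The same distribution can be computed by brute force. A minimal sketch (plain Python, standard library only; not part of the original slides) that enumerates all 36 equally likely rolls:

```python
# Enumerate all 36 equally likely rolls of two dice and tally P(X = sum).
from collections import Counter
from fractions import Fraction
from itertools import product

counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

for total in range(2, 13):
    print(f"P(X={total}) = {Fraction(counts[total], 36)}")  # e.g. P(X=7) = 1/6
```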
Probability Distributions: Describe the likelihood of different outcomes occurring.
Conditional Probability
Probability of A given B: P(A | B) = P(A ∩ B) / P(B)
Idea: “What is the chance of A once B is known?” Remember: Most real-world data is conditional.
Example: A bag has 5 red and 5 blue marbles. Among the red marbles, 3 are shiny.
What is the probability of picking a shiny marble given that it is red?
Let A = the marble is shiny and B = the marble is red.
P(A ∩ B) = 3/10 (shiny and red)
P(B) = 5/10 (red)
P(A | B) = P(A ∩ B) / P(B) = (3/10) / (5/10) = 3/5 = 0.6 = 60%
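A quick simulation sketch (plain Python; it assumes the 5 blue marbles are not shiny, which the slide leaves unstated and which does not affect P(shiny | red)):

```python
# Simulate drawing one marble: 5 red (3 shiny, 2 dull) and 5 blue (dull).
import random

random.seed(0)
bag = [("red", "shiny")] * 3 + [("red", "dull")] * 2 + [("blue", "dull")] * 5

red_draws = shiny_and_red = 0
for _ in range(100_000):
    color, finish = random.choice(bag)
    if color == "red":
        red_draws += 1
        shiny_and_red += finish == "shiny"

print(shiny_and_red / red_draws)  # estimate of P(shiny | red), close to 0.6
```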
Bayes Rule
Bayes' Rule is used to update probabilities when new information becomes available:
P(A | B) = P(B | A) · P(A) / P(B)
Example:
In spam detection:
Bayes' Rule updates our belief on whether the email is spam based on the word "discount."
Example: Picnic Day
Scenario: You want to find the chance of rain during the day given that the morning is cloudy. You have the following probabilities: P(Rain) = 0.1 (it rains on 10% of days), P(Cloudy | Rain) = 0.5 (half of rainy days start with cloudy mornings), and P(Cloudy) = 0.4 (40% of mornings are cloudy).
What is the chance of rain during the day?
Now, use Bayes' Theorem:
P(Rain | Cloudy) = P(Cloudy | Rain) · P(Rain) / P(Cloudy) = (0.5 × 0.1) / 0.4 = 0.125
Or a 12.5% chance of rain. Not too bad, let's have a picnic!
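The picnic computation as a sketch (plain Python; probabilities as reconstructed above):

```python
# Bayes' Theorem for the picnic example.
p_rain = 0.10              # P(Rain): it rains on 10% of days
p_cloud_given_rain = 0.50  # P(Cloudy | Rain): half of rainy days start cloudy
p_cloud = 0.40             # P(Cloudy): 40% of mornings are cloudy

p_rain_given_cloud = p_cloud_given_rain * p_rain / p_cloud
print(p_rain_given_cloud)  # 0.125 -> a 12.5% chance of rain
```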
Law of Total Probability
Sometimes events can happen in multiple ways.
The Law of Total Probability computes the probability of an event by considering all possible scenarios that cover the event:
P(A) = Σᵢ P(A | Bᵢ) · P(Bᵢ)
where B₁, B₂, …, Bₙ are mutually exclusive and exhaustive events.
Takeaway: Helps compute probabilities when causes are hidden.
Example (Dice): Suppose a die is fair with probability 0.7 (then P(6) = 1/6) and loaded to show a 6 half the time with probability 0.3 (then P(6) = 1/2):
P(6) = (1/6)(0.7) + (1/2)(0.3) = 7/60 + 9/60 = 16/60 ≈ 0.267
** Mutually exclusive & exhaustive: Events that don’t overlap and cover all possibilities.
Example (Two Bags of Marbles):
Setup: Bag 1 has 4 black marbles out of 10; Bag 2 has 7 black out of 10.

Case 1: Equal Bag Selection. Suppose I put the two bags in a box. If I close my eyes, grab a bag from the box, and then grab a marble from the bag, what is the probability that it is black?
Solution: P(B1) = 1/2, P(B2) = 1/2
P(Black) = (1/2)(4/10) + (1/2)(7/10) = 11/20 = 0.55

Case 2: Unequal Bag Selection. Now suppose the first bag is much larger than the second, so that when I reach into the box I am twice as likely to grab the first bag as the second. What is the probability of grabbing a black marble?
Solution: P(B1) = 2/3, P(B2) = 1/3
P(Black) = (2/3)(4/10) + (1/3)(7/10) = 15/30 = 1/2
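A small sketch (plain Python, not from the slides) that computes both cases with the same total-probability function:

```python
# Law of Total Probability for the two-bags example.
# Bag 1: 4 black of 10 marbles; Bag 2: 7 black of 10.
def p_black(p_bag1: float, p_bag2: float) -> float:
    # P(Black) = P(Black | B1) P(B1) + P(Black | B2) P(B2)
    return p_bag1 * (4 / 10) + p_bag2 * (7 / 10)

print(p_black(1/2, 1/2))  # Case 1: ≈ 0.55
print(p_black(2/3, 1/3))  # Case 2: ≈ 0.50
```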
Independence
Two events A and B are independent if knowing B does not change the probability of A: P(A | B) = P(A), or equivalently P(A ∩ B) = P(A) · P(B).
Ex: two flips of a fair coin; the first flip tells you nothing about the second.
Conditional Independence
Two variables A and B are conditionally independent given C if knowing B adds no new information about A once C is known.
Equivalently: P(A ∩ B | C) = P(A | C) · P(B | C)
Example: A = has a cough, B = has a fever, C = has a cold
Idea: Once we know the person has a cold, cough and fever behave independently.
Why It Matters
Three probabilities:
P(Rain), P(Dog Barks), P(Cat Runs)
Rain and Cat Runs are not independent: rain makes the dog bark, and the barking makes the cat run.
(Diagram: Rain → Dog Barks → Cat Runs)
What if you already know the dog is barking 🐶? Does the cat's behavior still depend on Rain? No! Once Dog Barks is known, Cat Runs is no longer influenced by Rain.
Conditional Independence Example
What if you already know the dog is barking? Cat Runs then depends only on Dog Barks, not on Rain. Rain and Cat Runs are thus conditionally independent given that Dog Barks.
(Diagram: Rain → Dog Barks → Cat Runs; once Dog Barks is observed, Cat Runs does not depend on Rain.)
Consider three variables A, B, and C. If the distribution of A given B and C does not depend on the value of B, then:
P(A | B ∩ C) = P(A | C) → no B! (A is conditionally independent of B given C)
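A simulation sketch (plain Python; the causal chain and probabilities are illustrative assumptions, not from the slides) checking that P(Cat Runs | Dog Barks, Rain) ≈ P(Cat Runs | Dog Barks):

```python
# Simulate the chain Rain -> Dog Barks -> Cat Runs with made-up probabilities.
import random

random.seed(1)
trials = []
for _ in range(200_000):
    rain = random.random() < 0.3
    bark = random.random() < (0.9 if rain else 0.2)  # rain makes barking likely
    cat = random.random() < (0.8 if bark else 0.1)   # cat reacts to barking only
    trials.append((rain, bark, cat))

def p_cat_runs(rain: bool, bark: bool) -> float:
    matches = [c for r, b, c in trials if r == rain and b == bark]
    return sum(matches) / len(matches)

# With Dog Barks fixed to True, Rain barely changes P(Cat Runs): ~0.8 either way.
print(p_cat_runs(True, True), p_cat_runs(False, True))
```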
Expected Value (Average Behavior)
Expected value summarizes long-run behavior.
In Data Science, remember: expected value drives loss, reward, and optimization.
Someone offers for you to go on a game show. On this game show, there is: a 5% chance of winning $1,000,000, and a 95% chance of being hit with sticks and losing $10,000.
Your feelings about being hit with sticks aside, should you go on the game show?
Expected value of the game show:
E[X] = (1,000,000 × 0.05) + (−10,000 × 0.95) = 50,000 − 9,500 = $40,500
Net positive!
Expected Value: Example
You are playing a game where you spin a wheel. The wheel has the following rewards: $0 with probability 0.2, $10 with probability 0.3, $20 with probability 0.4, and a second $10 sector with probability 0.1.
On average, you can expect to win $12 per spin over a large number of spins.
EV = (0.2 × 0) + (0.3 × 10) + (0.4 × 20) + (0.1 × 10) = 0 + 3 + 8 + 1 = 12
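Both the exact EV and a long-run simulation, as a sketch (plain Python; not part of the slides):

```python
# Expected value of the wheel: exact formula and a long-run simulation.
import random

rewards = [0, 10, 20, 10]
probs = [0.2, 0.3, 0.4, 0.1]

ev = sum(p * r for p, r in zip(probs, rewards))
print(ev)  # 12.0 -> $12 per spin on average

random.seed(0)
spins = random.choices(rewards, weights=probs, k=100_000)
print(sum(spins) / len(spins))  # converges toward 12 as the spin count grows
```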
Probability Distributions and their types
Distributions describe how values are spread.
Shape tells us about the data-generating process. Remember: distribution choice comes before modeling.
Understanding how your data is distributed can tell you a lot about the process generating the data.
1(a,b). Bernoulli and Binomial Distribution
Binary outcome: Success (1) or Failure (0)
Bernoulli Distribution
P(X = 1) = p
P(X = 0) = 1 − p
Example: one coin flip, spam vs not spam
Binomial Distribution: the number of successes (k) in n independent Bernoulli trials, each with success probability p:
P(X = k) = (n choose k) · p^k · (1 − p)^(n−k)
X ~ Binomial(n, p)
Example: heads in 10 flips, correct answers on a quiz
Key idea: Binomial = sum of independent Bernoulli trials
Example: Tossing a coin 10 times and counting heads: X ~ Binomial(10, 0.5).
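To make the "sum of Bernoulli trials" idea concrete, a short sketch (assuming NumPy and SciPy are available; not from the slides) comparing simulated coin flips with the exact binomial PMF:

```python
# Binomial = sum of Bernoulli trials: heads in 10 fair coin flips.
import numpy as np
from scipy.stats import binom

n, p = 10, 0.5
print(binom.pmf(np.arange(n + 1), n, p))  # exact P(X = k) for k = 0..10

rng = np.random.default_rng(0)
flips = rng.random((100_000, n)) < p      # each row = 10 Bernoulli trials
print((flips.sum(axis=1) == 5).mean())    # ≈ binom.pmf(5, 10, 0.5) ≈ 0.246
```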
1c. Poisson Distribution
What it models: The number of times an event occurs in a fixed time or space, when events happen randomly at a constant average rate.
Assumptions: events occur independently of one another, at a constant average rate λ, and no two events occur at exactly the same instant.
P(X = k) = (λ^k · e^(−λ)) / k!, for k = 0, 1, 2, …
where λ is the average number of events per interval.
We can say X follows a Poisson distribution with parameter λ.
Example
Emails arrive at an average rate of 3 per hour. What is the probability of receiving exactly 2 emails in one hour?
P(X = 2) = (3² · e⁻³) / 2! = 9e⁻³ / 2 ≈ 0.224
Shape: Skewed right for small λ, becomes more symmetric as λ increases.
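A sketch verifying the email example two ways (the scipy.stats.poisson call assumes SciPy is installed):

```python
# P(exactly 2 emails in an hour) when emails arrive at rate lambda = 3/hour.
import math
from scipy.stats import poisson

lam, k = 3, 2
print(lam**k * math.exp(-lam) / math.factorial(k))  # direct formula: ≈ 0.224
print(poisson.pmf(k, lam))                          # same value via SciPy
```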
Zero-Inflated Poisson (ZIP) Distribution
Used for count data with more zeros than expected.
Components: with probability π the count is a structural zero; with probability 1 − π it is drawn from a Poisson(λ) distribution.
Why ZIP? Oftentimes real count data have a spike at zero, with far more zeros than a plain Poisson model predicts.
Examples: number of insurance claims filed per customer, number of items purchased per website visit (many visitors buy nothing).
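A minimal ZIP sampler sketch (assuming NumPy; the values of π and λ here are illustrative, not from the slides):

```python
# Zero-Inflated Poisson sampler: structural zero with probability pi,
# otherwise a Poisson(lam) draw. Parameters here are illustrative.
import numpy as np

def sample_zip(pi: float, lam: float, size: int, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    structural_zero = rng.random(size) < pi
    return np.where(structural_zero, 0, rng.poisson(lam, size))

x = sample_zip(pi=0.4, lam=3.0, size=100_000)
print((x == 0).mean())  # ≈ 0.43, versus only e^-3 ≈ 0.05 for a plain Poisson(3)
```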
2a. The Uniform Distribution
Uniform Distribution: every outcome in a given range is equally likely.
Example: Rolling a fair six-sided die: P(X = k) = 1/6 for k = 1, …, 6.
2b. Normal (Gaussian) Distribution
Bell-shaped and symmetric
Data is symmetrically distributed with no skew
It is symmetric around its mean and is defined by two parameters: the mean (μ) and the standard deviation (σ).
Example: Heights of people, measurement errors.
Averages are common, extremes are rare.
Parameters of the Normal Distribution: μ & σ
Mean (μ): the center of the distribution; the peak of the bell sits at μ.
Standard Deviation (σ): how spread out the values are around the mean.
Problem Solving: Finding percentages
Example 1: The time taken to travel between two regional cities is approximately normally distributed with a mean of 70 minutes and a standard deviation of two minutes.
Q: What is the percentage of travel times that are between 66 minutes and 72 minutes?
(Figure: normal curve with mean 70 minutes; axis ticks at 64, 66, 68, 70, 72, 74, and 76, labeled time(min).)
66 minutes is 2σ below the mean and 72 minutes is 1σ above it, so by the empirical rule: % = 13.5 + 34 + 34 = 81.5
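The empirical-rule answer can be double-checked with the exact normal CDF (a sketch assuming SciPy, which the slides do not use):

```python
# Exact answer via the normal CDF: P(66 <= X <= 72) for X ~ N(70, 2^2).
from scipy.stats import norm

p = norm.cdf(72, loc=70, scale=2) - norm.cdf(66, loc=70, scale=2)
print(p)  # ≈ 0.8186, matching the empirical-rule estimate of 81.5%
```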
TRY YOURSELF: The volume of a cup of soup served by a machine is normally distributed with a mean of 240 mL and a standard deviation of 5 mL. A fast-food store uses this machine to serve 160 cups of soup.
Q: How many of these cups of soup are expected to contain less than 230 mL?
Standard Normal (Z) Distribution
A special normal distribution with mean μ = 0 and standard deviation σ = 1.
Used to compare values from different distributions by converting them to z-scores.
Z-Scores: Any normal distribution can be transformed into a standard normal distribution using:
Z = (X − μ) / σ
Interpretation: the z-score is the number of standard deviations a value lies above (positive z) or below (negative z) the mean.
Example: Given mean μ = 100, standard deviation σ = 15, and a student score X = 115:
Z = (115 − 100) / 15 = 1
Meaning: The score is 1 standard deviation above the mean, indicating better-than-average performance.
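The same computation as a sketch (SciPy assumed for the percentile lookup; not part of the slides):

```python
# Z-score for the exam example, plus the percentile it corresponds to.
from scipy.stats import norm

mu, sigma, x = 100, 15, 115
z = (x - mu) / sigma
print(z)            # 1.0 -> one standard deviation above the mean
print(norm.cdf(z))  # ≈ 0.841 -> roughly the 84th percentile
```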
The Central Limit Theorem (CLT):
For a sufficiently large sample size n, the distribution of the sample mean is approximately normal, regardless of the population's original distribution.
If we repeatedly draw samples from a distribution and compute each sample's mean, the set of sample means is approximately normally distributed.
Key ideas: Applies to sample means, not individual values
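A quick simulation sketch (assuming NumPy; the exponential population and sample size are illustrative choices) showing sample means from a skewed population clustering normally around the population mean:

```python
# CLT demo: means of n=50 draws from a skewed exponential population.
import numpy as np

rng = np.random.default_rng(0)
samples = rng.exponential(scale=1.0, size=(10_000, 50))  # 10,000 samples of n=50
sample_means = samples.mean(axis=1)

# Means cluster around the population mean (1.0) with standard error
# sigma / sqrt(n) = 1 / sqrt(50) ≈ 0.141, and their histogram looks normal.
print(sample_means.mean(), sample_means.std())
```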