Random Variables
Numerical functions of random samples and their properties; sampling variability.
Data 100/Data 200, Fall 2023 @ UC Berkeley
Narges Norouzi and Fernando Pérez
Content credit: Acknowledgments
LECTURE 17
Today’s Roadmap
Lecture 17, Data 100 Fall 2023
Random Variables and Distributions
Expectation and Variance
Sums of Random Variables
Populations and Samples
Random Variables and Distributions
The Bias-Variance Tradeoff
We’ll come back to this…
What is the mathematical underpinning of this plot?
Why Probability?
[Figure: the data science lifecycle: Question & Problem Formulation → Data Acquisition → Exploratory Data Analysis → Prediction and Inference → Reports, Decisions, and Solutions. Annotated with the course's modeling arc: Model Selection Basics (Cross Validation), Regularization, Probability I: Random Variables and Estimators (today), Probability II: Bias and Variance, Inference/Multicollinearity.]
We Will See . . .
Formalize the notions of sample statistic and population parameter from Data 8.
From Data 8:
1. Sample mean: the mean of your random sample, e.g., np.mean(data).
2. The Central Limit Theorem: If you draw a large random sample with replacement, then, regardless of the population distribution, the probability distribution of the sample mean (a random variable) is roughly normal, with mean equal to the population mean and SD equal to the population SD divided by √n.
We will go over just enough probability to help you understand its implications for modeling. For more probability, take Data 140, CS 70, and/or EECS 126.
[Terminology] Random Variable
Suppose we generate random data, such as a random sample from some population. Then:
A random variable is a numerical function of the randomness in the data.
Example: Tossing a Coin
A fair coin can land either heads (H) or tails (T), each with probability 0.5. With these possible outcomes, we can define a random variable X that records whether the coin lands heads. In function notation: X(H) = 1, X(T) = 0.
Example: Sampling Data 100 Students
Suppose we draw a simple random sample s of size 3 from the population of all Data 100 students (blue: data science students). We can define Y as the number of data science students in our sample.
Drawing several random samples yields different values of Y, e.g., Y(s) = 2, Y(s) = 0, Y(s) = 2, Y(s) = 1.
[Terminology] Distribution
For any random variable, we need to be able to specify two things: its possible values, and the probability P(X = x) that the random variable X takes on each value x.
Assuming (for now) that X is discrete, i.e., has a finite number of possible values, the probabilities must sum to 1.
We can often do this using a probability distribution table. In the coin toss example, the probability distribution table of X is:
x | P(X = x)
0 | 1/2
1 | 1/2
[Terminology] Distribution
The distribution of a random variable X is a description of how the total probability of 100% is split over all the possible values of X .
A distribution fully defines a random variable.
[Table: probability distribution table of Y from the sampling example.]
Distributions Can Be Represented as Histograms or Densities
Take a probability class like Data 140 to learn more about discrete vs continuous distributions.
[Figures: histogram of a discrete random variable X; density curve of a continuous random variable Y.]
Probabilities as Areas
If we sum up the total area of the bars/under the density curve, we should get 100%, or 1.
In the histogram of a discrete random variable X, the area of the red bars is P(7 ≤ X ≤ 9).
In the density of a continuous random variable Y, the red area under the curve is P(6.8 ≤ Y ≤ 9.5).
Understanding Random Variables
Compute the following probabilities for the random variable X, which takes values 3, 4, 6, and 8 with probabilities 0.1, 0.2, 0.4, and 0.3:
1. P(X = 4) = 0.2
2. P(X < 6) = 0.1 + 0.2 = 0.3
3. P(X ≤ 6) = 0.1 + 0.2 + 0.4 = 0.7
4. P(X = 7) = 0
5. P(X ≤ 8) = 1
Common Random Variables
Bernoulli(p)
Binomial(n, p)
Uniform on a finite set of values
Uniform on the unit interval (0, 1)
Normal(μ, σ²)
The numbers in parentheses are the parameters of a random variable, which are constants. Parameters define a random variable's shape (i.e., distribution) and its possible values.
From Distribution to (Simulated) Population
Given a random variable's distribution (e.g., its probability distribution table), how could we generate or simulate a population, i.e., X(s) from all possible samples?
Simulate: randomly pick values of X according to its distribution. This yields X(s) from many, many (simulated) samples, e.g., via:
np.random.choice or df.sample
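A minimal sketch of this step, using the distribution from the running example (values 3, 4, 6, 8 with probabilities 0.1, 0.2, 0.4, 0.3, so E[X] = 5.9); the seed and sample count are arbitrary choices:

```python
import numpy as np

# Distribution of X from the running example.
values = np.array([3, 4, 6, 8])
probs = np.array([0.1, 0.2, 0.4, 0.3])

rng = np.random.default_rng(42)
# Randomly pick values of X according to its distribution.
simulated = rng.choice(values, size=100_000, p=probs)

# Empirical proportions approach the true probabilities,
# and the empirical mean approaches E[X] = 5.9.
empirical_probs = np.array([(simulated == v).mean() for v in values])
empirical_mean = simulated.mean()
```

The same idea works with `df.sample(..., replace=True)` when the population lives in a DataFrame.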
Expectation and Variance
Descriptive Properties of Random Variables
There are several ways to describe a random variable:
There are several ways to describe a random variable X. Its definition fully describes it: a table of all samples s and X(s), a distribution table of P(X = x), or a histogram. Its properties merely summarize it.
The expectation and variance of a random variable are numerical summaries of X. They are numbers and are not random!
Expectation: the "average value" of X. Variance: the "spread" of X.
You’ve Seen This Before
The mean (Data 100: expectation) is the center of gravity or balance point of the histogram (Data 100: of a random variable). [textbook]
The variance is a measure of spread. It is the expected squared deviation from the mean (Data 100: of a random variable). [textbook]
In Data 8, you computed these from the datapoints themselves (i.e., the sample of data).
In Data 100, we redefine these terms with respect to probability distributions.
Definition of Expectation
The expectation of a random variable X is the weighted average of the possible values of X, where the weights are the probabilities of the values.
Two equivalent ways to apply the weights:
E[X] = Σ over all samples s of X(s) · P(s)
E[X] = Σ over all possible values x of x · P(X = x)  (more common: we are usually given the distribution, not all possible samples)
Expectation is a number, not a random variable!
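The two weightings can be checked numerically with the running example's distribution (carried over from earlier slides):

```python
import numpy as np

x = np.array([3, 4, 6, 8])
p = np.array([0.1, 0.2, 0.4, 0.3])

# Way 1: weight each possible value by P(X = x).
e_from_dist = (x * p).sum()

# Way 2: average X(s) over a population in which each value appears
# in proportion to its probability (1 copy per 0.1 of probability).
population = np.repeat(x, (p * 10).round().astype(int))
e_from_population = population.mean()
```

Both computations give the same number, 5.9.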
Example 1: Tossing a coin
A fair coin can land either heads (H) or tails (T), each with probability 0.5, with X(H) = 1 and X(T) = 0. Then E[X] = 1 · 1/2 + 0 · 1/2 = 0.5.
The expectation of X does not need to be a possible value of X. Note that E[X] = 0.5 is not a possible value of X! It is an average.
Example 2
Consider the random variable X we defined earlier, with values 3, 4, 6, and 8 and probabilities 0.1, 0.2, 0.4, and 0.3. Then:
E[X] = 3(0.1) + 4(0.2) + 6(0.4) + 8(0.3) = 5.9
The expectation of X does not need to be a possible value of X. Note that E[X] = 5.9 is not a possible value of X! It is an average.
Definition of Variance
Variance is the expected squared deviation from the expectation of X: Var(X) = E[(X − E[X])²]. The standard deviation is SD(X) = √Var(X).
Variance is a number, not a random variable!
By Chebyshev's inequality (which you saw in Data 8, and which we won't prove here either): for any k > 0, the probability that X is at least k SDs from its expectation is at most 1/k², i.e., P(|X − E[X]| ≥ k · SD(X)) ≤ 1/k².
Variance: Alternate Calculation
There's a more convenient form of variance: Var(X) = E[X²] − (E[X])².
How do we calculate E[X²]? X² is just another random variable, with its own probability distribution. We can calculate its expectation as E[X²] = Σₓ x² · P(X = x).
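Both forms can be computed side by side for the running example's distribution:

```python
import numpy as np

x = np.array([3, 4, 6, 8])
p = np.array([0.1, 0.2, 0.4, 0.3])

mu = (x * p).sum()                         # E[X] = 5.9
var_def = (((x - mu) ** 2) * p).sum()      # definition: E[(X - E[X])^2]
var_alt = (x ** 2 * p).sum() - mu ** 2     # shortcut: E[X^2] - (E[X])^2
```

Both forms agree (here, 2.89); the shortcut just avoids computing every deviation.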
Example: Dice Is the Plural; Die Is the Singular
Let X be the outcome of a single die roll. X is a random variable.
1. What is the expectation, E[X]?
2. What is the variance, Var(X)?
(definitions/properties)
Example: Dice Is the Plural; Die Is the Singular
Let X be the outcome of a single die roll. X is a random variable.
1. What is the expectation, E[X]?
E[X] = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5
2. What is the variance, Var(X)?
Approach 1 (definition): Var(X) = Σₓ (x − 3.5)² · (1/6) = 35/12 ≈ 2.92
Approach 2 (property): Var(X) = E[X²] − (E[X])², with E[X²] = (1 + 4 + 9 + 16 + 25 + 36)/6 = 91/6, so Var(X) = 91/6 − 3.5² = 35/12
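The die-roll numbers above can be verified directly from the distribution:

```python
import numpy as np

faces = np.arange(1, 7)        # outcomes of a fair die roll
p = np.full(6, 1 / 6)          # each face equally likely

e_x = (faces * p).sum()          # E[X] = 3.5
e_x2 = (faces ** 2 * p).sum()    # E[X^2] = 91/6
var_x = e_x2 - e_x ** 2          # 91/6 - 3.5^2 = 35/12
```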
Sums of Random Variables
Functions of Multiple Random Variables
A function of a random variable is also a random variable!
If you create multiple random variables based on your sample…
…then functions of those random variables are also random variables.
For instance, if X and Y are random variables, then so are X + Y, 2X, X², and max(X, Y).
Many functions of RVs that we care about (counts, means) involve sums of RVs, so we expand on properties of sums of RVs.
Equal vs. Identically Distributed vs. i.i.d.
Suppose that we have two random variables X and Y.
X and Y are equal if: X(s) = Y(s) for every sample s.
X and Y are identically distributed if: the distribution of X is the same as the distribution of Y, i.e., P(X = x) = P(Y = x) for every x.
X and Y are independent and identically distributed (i.i.d.) if: they are identically distributed, and knowing the value of X tells us nothing about the value of Y (independence).
Distributions of Sums
Let X1 and X2 be numbers on rolls of two dice.
Let's show this through simulation (demo).
Then: how can we directly compute E[Y] and Var(Y) without simulating distributions?
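The demo's simulation can be sketched roughly as follows, assuming Y here is the sum X1 + X2 (the seed and number of repetitions are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(100)
n = 100_000

# Many simulated rolls of two independent fair dice.
x1 = rng.integers(1, 7, size=n)
x2 = rng.integers(1, 7, size=n)

y = x1 + x2                    # one draw of the sum per repetition

# Empirical summaries of the simulated distribution of Y.
e_y_hat = y.mean()             # should be near 7
var_y_hat = y.var()            # should be near 2 * (35/12) = 35/6
```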
Properties of Expectation
Instead of simulating full distributions, we often just compute expectation and variance directly.
Recall the definition of expectation: E[X] = Σₓ x · P(X = x).
Properties:
1. Expectation is linear: E[aX + b] = aE[X] + b. Intuition: summations are linear. (Proof in the extra slides.)
2. Expectation is linear in sums of RVs, for any relationship between X and Y: E[X + Y] = E[X] + E[Y]. (Proof in the extra slides.)
3. If g is a non-linear function, then in general E[g(X)] ≠ g(E[X]). For example, E[X²] ≠ (E[X])² in general.
Properties of Variance [1/2]
Recall the definition of variance: Var(X) = E[(X − E[X])²].
Properties:
1. Variance is non-linear: Var(aX + b) = a²Var(X), and SD(aX + b) = |a| SD(X).
Shifting doesn't change variance; scaling does.
Intuition (full proof in the extra slides): consider the standard deviation of Y = −3X + 2. Scaling X by −3 stretches the distribution, tripling the SD; shifting −3X by 2 slides the distribution without changing its spread. So SD(Y) = 3 SD(X) and Var(Y) = 9 Var(X).
Properties of Variance [2/2]
2. The variance of a sum of RVs is affected by the (in)dependence of the RVs (derivation in the extra slides):
Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)
Here Cov(X, Y) is the covariance of X and Y (next slide). If X and Y are independent, then Cov(X, Y) = 0 and Var(X + Y) = Var(X) + Var(Y).
Covariance and Correlation: The Basics
Covariance is the expected product of deviations from expectation: Cov(X, Y) = E[(X − E[X])(Y − E[Y])].
Correlation is covariance with X and Y measured in standard units (see the extra slide on standard units): r(X, Y) = E[X_su · Y_su].
Correlation (and therefore covariance) measures a linear relationship between X and Y.
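A sketch of both quantities by simulation, using a hypothetical dependent pair: X is one fair die roll and Y is that roll plus an independent second roll (seed and sample count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

x = rng.integers(1, 7, size=n)          # one fair die roll
y = x + rng.integers(1, 7, size=n)      # same roll plus an independent roll

# Covariance: average product of deviations from the (empirical) means.
cov_xy = ((x - x.mean()) * (y - y.mean())).mean()

# Correlation: the covariance after putting X and Y in standard units.
r_xy = cov_xy / (x.std() * y.std())
```

For this pair, Cov(X, Y) = Var(X) = 35/12 and r = 1/√2 ≈ 0.71, and the simulation lands close to both.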
Dice, Our Old Friends: Expectation
Let X1 and X2 be the numbers on two rolls of a fair die, so E[X1] = E[X2] = 3.5.
Y = 2X1: E[Y] = E[2X1] = 2E[X1] = 7
Z = X1 + X2: E[Z] = E[X1 + X2] = E[X1] + E[X2] = 7
Dice, Our Old Friends: Variance
Let X1 and X2 be the numbers on two rolls of a fair die, so Var(X1) = Var(X2) = 35/12.
Y = 2X1: E[Y] = 7, and Var(Y) = Var(2X1) = 4Var(X1) = 4(35/12) ≈ 11.67
Z = X1 + X2: E[Z] = 7, and since X1 and X2 are independent, Cov(X1, X2) = 0, so Var(Z) = Var(X1) + Var(X2) + 2Cov(X1, X2) = (35/12) + (35/12) + 0 ≈ 5.83
Simulation results match: Y and Z have the same expectation, but Y = 2X1 is much more spread out than Z = X1 + X2.
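As a sanity check, the same numbers fall out of exact arithmetic with Python's fractions module:

```python
from fractions import Fraction

var_die = Fraction(35, 12)     # variance of one fair die roll

var_y = 2 ** 2 * var_die       # Var(2 X1) = 4 Var(X1) = 35/3 ≈ 11.67
var_z = var_die + var_die      # Var(X1 + X2), with Cov = 0: 35/6 ≈ 5.83
```

Scaling one roll doubles the spread everywhere; adding an independent roll lets some randomness cancel, so Var(Y) is exactly twice Var(Z).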
[Summary] Expectation and Variance for Linear Functions of Random Variables
Let X be a random variable with distribution P(X = x), let a and b be scalar values, and let Y be another random variable. Then:
E[X] = Σₓ x · P(X = x)
Var(X) = E[(X − E[X])²]  (definition)  = E[X²] − (E[X])²  (easier computation)
E[aX + b] = aE[X] + b;  Var(aX + b) = a²Var(X)
E[X + Y] = E[X] + E[Y]
Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y), where Cov(X, Y) = 0 if X and Y are independent.
Bernoulli and Binomial Random Variables
Common Random Variables
Bernoulli(p)
Binomial(n, p)
Uniform on a finite set of values
Uniform on the unit interval (0, 1)
Normal(μ, σ²)
We'll now revisit these to solidify our understanding of expectation and variance.
Properties of Bernoulli Random Variables
Let X be a Bernoulli(p) random variable.
By the definitions:
Expectation: E[X] = 1 · p + 0 · (1 − p) = p. We will get an average value of p across many, many samples.
Variance: Var(X) = E[X²] − (E[X])² = p − p² = p(1 − p).
Variance is lower when p = 0.1 or 0.9 and higher when p is close to 0.5. More info: google("plot x(1 - x)")
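These facts follow directly from the two-row distribution table, which a small helper makes explicit:

```python
import numpy as np

def bernoulli_stats(p):
    """Expectation and variance of a Bernoulli(p) RV, from the definitions."""
    x = np.array([0, 1])
    probs = np.array([1 - p, p])
    mu = (x * probs).sum()                   # E[X] = p
    var = (((x - mu) ** 2) * probs).sum()    # Var(X) = p(1 - p)
    return mu, var

# Variance is largest near p = 0.5 and small near p = 0.1 or 0.9.
mid_var = bernoulli_stats(0.5)[1]    # 0.25
edge_var = bernoulli_stats(0.1)[1]   # 0.09
```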
Properties of Binomial Random Variables
Let Y be a Binomial(n, p) random variable: the count of successes in n independent trials with success probability p. A count is a sum of 0's and 1's, so we can write Y = X1 + X2 + … + Xn, where the Xi are i.i.d. Bernoulli(p) random variables.
Expectation: E[Y] = E[X1] + … + E[Xn] = np.
Variance: because all the Xi are independent, Cov(Xi, Xj) = 0 for i ≠ j, so Var(Y) = Var(X1) + … + Var(Xn) = np(1 − p).
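The sum-of-Bernoullis view translates directly into a simulation sketch (seed and repetition count are arbitrary):

```python
import numpy as np

n, p = 50, 0.5
rng = np.random.default_rng(7)

# Each row holds n i.i.d. Bernoulli(p) indicators; their sum is Binomial(n, p).
indicators = rng.random((100_000, n)) < p
y = indicators.sum(axis=1)

mean_hat = y.mean()    # should be near np = 25
var_hat = y.var()      # should be near np(1 - p) = 12.5
```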
Binomial(n, p) for Large n
For p = 0.5, n = 50 (i.e., the number of heads in 50 fair coin flips):
[Figure: histogram of Y_50, whose bell shape resembles Hallgrímskirkja in Iceland.]
Covered until here on 10/19.
Populations and Samples
From Populations to Samples
Today, we've talked extensively about populations. However, in data science, we often collect samples.
The big assumption we make in modeling and inference: our sample of size n is drawn uniformly at random with replacement from the population (the sampling frame).
The Sample is a Set of i.i.d. Random Variables
Each observation in our sample, drawn uniformly at random with replacement from the population (sampling frame), is a random variable. The sample X1, X2, …, Xn is drawn i.i.d. from our population distribution.
Sampling uniformly at random with replacement, e.g., df.sample(n, replace=True), yields a sample X1, X2, …, Xn.
The population mean (e.g., E[X] = 5.9) is a number, i.e., a fixed value.
The sample mean (e.g., np.mean(...) = 5.71) is a random variable! It depends on our randomly drawn sample!!
[Terminology] Sample Mean
Consider an i.i.d. sample X1, X2, …, Xn drawn from a population with mean μ and SD σ.
Define the sample mean: X̄ = (1/n)(X1 + X2 + … + Xn).
Expectation: E[X̄] = (1/n)(E[X1] + … + E[Xn]) = μ.
Variance/standard deviation: i.i.d. implies Cov(Xi, Xj) = 0 for i ≠ j, so Var(X̄) = (1/n²)(Var(X1) + … + Var(Xn)) = σ²/n, and SD(X̄) = σ/√n.
Distribution? For large n, X̄ is approximately normally distributed, by the Central Limit Theorem.
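These facts can be checked by simulating many samples from the running example's population (values 3, 4, 6, 8 with probabilities 0.1, 0.2, 0.4, 0.3, so μ = 5.9 and σ = 1.7); the seed, sample size, and repetition count are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(17)

# Population distribution from the running example.
values = np.array([3, 4, 6, 8])
probs = np.array([0.1, 0.2, 0.4, 0.3])
mu = (values * probs).sum()
sigma = np.sqrt((((values - mu) ** 2) * probs).sum())

n = 100
# Draw many i.i.d. samples of size n; record each sample mean.
sample_means = rng.choice(values, size=(50_000, n), p=probs).mean(axis=1)

mean_of_means = sample_means.mean()   # should be near mu = 5.9
sd_of_means = sample_means.std()      # should be near sigma / sqrt(n) = 0.17
```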
Central Limit Theorem
No matter what population you are drawing from: if an i.i.d. sample of size n is large, the probability distribution of the sample mean is roughly normal with mean μ and SD σ/√n.
(The mean and SD follow from the previous slide; the proof of approximate normality is out of scope.)
Any theorem that provides the rough distribution of a statistic and doesn't need the distribution of the population is valuable to data scientists.
For a more in-depth demo: https://onlinestatbook.com/stat_sim/sampling_dist/
How Large Is “Large”?
The CLT says the sample mean of a large i.i.d. sample is roughly normal with mean μ and SD σ/√n, but how large does n have to be for the normal approximation to be good? It depends on the population: the closer the population distribution is to normal, the smaller n can be; heavily skewed populations need larger n.
Using the Sample Mean to Estimate the Population Mean
Our goal is often to estimate some characteristic of a population. To decide how big n should be, we should consider the average value and spread of all possible sample means.
[Figure: simulated sampling distributions of the sample mean for n = 200 and n = 800.]
For every sample size, the expected value of the sample mean is the population mean. We call the sample mean an unbiased estimator of the population mean (more in the next lecture).
Square root law (Data 8): if you increase the sample size by a factor, the SD decreases by the square root of that factor. The sample mean is more likely to be close to the population mean if we have a larger sample size.
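The square root law can be seen by simulating the n = 200 and n = 800 cases on the running example's population (simulation parameters are arbitrary choices):

```python
import numpy as np

values = np.array([3, 4, 6, 8])
probs = np.array([0.1, 0.2, 0.4, 0.3])
rng = np.random.default_rng(8)

def sd_of_sample_mean(n, reps=10_000):
    """Empirical SD of the sample mean over many simulated samples of size n."""
    means = rng.choice(values, size=(reps, n), p=probs).mean(axis=1)
    return means.std()

sd_200 = sd_of_sample_mean(200)
sd_800 = sd_of_sample_mean(800)   # 4x the sample size

ratio = sd_200 / sd_800           # should be near sqrt(4) = 2
```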
Have a Normal Day!
[Extra Slides] Derivations
Random Variables and Distributions
Functions of Random Variables
Sample Statistics
Standardization of Random Variables
X in standard units is the random variable X_su = (X − E[X]) / SD(X).
X_su measures X on the scale "number of SDs from expectation."
Two facts you should prove yourself: E[X_su] = 0 and SD(X_su) = 1.
Jump back: link
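The two facts can be checked numerically for the running example's distribution (standardizing each value keeps the same probabilities):

```python
import numpy as np

x = np.array([3, 4, 6, 8])
p = np.array([0.1, 0.2, 0.4, 0.3])

mu = (x * p).sum()
sd = np.sqrt((((x - mu) ** 2) * p).sum())

x_su = (x - mu) / sd     # values of X in standard units

e_su = (x_su * p).sum()                             # should be 0
sd_su = np.sqrt((((x_su - e_su) ** 2) * p).sum())   # should be 1
```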
Variance: An Alternate Derivation
There's a more convenient form of variance for use in calculations. To derive it, we make repeated use of the linearity of expectation:
Var(X) = E[(X − E[X])²]
       = E[X² − 2X·E[X] + (E[X])²]
       = E[X²] − 2E[X]·E[X] + (E[X])²
       = E[X²] − (E[X])²
Jump back: link
Properties of Expectation #1
Recall the definition of expectation: E[X] = Σₛ X(s) · P(s).
1. Expectation is linear: E[aX + b] = aE[X] + b (intuition: summations are linear).
Proof:
E[aX + b] = Σₛ (aX(s) + b) · P(s)
          = a Σₛ X(s)P(s) + b Σₛ P(s)
          = aE[X] + b,
since the probabilities P(s) sum to 1.
Jump back: link
Properties of Expectation #2
Recall the definition of expectation: E[X] = Σₛ X(s) · P(s).
2. Expectation is linear in sums of RVs, for any relationship between X and Y: E[X + Y] = E[X] + E[Y].
Proof:
E[X + Y] = Σₛ (X(s) + Y(s)) · P(s)
         = Σₛ X(s)P(s) + Σₛ Y(s)P(s)
         = E[X] + E[Y]
Jump back: link
Properties of Variance #1
We know that Var(X) = E[(X − E[X])²].
In order to compute Var(aX + b), consider the deviation of aX + b from its expectation:
aX + b − E[aX + b] = aX + b − (aE[X] + b) = a(X − E[X])
Then,
Var(aX + b) = E[(a(X − E[X]))²] = a² E[(X − E[X])²] = a² Var(X)
In summary: Var(aX + b) = a²Var(X) and SD(aX + b) = |a| SD(X). Don't forget the absolute values and squares!
Jump back: link
Properties of Variance #2
The variance of a sum is affected by the dependence between the two random variables being added. Let's expand the definition of Var(X + Y) to see what's going on. Let DX = X − E[X] and DY = Y − E[Y] denote the deviations from expectation.
Var(X + Y) = E[(X + Y − E[X + Y])²]
           = E[(DX + DY)²]   (by the linearity of expectation, and the substitution)
           = E[DX² + 2·DX·DY + DY²]
           = E[DX²] + 2E[DX·DY] + E[DY²]
           = Var(X) + Var(Y) + 2E[DX·DY]
We see that the variance of a sum is equal to the sum of variances, PLUS this weird term: 2E[DX·DY], which is 2Cov(X, Y).
Jump back: link
Addition Rule for Variance
If X and Y are uncorrelated (in particular, if they are independent), then Var(X + Y) = Var(X) + Var(Y).
Therefore, under the same conditions, SD(X + Y) = √(SD(X)² + SD(Y)²).