
LECTURE 17

Random Variables

Numerical functions of random samples and their properties; sampling variability.

Data 100/Data 200, Fall 2023 @ UC Berkeley

Narges Norouzi and Fernando Pérez

Content credit: Acknowledgments

Today’s Roadmap

Lecture 17, Data 100 Fall 2023

Random Variables and Distributions

Expectation and Variance

Sums of Random Variables

  • Example: Bernoulli and Binomial Random Variables

Populations and Samples


Random Variables and Distributions

Lecture 17, Data 100 Fall 2023

Random Variables and Distributions

Expectation and Variance

Sums of Random Variables

  • Example: Bernoulli and Binomial Random Variables

Populations and Samples


The Bias-Variance Tradeoff

(Figure: the bias-variance tradeoff plot.)

What is the mathematical underpinning of this plot?

We’ll come back to this…


Why Probability?

(Figure: the data science lifecycle (question and problem formulation, data acquisition, exploratory data analysis, prediction and inference, reports/decisions/solutions), annotated with the surrounding lectures: Model Selection Basics/Cross Validation, Regularization, Probability I: Random Variables and Estimators (today), Probability II: Bias and Variance, and Inference/Multicollinearity.)


We Will See → One Important Probability Concept

Formalize the notions of sample statistic and population parameter from Data 8.

From Data 8:

1. Definition of the sample mean: the mean of your random sample, e.g. np.mean(data).

2. The Central Limit Theorem: If you draw a large random sample with replacement, then, regardless of the population distribution, the probability distribution of the sample mean:

  • Is roughly normal
  • Is centered at the population mean
  • Has an SD = (population SD) / √(sample size)

The key new concept: the sample mean is a random variable.

We will go over just enough probability to help you understand its implications for modeling.

For more probability, take Data 140, CS 70, and/or EECS 126.

[Terminology] Random Variable

Suppose we generate random data, like a random sample from some population. Then:

A random variable is a numerical function of the randomness in the data.


  • Domain (input): all possible (random) outcomes in a sample space (random processes).
  • Range (output): number line.
  • Often denoted with uppercase “variable-like” letters (e.g. X, Y ).
  • It is random since our sample was drawn at random.
  • It is a variable because its value depends on how the sample came out.


Example: Tossing a Coin


A fair coin can land either heads (𝐻) or tails (𝑇), each with probability 0.5. With these possible outcomes, we can define a random variable 𝑋:

𝑋 is a function with

  • Domain (input): {𝐻, 𝑇}
  • Range (output): {1, 0}

We can write this in function notation: X(H) = 1 and X(T) = 0.
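Since a random variable really is just a function, we can sketch it directly in Python (encoding the outcomes as the strings 'H' and 'T' is our own illustrative choice):

```python
# A minimal sketch: the coin-toss random variable X as a Python function.
# Representing outcomes as the strings 'H' and 'T' is an assumption for the demo.
def X(outcome):
    """Map an outcome in the sample space {'H', 'T'} to a number."""
    return 1 if outcome == 'H' else 0

X('H'), X('T')  # (1, 0)
```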


Example: Sampling Data 100 Students

Suppose we draw a simple random sample s of size 3 from the following population: all Data 100 students (blue: data science students).

We can define Y as the number of data science students in our sample.

  • Domain (input): all possible samples of size 3
  • Range (output): {0, 1, 2, 3}

Draw several random samples: Y takes a value on each one, e.g. Y(s) = 2, Y(s) = 0, Y(s) = 2, Y(s) = 1 on four different draws.


[Terminology] Distribution

For any random variable, we need to be able to specify two things:

  • Possible values: the set of values the random variable can take on.
  • Probabilities: the set of probabilities describing how the total probability of 100% is split over the possible values.

P(X = x) denotes the probability that the random variable X takes on the value x. Assuming (for now) that X is discrete, i.e., has a finite number of possible values, the probabilities must sum to 1.

We can often do this using a probability distribution table. In the coin toss example, the probability distribution table of X is:

x | P(X = x)
0 | 1/2
1 | 1/2


[Terminology] Distribution

The distribution of a random variable X is a description of how the total probability of 100% is split over all the possible values of X .

A distribution fully defines a random variable.

(Figure: the probability distribution table of Y, the number of data science students in the sample.)


Distributions Can Be Represented as Histograms or Densities

Take a probability class like Data 140 to learn more about discrete vs continuous distributions.

(Figures: a histogram showing the distribution of a discrete random variable X, and a density curve showing the distribution of a continuous random variable Y.)


Probabilities as Areas

If we sum up the total area of the bars/under the density curve, we should get 100%, or 1.

The area of the red bars is P(7 ≤ X ≤ 9).

The red area under the curve is P(6.8 ≤ Y ≤ 9.5).

(Figures: the distribution of discrete random variable X and the distribution of continuous random variable Y, with the relevant regions shaded red.)


Understanding Random Variables

Compute the following probabilities for the random variable X. (From its histogram, X takes the values 3, 4, 6, and 8 with probabilities 0.1, 0.2, 0.4, and 0.3.)

1. P(X = 4) = 0.2
2. P(X < 6) = 0.1 + 0.2 = 0.3
3. P(X ≤ 6) = 0.1 + 0.2 + 0.4 = 0.7
4. P(X = 7) = 0
5. P(X ≤ 8) = 1


Common Random Variables

Bernoulli(p)

  • Takes on value 1 with probability p, and 0 with probability 1 - p.
  • AKA the “indicator” random variable.

Binomial(n, p)

  • Number of 1s in n independent Bernoulli(p) trials.

Uniform on a finite set of values

  • Probability of each value is 1 / (number of possible values).
  • For example, a standard/fair die.

Uniform on the unit interval (0, 1)

  • Density is flat at 1 on (0, 1) and 0 elsewhere.

Normal(μ, σ²)

The numbers in parentheses are the parameters of a random variable, which are constants. Parameters define a random variable’s shape (i.e., distribution) and its values.


From Distribution to (Simulated) Population

Given a random variable’s distribution (e.g., its probability distribution table), how could we generate/simulate a population, i.e., X(s) from all possible samples?


From Distribution to (Simulated) Population

Simulate: randomly pick values of X according to its distribution, e.g. with np.random.choice or df.sample, to get X(s) from many, many (simulated) samples.
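A minimal sketch of this step, assuming the distribution table from the running example (values 3, 4, 6, 8 with probabilities 0.1, 0.2, 0.4, 0.3):

```python
import numpy as np

# Distribution table from the running example (assumed here for illustration).
values = np.array([3, 4, 6, 8])
probs = np.array([0.1, 0.2, 0.4, 0.3])  # must sum to 1

# Simulate a large population by drawing X according to its distribution.
rng = np.random.default_rng(42)
simulated_population = rng.choice(values, size=100_000, p=probs)
```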


Expectation and Variance

Lecture 17, Data 100 Fall 2023

Random Variables and Distributions

Expectation and Variance

Sums of Random Variables

  • Example: Bernoulli and Binomial Random Variables

Populations and Samples


Descriptive Properties of Random Variables

There are several ways to describe a random variable:

  • Definition (fully describes): a table of all samples s and X(s), the distribution table P(X = x), or a histogram.
  • Properties (summarizes): expectation and variance.

The expectation and variance of a random variable are numerical summaries of X. They are numbers and are not random!

  • Expectation: the “average value” of X.
  • Variance: the “spread” of X.


You’ve Seen This Before

The mean (Data 100: expectation) is the center of gravity or balance point of the histogram (Data 100: of a random variable). [textbook]

The variance is a measure of spread. It is the expected squared deviation from the mean (Data 100: of a random variable). [textbook]

In Data 8, you computed these from the datapoints themselves (i.e., the sample of data).

In Data 100, we redefine these terms with respect to probability distributions.


Definition of Expectation

The expectation of a random variable X is the weighted average of the possible values of X, where the weights are the probabilities of the values.

Two equivalent ways to apply the weights:

  1. One sample at a time: E[X] = Σ_s X(s) P(s), summing over all possible samples s.
  2. One possible value at a time: E[X] = Σ_x x P(X = x). This form is more common, since we are usually given the distribution, not all possible samples.

Expectation is a number, not a random variable!

  • It is a generalization of the average (same units as the random variable).
  • It is the center of gravity of the probability distribution histogram.
  • If we simulate the variable many times, it is the long-run average of the random variable (see the sketch below).
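A quick numerical sketch of both points, using the same illustrative distribution as before:

```python
import numpy as np

# E[X] = sum over x of x * P(X = x), with the illustrative distribution.
values = np.array([3, 4, 6, 8])
probs = np.array([0.1, 0.2, 0.4, 0.3])
expectation = np.sum(values * probs)  # 5.9

# Long-run average: the mean of many simulated draws approaches E[X].
rng = np.random.default_rng(0)
draws = rng.choice(values, size=1_000_000, p=probs)
print(expectation, draws.mean())  # both ≈ 5.9
```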


Example 1: Tossing a coin

A fair coin can land either heads (𝐻) or tails (𝑇), each with probability 0.5. With these possible outcomes, we can define a random variable 𝑋 with X(H) = 1 and X(T) = 0. Then:

E[X] = 1 · P(X = 1) + 0 · P(X = 0) = 1 · (1/2) + 0 · (1/2) = 0.5

The expectation of X does not need to be a possible value of X. Note, E[X] = 0.5 is not a possible value of X! It is an average.


Example 2

Consider the random variable X we defined earlier.

E[X] = 3(0.1) + 4(0.2) + 6(0.4) + 8(0.3) = 5.9

The expectation of X does not need to be a possible value of X. Note, E[X] = 5.9 is not a possible value of X! It is an average.


Definition of Variance

Variance is the expected squared deviation from the expectation of X:

Var(X) = E[(X − E[X])²]

  • The units of the variance are the square of the units of X.
  • To get back to the right scale, use the standard deviation of X: SD(X) = √Var(X).

Variance is a number, not a random variable!

  • The main use of variance is to quantify chance error. How far away from the expectation could X be, just by chance?

By Chebyshev’s inequality (which you saw in Data 8, and which we won’t prove here either):

  • No matter what the shape of the distribution of X is, the vast majority of the probability lies in the interval “expectation plus or minus a few SDs.”


Variance: Alternate Calculation

There’s a more convenient form of variance:

Var(X) = E[X²] − (E[X])²

  • Called the “computation formula” for variance.
  • Proof (involves expanding the square and properties of expectation/summations): link
  • Useful in Mean Squared Error calculations: if X is centered (i.e., E[X] = 0), then E[X²] = Var(X).
  • When computing variance by hand, often used instead of the definition.

How do we calculate E[X²]? X² is just another random variable, with its own probability distribution. We can calculate its expectation as E[X²] = Σ_x x² P(X = x).
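A sketch checking that the computation formula agrees with the definition, on the illustrative distribution:

```python
import numpy as np

values = np.array([3, 4, 6, 8])
probs = np.array([0.1, 0.2, 0.4, 0.3])

mu = np.sum(values * probs)                             # E[X] = 5.9
var_definition = np.sum((values - mu) ** 2 * probs)     # E[(X - E[X])^2]
var_computation = np.sum(values ** 2 * probs) - mu**2   # E[X^2] - (E[X])^2
assert np.isclose(var_definition, var_computation)
```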



Example: Dice Is the Plural; Die Is the Singular

Let X be the outcome of a single die roll.

X is a random variable.

1. What is the expectation, E[X]?

E[X] = (1 + 2 + 3 + 4 + 5 + 6) · (1/6) = 7/2

2. What is the variance, Var(X)?

Approach 1 (definition): Var(X) = Σ_x (x − 7/2)² · (1/6) = 35/12.

Approach 2 (property/computation formula): E[X²] = (1² + 2² + 3² + 4² + 5² + 6²) · (1/6) = 91/6, so Var(X) = E[X²] − (E[X])² = 91/6 − (7/2)² = 35/12.
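The same arithmetic as a quick sketch (each face has probability 1/6, so these expectations are plain averages over the faces):

```python
import numpy as np

faces = np.arange(1, 7)     # 1, ..., 6, each with probability 1/6

e_x = faces.mean()          # E[X] = 7/2 = 3.5
e_x2 = np.mean(faces ** 2)  # E[X^2] = 91/6
var_x = e_x2 - e_x ** 2     # 91/6 - (7/2)^2 = 35/12 ≈ 2.92
```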


Sums of Random Variables

Lecture 17, Data 100 Fall 2023

Random Variables and Distributions

Expectation and Variance

Sums of Random Variables

  • Example: Bernoulli and Binomial Random Variables

Populations and Samples


Functions of Multiple Random Variables

A function of a random variable is also a random variable!

If you create multiple random variables based on your sample…

…then functions of those random variables are also random variables.

For instance, if X and Y are random variables, then so are all of these: X + Y, 2X, X², max(X, Y), and so on.


Many functions of RVs that we care about (counts, means) involve sums of RVs, so we expand on properties of sums of RVs.


Equal vs. Identically Distributed vs. i.i.d.

Suppose that we have two random variables X and Y.


X and Y are equal if:

  • X(s) = Y(s) for every sample s.
  • We write X = Y.

X and Y are identically distributed if:

  • The distribution of X is the same as the distribution of Y
  • We say “X and Y are equal in distribution.”
  • If X = Y, then X and Y are identically distributed; but the converse is not true (example: X is a die roll and Y = 7 − X; they have the same distribution but are not equal).

X and Y are independent and identically distributed (i.i.d.) if:

  • X and Y are identically distributed, and
  • Knowing the outcome of X does not influence your belief�of the outcome of Y, and vice versa (“X and Y are independent.”)
  • Independence is covered more in Data 140/CS 70.
  • In Data 100, you will never be expected to prove that RVs are i.i.d.


Distributions of Sums

Let X1 and X2 be numbers on rolls of two dice.

  • X1 , X2 are i.i.d., so X1 , X2 have the same distribution.
  • But the sums Y = X1 + X1 = 2X1 and Z = X1 + X2 have different distributions!
  • Same expectation…
  • But Y = 2X1 has larger variance!

How can we directly compute E[Y], Var(Y), without simulating distributions? (Coming up next.)

Let’s show this through simulation (Demo):
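The course demo is not reproduced here; a minimal sketch of the same simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims = 100_000
x1 = rng.integers(1, 7, size=n_sims)  # first die
x2 = rng.integers(1, 7, size=n_sims)  # second die, independent of the first

y = 2 * x1   # Y = X1 + X1
z = x1 + x2  # Z = X1 + X2

print(y.mean(), z.mean())  # both ≈ 7: same expectation
print(y.var(), z.var())    # ≈ 11.67 vs. ≈ 5.83: Y has larger variance
```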


Properties of Expectation

Instead of simulating full distributions, we often just compute expectation and variance directly.

Recall the definitions of expectation: E[X] = Σ_s X(s) P(s) = Σ_x x P(X = x).

Properties:

1. Expectation is linear: E[aX + b] = aE[X] + b.
Intuition: summations are linear. Proof

2. Expectation is linear in sums of RVs, for any relationship between X and Y: E[X + Y] = E[X] + E[Y]. Proof

3. If g is a non-linear function, then in general E[g(X)] ≠ g(E[X]).

  • Example: if X is -1 or 1 with equal probability, then E[X] = 0 but E[X²] = 1 ≠ 0.


Properties of Variance [1/2]

Recall the definition of variance: Var(X) = E[(X − E[X])²].

Properties:

1. Variance is non-linear: Var(aX + b) = a²Var(X).

Shifting doesn’t change variance; scaling does.

Intuition (full proof): Consider the standard deviation for Y = -3X + 2. The shift by 2 leaves the spread unchanged, while the scaling by -3 triples it: SD(Y) = 3 SD(X), so Var(Y) = 9 Var(X).

(Figures: the distributions of X, -3X, and -3X + 2; the SD triples under the scaling and is unchanged by the shift. A simulation sketch follows.)
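A simulation sketch of the intuition, taking X to be a die roll (an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(1, 7, size=1_000_000)  # simulated die rolls
y = -3 * x + 2

print(y.var(), 9 * x.var())  # Var(-3X + 2) = (-3)^2 Var(X)
print(y.std(), 3 * x.std())  # SD(-3X + 2) = |-3| SD(X)
```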


Properties of Variance [2/2]

Recall the definition of variance: Var(X) = E[(X − E[X])²].

Properties:

1. Variance is non-linear: Var(aX + b) = a²Var(X).
Intuition (full proof): consider the standard deviation for Y = -3X + 2, as on the previous slide.

2. Variance of sums of RVs is affected by the (in)dependence of the RVs (derivation):

Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)

where Cov(X, Y) is the covariance of X and Y (next slide). If X, Y are independent, then Cov(X, Y) = 0.

(Typo fixed compared to the live lecture.)


Covariance and Correlation: The Basics

Covariance is the expected product of deviations from expectation:

Cov(X, Y) = E[(X − E[X])(Y − E[Y])]

  • A generalization of variance. Note Cov(X, X) = E[(X − E[X])²] = Var(X).
  • Interpret by defining correlation (yes, that correlation!): r(X, Y) = E[Xsu Ysu] = Cov(X, Y) / (SD(X) SD(Y)), where Xsu is X in standard units (link).

Correlation (and therefore covariance) measures a linear relationship between X and Y.

  • If X and Y are correlated, then knowing X tells you something about Y.
  • “X and Y are uncorrelated” is the same as “correlation and covariance equal to 0.”
  • Independent X, Y are uncorrelated, because knowing X tells you nothing about Y.
  • The converse is not necessarily true: X, Y could be uncorrelated but not independent.
  • For more info, see extra slides + take Data 140/CS 70. (A numerical sketch follows this list.)
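A sketch computing covariance and correlation by simulation, with illustrative normally distributed RVs:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 2 * x + rng.normal(size=100_000)  # linearly related to x
z = rng.normal(size=100_000)          # independent of x

# Covariance as the average product of deviations; correlation via standard units.
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
r_xy = cov_xy / (x.std() * y.std())

print(r_xy)                     # strongly positive: a linear relationship
print(np.corrcoef(x, z)[0, 1])  # ≈ 0: independent RVs are uncorrelated
```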


Dice, Our Old Friends: Expectation

Let X1 and X2 be the numbers on two rolls of a die.

  • X1, X2 are i.i.d., so X1, X2 have the same distribution.
  • Therefore E[X1] = E[X2] = 7/2 and Var(X1) = Var(X2) = 35/12.

For Y = 2X1: E[Y] = E[2X1] = 2E[X1] = 7.

For Z = X1 + X2: E[Z] = E[X1 + X2] = E[X1] + E[X2] = 7.


Dice, Our Old Friends: Variance

Let X1 and X2 be the numbers on two rolls of a die.

  • X1, X2 are i.i.d., so X1, X2 have the same distribution.
  • Therefore E[X1] = E[X2] = 7/2 and Var(X1) = Var(X2) = 35/12.

For Y = 2X1: E[Y] = 7 and Var(Y) = Var(2X1) = 4Var(X1) = 4(35/12) ≈ 11.67.

For Z = X1 + X2: E[Z] = 7 and, since X1 and X2 are independent, Cov(X1, X2) = 0, so Var(Z) = Var(X1) + Var(X2) + 2Cov(X1, X2) = (35/12) + (35/12) + 0 ≈ 5.83.

(Figure: simulation results matching these values.)


[Summary] Expectation and Variance for Linear Functions of Random Variables

Let X be a random variable with distribution P(X = x), let a and b be scalar values, and let Y be another random variable.

  • E[X] = Σ_x x P(X = x) (definition)
  • E[aX + b] = aE[X] + b
  • E[X + Y] = E[X] + E[Y]
  • Var(X) = E[(X − E[X])²] (definition) = E[X²] − (E[X])² (easier computation)
  • Var(aX + b) = a²Var(X)
  • Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y), with Cov(X, Y) = 0 if X, Y are independent.


Bernoulli and Binomial Random Variables

Lecture 17, Data 100 Fall 2023

Random Variables and Distributions

Expectation and Variance

Sums of Random Variables

  • Example: Bernoulli and Binomial Random Variables

Populations and Samples


Common Random Variables

Bernoulli(p)

  • Takes on value 1 with probability p, and 0 with probability 1 - p
  • AKA the “indicator” random variable.

Binomial(n, p)

  • Number of 1s in n independent Bernoulli(p) trials

Uniform on a finite set of values

  • Probability of each value is 1 / (size of set)
  • For example, a standard die

Uniform on the unit interval (0, 1)

  • Density is flat at 1 on (0, 1) and 0 elsewhere.

Normal(μ, σ²)

We’ll now revisit these to solidify our understanding of expectation/variance.


Properties of Bernoulli Random Variables

Let X be a Bernoulli(p) random variable.

  • Takes on value 1 with probability p, and 0 with probability 1 - p.
  • AKA the “indicator” random variable.

Definitions:

  • Expectation: E[X] = 1 · p + 0 · (1 − p) = p. We will get an average value of p across many, many samples.
  • Variance: Var(X) = E[X²] − (E[X])² = p − p² = p(1 − p).
    • Lower Var: p = 0.1 or 0.9. Higher Var: p close to 0.5. (A quick sketch follows.)
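Both formulas, checked for an assumed p:

```python
import numpy as np

p = 0.3  # an illustrative success probability
values, probs = np.array([0, 1]), np.array([1 - p, p])

e_x = np.sum(values * probs)                  # p
var_x = np.sum(values**2 * probs) - e_x ** 2  # p(1 - p); largest at p = 0.5
```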


Properties of Binomial Random Variables

Let Y be a Binomial(n, p) random variable.

  • Y is the number (i.e., count) of 1s in n independent Bernoulli(p) trials.
  • Distribution of Y given by the binomial formula.

A count is a sum of 0’s and 1’s, so we can write Y = X1 + X2 + ⋯ + Xn, where:

  • Xi is the indicator of success on trial i. Xi = 1 if trial i is a success, else 0.
  • All Xi’s are i.i.d. (independent and identically distributed) and Bernoulli(p).


Properties of Binomial Random Variables

Let Y be a Binomial(n, p) random variable.

  • Y is the number (i.e., count) of 1s in n independent Bernoulli(p) trials.
  • Distribution of Y given by the binomial formula (Lecture 2).

As on the previous slide, a count is a sum of 0’s and 1’s: Y = X1 + X2 + ⋯ + Xn, where each Xi is the indicator of success on trial i and all Xi’s are i.i.d. Bernoulli(p).

Expectation: E[Y] = E[X1] + ⋯ + E[Xn] = np.

Variance: Because all Xi’s are independent, Cov(Xi, Xj) = 0 for all i ≠ j, so Var(Y) = Var(X1) + ⋯ + Var(Xn) = np(1 − p). (A simulation check follows.)
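A sketch checking both results by simulating Y as a sum of Bernoulli trials, with illustrative parameters:

```python
import numpy as np

n, p = 50, 0.3  # illustrative parameters
rng = np.random.default_rng(0)

# Each row holds n i.i.d. Bernoulli(p) trials; each row sum is one draw of Y.
trials = rng.random((100_000, n)) < p
y = trials.sum(axis=1)

print(y.mean(), n * p)           # E[Y] = np
print(y.var(), n * p * (1 - p))  # Var(Y) = np(1 - p)
```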


Binomial(n, p) for Large n

For p = 0.5, n = 50 (i.e. number of heads in 50 fair coin flips):

(Figures: the distribution of Y_50, the number of heads, shown beside a photo of Hallgrímskirkja in Iceland.)

Covered until here on 10/19


Populations and Samples

Lecture 17, Data 100 Fall 2023

Random Variables and Distributions

Expectation and Variance

Sums of Random Variables

  • Example: Bernoulli and Binomial Random Variables

Populations and Samples


From Populations to Samples

Today, we’ve talked extensively about populations:

  • If we know the distribution of a random variable, we can reliably compute expectation, variance, functions of the random variable, etc.

However, in Data Science, we often collect samples.

  • We don’t know the distribution of our population.
  • We’d like to use the distribution of our sample to estimate/infer properties of the population.

(Figure: sampling uniformly at random with replacement from a population (sampling frame) to produce a sample of size n.)

The big assumption we make in modeling/inference: each observation in our sample is drawn i.i.d. from our population distribution (next slide).


The Sample is a Set of i.i.d. Random Variables

Each observation in our sample is a random variable drawn i.i.d. from our population distribution: X1, X2, …, Xn.

We sample uniformly at random with replacement from the population (really large N) to get a sample of size n (n << N), e.g. with df.sample(n, replace=True) [documentation].
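A sketch of this sampling step with a hypothetical population table (the column name and values are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical population table with a large N.
rng = np.random.default_rng(0)
population = pd.DataFrame(
    {"x": rng.choice([3, 4, 6, 8], size=100_000, p=[0.1, 0.2, 0.4, 0.3])}
)

# An i.i.d. sample of size n << N: uniformly at random, with replacement.
sample = population.sample(n=500, replace=True)
```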


The Sample is a Set of i.i.d. Random Variables

As before, we draw X1, X2, …, Xn uniformly at random with replacement from the population (sampling frame), e.g. with df.sample(n, replace=True) [documentation].

  • Population mean: a number, i.e., a fixed value (in the running example, E[X] = 5.9).
  • Sample mean: a random variable! It depends on our randomly drawn sample!! (One simulated draw gave np.mean(...) = 5.71.)


[Terminology] Sample Mean

Consider an i.i.d. sample X1, X2, …, Xn drawn from a population with mean 𝜇 and SD 𝜎.

Define the sample mean: X̄ = (1/n) Σ_i Xi.

  • Expectation: E[X̄] = (1/n) Σ_i E[Xi] = 𝜇.
  • Variance/Standard Deviation: i.i.d. → Cov(Xi, Xj) = 0 for i ≠ j, so Var(X̄) = (1/n²) Σ_i Var(Xi) = 𝜎²/n and SD(X̄) = 𝜎/√n.
  • Distribution? X̄ is approximately normally distributed by the Central Limit Theorem (next slide).
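A simulation sketch of these facts, using an illustrative skewed population:

```python
import numpy as np

rng = np.random.default_rng(0)
# A skewed, decidedly non-normal population (illustrative).
population = rng.exponential(scale=2.0, size=1_000_000)
mu, sigma = population.mean(), population.std()

# Draw many i.i.d. samples of size n and record each sample mean.
n = 100
sample_means = rng.choice(population, size=(5_000, n), replace=True).mean(axis=1)

print(sample_means.mean(), mu)                 # E[sample mean] = μ
print(sample_means.std(), sigma / np.sqrt(n))  # SD(sample mean) = σ/√n
```

A histogram of sample_means would also look roughly bell-shaped, which is the Central Limit Theorem on the next slide.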


Central Limit Theorem

No matter what population you are drawing from: if an i.i.d. sample of size n is large, the probability distribution of the sample mean is roughly normal with mean 𝜇 (see the previous slide) and SD 𝜎/√n. (The proof is out of scope.)

Any theorem that provides the rough distribution of a statistic and doesn’t need the distribution of the population is valuable to data scientists.

  • Because we rarely know a lot about the population!

For a more in-depth demo: https://onlinestatbook.com/stat_sim/sampling_dist/


How Large Is “Large”?

No matter what population you are drawing from: if an i.i.d. sample of size n is large, the probability distribution of the sample mean is roughly normal with mean 𝜇 and SD 𝜎/√n.

How large does n have to be for the normal approximation to be good?

  • …It depends on the shape of the distribution of the population…
  • If the population is roughly symmetric and unimodal/uniform, you could need as few as n = 20. If the population is very skewed, you will need a bigger n.
  • If in doubt, you can bootstrap the sample mean and see if the bootstrapped distribution is bell-shaped.


Using the Sample Mean to Estimate the Population Mean

Our goal is often to estimate some characteristic of a population.

  • Example: average height of Cal undergraduates.
  • We typically can collect a single sample. It has just one average.
  • Since that sample was random, it could have come out differently.
  • The CLT helps us understand how it could have come out differently.

We should consider the average value and spread of all possible sample means, and what this means for how big n should be.

(Figure: distributions of the sample mean for n = 200 and n = 800.)

For every sample size, the expected value of the sample mean is the population mean. We call the sample mean an unbiased estimator of the population mean. (More in the next lecture.)

Square root law (Data 8): if you increase the sample size by a factor, the SD decreases by the square root of the factor. The sample mean is more likely to be close to the population mean if we have a larger sample size.


Have a Normal Day!


[Extra Slides] Derivations

Lecture 17, Data 100 Fall 2023

Random Variables and Distributions

  • Expectation and Variance
  • Equality vs Identically Distributed
  • Common RVs: Bernoulli, Binomial

Functions of Random Variables

  • Distributions through Simulation, I.I.D.
  • Properties of Expectation and Variance
  • Covariance, Correlation
  • Standard Units

Sample Statistics

  • Sample Mean
  • Central Limit Theorem


Standardization of Random Variables

X in standard units is the random variable Xsu = (X − E[X]) / SD(X).

Xsu measures X on the scale “number of SDs from expectation.”

  • It is a linear transformation of X. By the linear transformation rules for expectation and variance: E[Xsu] = 0 and Var(Xsu) = SD(Xsu) = 1.
  • Since Xsu is centered (has expectation 0): E[Xsu²] = Var(Xsu) = 1.

You should prove these facts yourself.
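A quick simulation sketch, with an illustrative X:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=10, scale=3, size=1_000_000)  # an illustrative X

x_su = (x - x.mean()) / x.std()  # X in standard units
print(x_su.mean(), x_su.std())   # ≈ 0 and 1, so E[X_su²] = Var(X_su) ≈ 1
```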


Jump back: link


Variance: An Alternate Derivation

There’s a more convenient form of variance for use in calculations. To derive it, we make repeated use of the linearity of expectation, noting that E[X] is a constant:

Var(X) = E[(X − E[X])²]
       = E[X² − 2X·E[X] + (E[X])²]
       = E[X²] − 2E[X]·E[X] + (E[X])²
       = E[X²] − (E[X])²

Jump back: link


Properties of Expectation #1

Recall the definition of expectation: E[X] = Σ_x x P(X = x).

1. Expectation is linear: E[aX + b] = aE[X] + b (intuition: summations are linear).

Proof:

E[aX + b] = Σ_x (ax + b) P(X = x)
          = a Σ_x x P(X = x) + b Σ_x P(X = x)
          = aE[X] + b · 1
          = aE[X] + b

Jump back: link


Properties of Expectation #2

Recall the definitions of expectation: E[X] = Σ_s X(s) P(s) = Σ_x x P(X = x).

2. Expectation is linear in sums of RVs: E[X + Y] = E[X] + E[Y], for any relationship between X and Y.

Proof (using the sample-by-sample form):

E[X + Y] = Σ_s (X(s) + Y(s)) P(s)
         = Σ_s X(s) P(s) + Σ_s Y(s) P(s)
         = E[X] + E[Y]

Jump back: link


Properties of Variance #1

We know that Var(X) = E[(X − E[X])²].

In order to compute Var(aX + b), consider:

  • A shift by b units does not affect spread. Thus, Var(aX + b) = Var(aX).
  • The multiplication by a does affect spread!

Then, Var(aX) = E[(aX − aE[X])²] = E[a²(X − E[X])²] = a²Var(X).

In summary:

Var(aX + b) = a²Var(X)
SD(aX + b) = |a| SD(X)

Don’t forget the absolute values and squares!

Jump back: link


Properties of Variance #2

The variance of a sum is affected by the dependence between the two random variables that are being added. Let’s expand out the definition of Var(X + Y) to see what’s going on. Let 𝜇X = E[X] and 𝜇Y = E[Y]; by the linearity of expectation, E[X + Y] = 𝜇X + 𝜇Y, and substituting:

Var(X + Y) = E[(X + Y − (𝜇X + 𝜇Y))²]
           = E[((X − 𝜇X) + (Y − 𝜇Y))²]
           = E[(X − 𝜇X)²] + E[(Y − 𝜇Y)²] + 2E[(X − 𝜇X)(Y − 𝜇Y)]
           = Var(X) + Var(Y) + 2Cov(X, Y)

We see that the variance of a sum is equal to the sum of variances, PLUS this weird term: the covariance.

Jump back: link


Addition Rule for Variance

If X and Y are uncorrelated (in particular, if they are independent), then

Var(X + Y) = Var(X) + Var(Y)

Therefore, under the same conditions,

SD(X + Y) = √(SD(X)² + SD(Y)²)

  • Think of this as “Pythagorean theorem” for random variables.
  • Uncorrelated random variables are like orthogonal vectors.


Random Variables

Content credit: Acknowledgments

LECTURE 17