1 of 31

Lecture 28

Designing Experiments

DATA 8

Fall 2023

2 of 31

Announcements

Lab 8 is due tonight at 11pm
Homework 9 has been released and is due next Wednesday (11/2) at 11pm
Project 2 will be released this afternoon

Checkpoint next Friday (11/4)
Final deadline on November 11th
Get started early and utilize OH/Project Party!

3 of 31

Weekly Goals

Monday

The bell-shaped curve and its relation to large random samples

Wednesday

Central limit theorem
The variability in a random sample average

Today

Constructing confidence intervals for population means
Choosing the size of a random sample

4 of 31

Review: SD and Bell-Shaped Curves

If a histogram is bell-shaped, then

Where is the average?
What about SD?

5 of 31

Distribution of the Average of a Large Sample

6 of 31

CLT with More Details

If the sample is large and drawn at random with replacement:

Then, regardless of the distribution of the population,

the probability distribution of the sample average
is roughly normal
What about mean and standard deviation?

7 of 31

CLT with More Details

If the sample is large and drawn at random with replacement:

Then, regardless of the distribution of the population,

the probability distribution of the sample average
is roughly normal
mean = population mean
SD = (population SD) / √sample size
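
A minimal simulation sketch of the two facts above, using only the Python standard library. The skewed population, the sample size of 400, and the 2,000 repetitions are made-up values for illustration, not from the lecture demo.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical skewed population (exponential values); an assumption for illustration.
population = [random.expovariate(1 / 10) for _ in range(100_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

sample_size = 400
num_samples = 2_000

# Draw many random samples with replacement and record each sample average.
sample_means = [
    statistics.fmean(random.choices(population, k=sample_size))
    for _ in range(num_samples)
]

predicted_sd = pop_sd / math.sqrt(sample_size)  # (population SD) / sqrt(sample size)
observed_mean = statistics.fmean(sample_means)
observed_sd = statistics.pstdev(sample_means)

print("population mean:               ", round(pop_mean, 2))
print("mean of the sample averages:   ", round(observed_mean, 2))
print("predicted SD of sample average:", round(predicted_sd, 3))
print("observed SD of sample averages:", round(observed_sd, 3))
```

The two means should nearly agree, and so should the two SDs, even though the population itself is far from bell-shaped.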

8 of 31

Increasing Sample Size

9 of 31

Three Different SDs

Population of flight delays

Population mean:
Population SD: 27 minutes

Random sample of 100 flights

Sample mean: estimate of the population mean
Sample SD: estimate of population SD

SD of sample average: 27/sqrt(100) = 2.7 minutes

If we calculated the sample mean from each of 10,000 samples, the SD of those sample means would be about 2.7 minutes
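
A small sketch of how the SD of the sample average shrinks as the sample grows, using the population SD of 27 minutes from this slide. The function name and the extra sample sizes (400 and 1,600) are assumptions added for illustration.

```python
import math

def sd_of_sample_average(population_sd, sample_size):
    """SD of the sample average: (population SD) / sqrt(sample size)."""
    return population_sd / math.sqrt(sample_size)

population_sd = 27  # minutes, from the flight delay example

# Quadrupling the sample size halves the SD of the sample average.
for n in (100, 400, 1600):
    print(n, sd_of_sample_average(population_sd, n))
```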

10 of 31

Confidence Intervals

11 of 31

Graph of the Distribution

12 of 31

The Key to 95% Confidence

For about 95% of all samples, the sample average and population average are within 2 SDs of each other.

SD = SD of sample average

= (population SD) / √sample size

1 SD above the mean

2 SDs above the mean

13 of 31

Constructing the Interval

14 of 31

Constructing the Interval

For 95% of all samples,

If you stand at the population average and look two SDs on both sides, you will find the sample average.

Distance is symmetric.

So if you stand at the sample average and look two SDs on both sides, you will capture the population average.

15 of 31

The Interval

(Demo)

16 of 31

Summarizing: construction of intervals

Approximate 95% confidence interval for the population mean

sample mean +/- 2 * (SD of the sample mean)

SD of the sample mean

(population SD) / √sample size

But we don't know the population SD

We can estimate it using the sample SD
Or overestimate it
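
A minimal sketch of this construction, estimating the population SD by the sample SD. The function name and the sample of 400 delays are hypothetical, generated here only so the example runs.

```python
import math
import random
import statistics

def approx_95_ci_for_mean(sample):
    """Sample mean +/- 2 SDs of the sample mean, with the population SD
    estimated by the sample SD."""
    n = len(sample)
    center = statistics.fmean(sample)
    sd_of_mean = statistics.pstdev(sample) / math.sqrt(n)
    return center - 2 * sd_of_mean, center + 2 * sd_of_mean

random.seed(1)
# Hypothetical sample of 400 delays in minutes (made-up data).
sample = [random.expovariate(1 / 15) for _ in range(400)]

left, right = approx_95_ci_for_mean(sample)
print("approximate 95% CI:", round(left, 2), "to", round(right, 2))
```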

17 of 31

Question

If we can make a 95% confidence interval in this way:

sample mean +/- 2 * (SD of the sample mean)
then why do we need to make confidence intervals using the bootstrap?

This method only works for means and sums, because it is based on the CLT. The bootstrap is a more general approach that also works for other statistics, such as medians.
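
A rough sketch of a bootstrap percentile interval for a median, to show a case the CLT formula does not cover. The function name, the 2,000 repetitions, and the sample data are assumptions for illustration, not the course's own code.

```python
import random
import statistics

def bootstrap_percentile_ci(sample, statistic, repetitions=2000, level=95):
    """Percentile-bootstrap interval for any statistic of a sample."""
    n = len(sample)
    # Resample with replacement from the sample and recompute the statistic.
    estimates = sorted(
        statistic(random.choices(sample, k=n)) for _ in range(repetitions)
    )
    tail = (100 - level) / 2 / 100
    lo_index = int(tail * repetitions)
    hi_index = int((1 - tail) * repetitions) - 1
    return estimates[lo_index], estimates[hi_index]

random.seed(2)
# Hypothetical sample of 400 delays in minutes (made-up data).
sample = [random.expovariate(1 / 15) for _ in range(400)]

left, right = bootstrap_percentile_ci(sample, statistics.median)
print("bootstrap 95% CI for the median:", round(left, 2), "to", round(right, 2))
```

The same function accepts `statistics.fmean` in place of `statistics.median`, in which case the result should be close to the interval from the CLT formula.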

18 of 31

Width of the Interval

Total width of a 95% confidence interval for the population average

= 4 * SD of the sample average

= 4 * (population SD) / √sample size

19 of 31

Sample Proportions

20 of 31

Proportions are Averages

Data: 0 1 0 0 1 0 1 1 0 0 (10 entries)
Sum = 4 = number of 1’s
Average = 4/10 = 0.4 = proportion of 1’s

If the population consists of 1’s and 0’s (yes/no answers to a question), then:

the population average is the proportion of 1’s in the population
the sample average is the proportion of 1’s in the sample
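
The arithmetic on this slide, written out as a short check using the same ten data values:

```python
import statistics

data = [0, 1, 0, 0, 1, 0, 1, 1, 0, 0]

number_of_ones = sum(data)                      # the sum counts the 1's
average = statistics.fmean(data)                # sum divided by the number of entries
proportion_of_ones = data.count(1) / len(data)  # proportion computed directly

print(number_of_ones, average, proportion_of_ones)
```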

21 of 31

Confidence Interval

22 of 31

Controlling the Width

Total width of an approximate 95% confidence interval for a population proportion

= 4 * (SD of 0/1 population) / √sample size

The narrower the interval, the more precise your estimate.
Suppose you want the total width of the interval to be no more than 1%. How should you choose the sample size?

23 of 31

The Sample Size for a Given Width

0.01 = 4 * (SD of 0/1 population) / √sample size

Left side: 1%, the max total width that you’ll accept
Right side: formula for the total width

(Demo)

√sample size = 4 * (SD of 0/1 population) / 0.01

24 of 31

“Worst Case” Population SD

√sample size = 4 * (SD of 0/1 population) / 0.01

SD of 0/1 population is at most 0.5

√sample size ≥ 4 * 0.5 / 0.01

sample size ≥ (4 * 0.5 / 0.01) ** 2 = 40000

The sample size should be 40,000 or more
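
A sketch of why 0.5 is the worst case. It relies on the standard fact that a 0/1 population with proportion p of 1's has SD sqrt(p * (1 - p)); the function name and the grid of proportions are choices made for this illustration.

```python
import math

def sd_of_zero_one_population(p):
    """SD of a population of 0's and 1's in which the proportion of 1's is p."""
    return math.sqrt(p * (1 - p))

# Scan proportions 0.00, 0.01, ..., 1.00 to see where the SD is largest.
proportions = [i / 100 for i in range(101)]
worst_p = max(proportions, key=sd_of_zero_one_population)
worst_sd = sd_of_zero_one_population(worst_p)

# Sample size needed for a total width of 0.01 in the worst case.
required_sample_size = (4 * worst_sd / 0.01) ** 2

print("largest SD is", worst_sd, "at p =", worst_p)
print("required sample size:", required_sample_size)
```

Using the worst case guarantees the width target for any population proportion, at the cost of a larger sample than may strictly be needed.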

25 of 31

Discussion Question

https://www.scientificamerican.com/article/howcan-a-poll-of-only-100/

27 of 31

Discussion Question

3% margin of error means width of 6%

width = 4 * (0.5) / √ 1004

width ≈ 0.0631, so margin of error ≈ 3.16%

28 of 31

Discussion Question

A researcher is estimating a population proportion based on a random sample of size 10,000.

Fill in the blank with a decimal:

With chance at least 95%, the estimate will be correct to within ________________.

29 of 31

Discussion Question

With chance at least 95%, the estimate will be correct to within 0.01.

width = 4 * (0.5) / √ 10000

width = 0.02, so margin of error = 0.01
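
The two poll calculations above (sample sizes 1,004 and 10,000) can be sketched with one helper. The function name is hypothetical; it assumes the worst-case SD of 0.5 and 2 SDs on each side for 95% confidence.

```python
import math

def worst_case_margin_of_error(sample_size, sds_each_side=2):
    """Half the interval width, using 0.5 as the largest possible SD
    of a 0/1 population."""
    return sds_each_side * 0.5 / math.sqrt(sample_size)

for n in (1004, 10_000):
    margin = worst_case_margin_of_error(n)
    print(n, "margin of error:", round(margin, 4), "total width:", round(2 * margin, 4))
```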

30 of 31

Discussion Question

I am going to use a 68% confidence interval to estimate a population proportion.

I want the total width of my interval to be no more than 2.5%.

How large must my random sample be?

31 of 31

Discussion Question

How large must my random sample be?

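A 68% confidence interval is the sample average +/- 1 SD, so its total width is 2 SDs of the sample average.

0.025 = 2 * (0.5) / √sample size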

√sample size = 2 * (0.5) / 0.025

sample size = 40**2 = 1600
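
A sketch that generalizes this calculation to other confidence levels. The function and parameter names are hypothetical; `sds_each_side` is 1 for roughly 68% confidence and 2 for roughly 95%, and the default SD of 0.5 is the worst case for a 0/1 population.

```python
import math

def sample_size_for_width(max_width, sds_each_side, population_sd=0.5):
    """Smallest whole sample size whose interval has total width at most max_width."""
    # total width = 2 * sds_each_side * population_sd / sqrt(sample size)
    root_n = 2 * sds_each_side * population_sd / max_width
    # round() guards against floating-point noise before taking the ceiling
    return math.ceil(round(root_n ** 2, 6))

n_68 = sample_size_for_width(0.025, sds_each_side=1)  # this slide's question
n_95 = sample_size_for_width(0.01, sds_each_side=2)   # the earlier 1% width example

print(n_68, n_95)
```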
