1 of 31

Lecture 28

Designing Experiments

DATA 8

Fall 2023

2 of 31

Announcements

  • Lab 8 is due tonight at 11pm
  • Homework 9 has been released and is due next Wednesday (11/2) at 11pm
  • Project 2 will be released this afternoon
    • Checkpoint next Friday (11/4)
    • Final deadline on November 11th
    • Get started early and utilize OH/Project Party!

3 of 31

Weekly Goals

  • Monday
    • The bell-shaped curve and its relation to large random samples
  • Wednesday
    • Central limit theorem
    • The variability in a random sample average
  • Today
    • Constructing confidence intervals for sample means
    • Choosing the size of a random sample

4 of 31

Review: SD and Bell-Shaped Curves

If a histogram is bell-shaped, then

  • Where is the average?
  • What about SD?

5 of 31

Distribution of the Average of a Large Sample

6 of 31

CLT with More Details

If the sample is large and drawn at random with replacement:

Then, regardless of the distribution of the population,

  • the probability distribution of the sample average
  • is roughly normal
  • What about mean and standard deviation?

7 of 31

CLT with More Details

If the sample is large and drawn at random with replacement:

Then, regardless of the distribution of the population,

  • the probability distribution of the sample average
  • is roughly normal
  • mean = population mean
  • SD = (population SD) / √sample size
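
A small simulation can make the last two bullets concrete. The sketch below assumes a made-up, skewed population (exponential values with mean about 10); the population, the sample size of 400, the seeds, and the helper name `simulate_sample_means` are illustrative choices, not part of the lecture demo.

```python
import math
import random
import statistics

def simulate_sample_means(population, sample_size, repetitions, seed=0):
    """Return the averages of `repetitions` random samples drawn with replacement."""
    rng = random.Random(seed)
    means = []
    for _ in range(repetitions):
        sample = rng.choices(population, k=sample_size)  # with replacement
        means.append(statistics.fmean(sample))
    return means

# Hypothetical skewed population: 20,000 exponential values with mean about 10.
_rng = random.Random(1)
population = [_rng.expovariate(1 / 10) for _ in range(20_000)]

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

sample_size = 400
means = simulate_sample_means(population, sample_size, repetitions=2_000)

# The two pairs below should roughly agree if the CLT statement holds.
print("population mean:        ", round(pop_mean, 2))
print("mean of sample averages:", round(statistics.fmean(means), 2))
print("pop SD / sqrt(n):       ", round(pop_sd / math.sqrt(sample_size), 3))
print("SD of sample averages:  ", round(statistics.pstdev(means), 3))
```

The population here is far from bell-shaped, yet the sample averages should still center on the population mean with a spread close to (population SD) / √sample size.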

8 of 31

Increasing Sample Size

9 of 31

Three Different SDs

Population of flight delays

  • Population mean:
  • Population SD: 27 minutes

Random sample of 100 flights

  • Sample mean: estimate of population mean
  • Sample SD: estimate of population SD

SD of sample average: 27/sqrt(100) = 2.7 minutes

  • If we calculated the sample average from each of 10,000 samples, the SD of those averages would be about 2.7 minutes

10 of 31

Confidence Intervals

11 of 31

Graph of the Distribution

12 of 31

The Key to 95% Confidence

  • For about 95% of all samples, the sample average and population average are within 2 SDs of each other.

  • Here SD means the SD of the sample average = (population SD) / √sample size

[Figure labels: 1 SD above the mean; 2 SDs above the mean]
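
The "about 95%" figure comes from the normal curve. As a quick check, the sketch below uses the standard library's `statistics.NormalDist` to compute the area within 1 and 2 SDs of the mean; the helper name `area_within` is ours.

```python
from statistics import NormalDist

def area_within(num_sds):
    """Area under the standard normal curve within `num_sds` SDs of the mean."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(num_sds) - standard_normal.cdf(-num_sds)

print(round(area_within(1), 4))  # roughly 0.68
print(round(area_within(2), 4))  # roughly 0.95
```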

13 of 31

Constructing the Interval

14 of 31

Constructing the Interval

For 95% of all samples,

  • If you stand at the population average and look two SDs on both sides, you will find the sample average.

  • Distance is symmetric.

  • So if you stand at the sample average and look two SDs on both sides, you will capture the population average.

15 of 31

The Interval

(Demo)

16 of 31

Summarizing: Construction of Intervals

  • 95% confidence interval for the population mean
    • sample mean +/- 2 * (SD of the sample mean)
  • SD of the sample mean
    • (population SD) / √sample size

  • But we don't know the population SD
    • We can estimate it using the sample SD
    • Or overestimate it
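
As a minimal sketch of this recipe, the function below builds an approximate 95% interval from a single sample, using the sample SD as a stand-in for the unknown population SD. The function name and the example delays are made up for illustration, and the short list only shows the mechanics; the CLT-based interval assumes a large random sample.

```python
import math
import statistics

def approx_95_ci(sample):
    """Approximate 95% confidence interval for the population mean.

    Uses sample mean +/- 2 * (sample SD) / sqrt(sample size), where the
    sample SD stands in for the unknown population SD.
    """
    n = len(sample)
    center = statistics.fmean(sample)
    sd_of_sample_mean = statistics.pstdev(sample) / math.sqrt(n)
    return center - 2 * sd_of_sample_mean, center + 2 * sd_of_sample_mean

# Hypothetical delays in minutes (not real data).
delays = [5, 12, 0, 47, 3, 18, 9, 0, 25, 31, 7, 14, 2, 60, 11, 4]
left, right = approx_95_ci(delays)
print(round(left, 2), round(right, 2))
```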

17 of 31

Question

If we can make a 95% confidence interval this way:

  • sample mean +/- 2 * (SD of the sample mean)
  • then why do we need to make confidence intervals using the bootstrap?

This method only works for means and sums, because it is based on the CLT. The bootstrap is a more general approach that also works for other statistics, such as medians.
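
To illustrate the contrast, here is a minimal sketch of a bootstrap percentile interval for a median, a statistic the mean-based formula does not cover. It uses only the standard library; `bootstrap_median_ci`, the seed, and the sample values are hypothetical.

```python
import random
import statistics

def bootstrap_median_ci(sample, repetitions=2_000, seed=0):
    """Approximate 95% bootstrap percentile interval for the population median."""
    rng = random.Random(seed)
    n = len(sample)
    medians = sorted(
        statistics.median(rng.choices(sample, k=n))  # resample with replacement
        for _ in range(repetitions)
    )
    # Middle 95%: cut off about 2.5% of the resampled medians on each side.
    left = medians[round(0.025 * repetitions)]
    right = medians[round(0.975 * repetitions) - 1]
    return left, right

# Hypothetical sample (not real data).
sample = [5, 12, 0, 47, 3, 18, 9, 0, 25, 31, 7, 14, 2, 60, 11, 4]
print(bootstrap_median_ci(sample))
```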

18 of 31

Width of the Interval

Total width of a 95% confidence interval for the population average

= 4 * SD of the sample average

= 4 * (population SD) / √sample size
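
A quick arithmetic check of the formula, reusing the population SD of 27 minutes from the flight-delay slide. The function name `ci_width_95` is ours. Because the sample size sits under a square root, quadrupling it only halves the width.

```python
import math

def ci_width_95(population_sd, sample_size):
    """Total width of an approximate 95% confidence interval for the population average."""
    return 4 * population_sd / math.sqrt(sample_size)

print(ci_width_95(27, 100))  # 4 * 27 / 10
print(ci_width_95(27, 400))  # 4 * 27 / 20, half the width above
```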

19 of 31

Sample Proportions

20 of 31

Proportions are Averages

  • Data: 0 1 0 0 1 0 1 1 0 0 (10 entries)
  • Sum = 4 = number of 1’s
  • Average = 4/10 = 0.4 = proportion of 1’s

If the population consists of 1’s and 0’s (yes/no answers to a question), then:

  • the population average is the proportion of 1’s in the population
  • the sample average is the proportion of 1’s in the sample
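
The same arithmetic in a few lines of Python, using the ten entries from this slide (the variable names are ours):

```python
data = [0, 1, 0, 0, 1, 0, 1, 1, 0, 0]

total = sum(data)             # number of 1's
average = total / len(data)   # proportion of 1's

print(total, average)
```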

21 of 31

Confidence Interval

22 of 31

Controlling the Width

  • Total width of an approximate 95% confidence interval for a population proportion

= 4 * (SD of 0/1 population) / √sample size

  • The narrower the interval, the more precise your estimate.
  • Suppose you want the total width of the interval to be no more than 1%. How should you choose the sample size?

23 of 31

The Sample Size for a Given Width

0.01 = 4 * (SD of 0/1 population) / √sample size

  • Left side: 1%, the max total width that you’ll accept
  • Right side: formula for the total width

(Demo)

√sample size = 4 * (SD of 0/1 population) / 0.01

24 of 31

“Worst Case” Population SD

  • √sample size = 4 * (SD of 0/1 population) / 0.01

  • SD of 0/1 population is at most 0.5

  • √sample size ≥ 4 * 0.5 / 0.01

  • sample size ≥ (4 * 0.5 / 0.01) ** 2 = 40000

  • The sample size should be 40,000 or more
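
Why 0.5? For a population of 0's and 1's in which a proportion p are 1's, the SD works out to √(p(1−p)), a standard fact that the slide relies on without stating. The sketch below scans a grid of p values to show that this SD peaks at p = 0.5; the function name and the grid are illustrative.

```python
import math

def sd_of_zero_one_population(p):
    """SD of a population of 0's and 1's in which a proportion p are 1's."""
    return math.sqrt(p * (1 - p))

grid = [i / 100 for i in range(101)]  # p = 0.00, 0.01, ..., 1.00
worst_p = max(grid, key=sd_of_zero_one_population)

print(worst_p, sd_of_zero_one_population(worst_p))
```

Plugging in 0.5 therefore overestimates the SD for every other p, which makes the resulting sample size safe whatever the true proportion is.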

25 of 31

Discussion Question

26 of 31

27 of 31

Discussion Question

  • 3% margin of error means width of 6%

width = 4 * (0.5) / √1004

width ≈ 0.0631, so margin of error ≈ 3.16%

28 of 31

Discussion Question

  • A researcher is estimating a population proportion based on a random sample of size 10,000.

Fill in the blank with a decimal:

  • With chance at least 95%, the estimate will be correct to within ________________.

29 of 31

Discussion Question

  • With chance at least 95%, the estimate will be correct to within 0.01.

width = 4 * (0.5) / √10000

width = 0.02, so margin of error = 0.01
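
The margin-of-error answers in these discussion questions follow from one calculation, sketched below with the worst-case SD of 0.5. The function name is hypothetical.

```python
import math

def worst_case_margin_of_error(sample_size, worst_case_sd=0.5):
    """Half the total width of an approximate 95% interval for a proportion."""
    width = 4 * worst_case_sd / math.sqrt(sample_size)
    return width / 2

print(round(worst_case_margin_of_error(1004), 4))    # sample of 1004
print(round(worst_case_margin_of_error(10_000), 4))  # sample of 10,000
```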

30 of 31

Discussion Question

  • I am going to use a 68% confidence interval to estimate a population proportion.

  • I want the total width of my interval to be no more than 2.5%.

  • How large must my random sample be?

31 of 31

Discussion Question

  • How large must my random sample be?

0.025 = 2 * (0.5) / √sample size

√sample size = 2 * (0.5) / 0.025

sample size = 40**2 = 1600
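
The sample-size calculations in this lecture share one pattern: total width = 2 * (SDs on each side) * (SD of 0/1 population) / √sample size, solved for the sample size. A minimal sketch, with a hypothetical function name and the lecture's rule of thumb of 1 SD on each side for 68% confidence and 2 SDs for 95%:

```python
import math

def required_sample_size(max_width, sds_each_side, worst_case_sd=0.5):
    """Smallest sample size whose interval is no wider than `max_width`."""
    root_n = 2 * sds_each_side * worst_case_sd / max_width
    # Round before taking the ceiling so float noise does not add an extra unit.
    return math.ceil(round(root_n ** 2, 6))

print(required_sample_size(0.025, sds_each_side=1))  # 68% interval, width 2.5%
print(required_sample_size(0.01, sds_each_side=2))   # 95% interval, width 1%
```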