Estimating a population parameter
Wayne Tai Lee
Review
Preview
FYI - Indicators and Sampling
Can we show:
E(Sample Average) = Population Average
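A minimal simulation sketch of this claim, assuming a hypothetical box of six tickets and draws with replacement (the box, sample size, and number of repetitions are illustrative choices, not from the slides). If the claim holds, the average of many sample averages should sit very close to the box average.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is reproducible

box = [1, 2, 3, 4, 5, 6]   # hypothetical box of tickets
n = 5                      # draws per sample (with replacement)
reps = 20_000              # number of repeated samples

# One sample average per repetition
sample_avgs = [sum(random.choices(box, k=n)) / n for _ in range(reps)]

box_avg = mean(box)
avg_of_sample_avgs = mean(sample_avgs)

print("box average:", box_avg)
print("average of the sample averages:", round(avg_of_sample_avgs, 3))
```

Any single sample average can be far from the box average; it is the long-run average of the sample averages that matches it.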
Sample average is more robust than the histogram
Implications of CLT - Automation!!!
Tuning the Normal distribution
One subtle issue with continuous random variables
If Y ~ Normal(𝜇, 𝜎²)
The probability of equaling exactly one value is 0, i.e.
P(Y = k) = 0
It’s more natural to talk about the probability of an interval of values, e.g. P(a ≤ Y ≤ b)
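A small sketch of the point-versus-interval idea, using `statistics.NormalDist` from Python's standard library. The choice 𝜇 = 0, 𝜎 = 1 and the helper name `prob_between` are assumptions for illustration only. Interval probabilities are computed as a difference of two CDF values.

```python
from statistics import NormalDist

Y = NormalDist(mu=0, sigma=1)   # hypothetical choice of mu and sigma

def prob_between(dist, a, b):
    """P(a <= Y <= b), computed as the difference of two CDF values."""
    return dist.cdf(b) - dist.cdf(a)

point = prob_between(Y, 1, 1)        # interval of zero width: probability 0
wide = prob_between(Y, -1, 1)        # about 0.68
narrow = prob_between(Y, 0.9, 1.1)   # small, but strictly positive

print(point, round(wide, 4), round(narrow, 4))
```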
Motivating example
3 Distributions to ALWAYS think about
Top:
Middle:
Bottom:
CLT tells us how far the sample average is from the box average
Key:
What is
P(|Sample Avg - Expectation| > 2 * SE(Sample Avg))
if sample size is large enough?
= P(|Z| > 2 * SE(Z)) where Z ~ Normal(0, SE(Sample Avg)²)
≈ 0.05
Equivalently, P(|Sample Avg - Expectation| ≤ 2 * SE(Sample Avg)) ≈ 0.95
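A quick numeric check of the 2-SE statement, as a minimal sketch. In standard units, (Sample Avg - Expectation) / SE(Sample Avg) is approximately Normal(0, 1) when the sample is large enough, so the chance of landing within 2 SE is about 0.95 and the chance of landing more than 2 SE away is about 0.05. Only `statistics.NormalDist` from the standard library is used.

```python
from statistics import NormalDist

# Standard units: (Sample Avg - Expectation) / SE(Sample Avg)
Z = NormalDist(0, 1)

within_2_se = Z.cdf(2) - Z.cdf(-2)   # P(|Z| <= 2), about 0.95
beyond_2_se = 1 - within_2_se        # P(|Z| > 2), about 0.05

print(round(within_2_se, 4))
print(round(beyond_2_se, 4))
```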
What if we don’t have a large enough sample? Chebyshev!
Key:
What is
P(|Sample Avg - Expectation| > 2 * SE(Sample Avg))
if sample size is small?
= P(|Z| > 2 * SE(Z)) where Z ~ ?, E(Z) = 0, and SE(Z) = SE(Sample Avg)
≤ 1/4
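A sketch of how the Chebyshev bound behaves when the normal approximation is doubtful. The skewed box and the sample size of 3 are hypothetical choices made for illustration. Chebyshev says the chance of being more than k SE from the expectation is at most 1/k², whatever the distribution, so with k = 2 the simulated chance should not exceed 1/4.

```python
import random
from statistics import mean, pstdev

random.seed(1)  # fixed seed so the run is reproducible

box = [0, 0, 0, 0, 0, 0, 0, 0, 0, 10]  # hypothetical, heavily skewed box
n = 3                                   # deliberately small sample size
reps = 20_000
k = 2

expectation = mean(box)
se = pstdev(box) / n ** 0.5   # SE of the sample average, draws with replacement

far = 0
for _ in range(reps):
    avg = sum(random.choices(box, k=n)) / n
    if abs(avg - expectation) > k * se:
        far += 1

simulated = far / reps
chebyshev_bound = 1 / k ** 2

print("simulated P(|Sample Avg - Expectation| > 2 SE):", simulated)
print("Chebyshev bound:", chebyshev_bound)
```

The bound is usually loose: it trades accuracy for not needing any assumption about the shape of the distribution.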
The boilerplate confidence interval calculation for population averages
Example
An election poll of 400 residents, randomly selected out of 50,000, shows 33% support for candidate A. Please estimate the population % support for candidate A.
The procedure - check the assumptions + reasonableness
The procedure - obtain the estimate for the population parameter
The sample % support (33%) estimates the population % support for candidate A.
The procedure - quantify chance-error
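A minimal sketch of the chance-error step for the poll (n = 400, 33% support, 50,000 residents). It treats the population as a 0/1 box and estimates the box SD from the sample. The finite-population correction is an added assumption for drawing without replacement; the slides may not use it, and with 400 out of 50,000 it barely matters.

```python
from math import sqrt

n = 400        # sample size from the example
N = 50_000     # population size from the example
p_hat = 0.33   # sample proportion supporting candidate A

# 0/1 box: the SD is estimated by sqrt(p_hat * (1 - p_hat))
sd_box = sqrt(p_hat * (1 - p_hat))
se = sd_box / sqrt(n)            # SE of the sample proportion

# Optional finite-population correction (assumption, see note above)
fpc = sqrt((N - n) / (N - 1))
se_corrected = se * fpc

print(round(se, 4))
print(round(se_corrected, 4))
```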
The procedure - determine a “confidence level”
Confidence level is the probability that this procedure will include the population parameter!
The common default is 95%, so k ≈ 2 (more precisely, 1.96)
For Y ~ Normal(0, 1):
P(|Y| ≤ k) | Confidence Level
P(|Y| ≤ 1) | 0.68
P(|Y| ≤ 1.645) | 0.90
P(|Y| ≤ 2) | 0.95
P(|Y| ≤ 3) | 0.997
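The table values can be reproduced with `statistics.NormalDist`, as a small sketch. The helper names `confidence_level` and `multiplier` are illustrative, not from the slides. The second helper goes the other way: given a confidence level, it returns the matching k.

```python
from statistics import NormalDist

Y = NormalDist(0, 1)

def confidence_level(k):
    """P(|Y| <= k) for Y ~ Normal(0, 1)."""
    return Y.cdf(k) - Y.cdf(-k)

def multiplier(level):
    """The k such that P(|Y| <= k) equals the given confidence level."""
    return Y.inv_cdf((1 + level) / 2)

for k in (1, 1.645, 2, 3):
    print(k, round(confidence_level(k), 3))

print(round(multiplier(0.95), 2))  # close to 2, which is why k = 2 is the usual shortcut
```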
The procedure - construct the final interval
The X% confidence interval, with k chosen to match X, is then:
estimate ± k * SE(estimate)
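Putting the steps together for the poll, as a sketch: the interval is taken to be estimate ± k * SE(estimate), with k = 2 for roughly 95% confidence. The function name `proportion_ci` is hypothetical, and the finite-population correction is ignored here because 400 is a small fraction of 50,000.

```python
from math import sqrt

def proportion_ci(p_hat, n, k=2):
    """Return (low, high) for p_hat +/- k * SE, treating the data as draws from a 0/1 box."""
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - k * se, p_hat + k * se

low, high = proportion_ci(p_hat=0.33, n=400, k=2)
print(f"({low:.3f}, {high:.3f})")
```

For the poll this gives roughly 28% to 38% support for candidate A.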
Interpretation
What is a 95% confidence interval?
Some notable choices
What happens if we have a small sample?
Simulations of confidence intervals
100 Simulations of varying n and the significance level
Our Box
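A minimal sketch of such a simulation. The box here is an assumption (a 0/1 box with 33% ones, mirroring the poll), since the slide's own box is not reproduced in the text. Each of the 100 runs draws a fresh sample, builds a k = 2 interval, and checks whether it contains the true percentage.

```python
import random
from math import sqrt

random.seed(2)  # fixed seed so the run is reproducible

true_p = 0.33   # hypothetical box: 33% of tickets are 1, the rest are 0
n = 400         # sample size per simulation
k = 2           # multiplier for roughly 95% confidence
sims = 100

covered = 0
for _ in range(sims):
    draws = [1 if random.random() < true_p else 0 for _ in range(n)]
    p_hat = sum(draws) / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    if p_hat - k * se <= true_p <= p_hat + k * se:
        covered += 1

print(f"{covered} of {sims} intervals contain the true percentage")
```

The count should land near 95 but will vary from run to run: the 95% describes the procedure, not any single interval.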
Interpretation
What is a 95% confidence interval?
Proposed language
“We estimate the population parameter with an X% confidence interval; our sample suggests (a, b).”