Lecture 26
Center and Spread
DATA 8
Spring 2020
Weekly Goals
Confidence Intervals For Testing
Using a CI for Testing
What if we want to do a hypothesis test, but we can’t simulate under the null?
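One standard answer applies when the null hypothesis states a value for a parameter. Build an approximate 95% confidence interval for that parameter, and reject the null at the 5% cutoff if the null value falls outside the interval. The sketch below assumes a hypothetical, simulated sample and a hypothetical null value of 50. It builds the interval with a bootstrap percentile method.

```python
import numpy as np

rng = np.random.default_rng(8)

# Hypothetical observed sample (in practice: the one random sample we actually have)
sample = rng.normal(loc=52, scale=10, size=200)

# Bootstrap: resample the sample with replacement and recompute the average each time
bootstrap_averages = np.array([
    np.mean(rng.choice(sample, size=len(sample), replace=True))
    for _ in range(5_000)
])

# The middle 95% of bootstrap averages gives an approximate 95% confidence interval
left, right = np.percentile(bootstrap_averages, [2.5, 97.5])

# Hypothetical null hypothesis: "the population average is 50"
null_value = 50
reject_null = not (left <= null_value <= right)
print(f"95% CI: [{left:.2f}, {right:.2f}]  reject null at 5% cutoff: {reject_null}")
```

No simulation under the null is needed here. The only question is whether the null value lies inside the interval.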
Center and Spread
Questions
Average
The Average (or Mean)
Data: 2, 3, 3, 9
Average = (2 + 3 + 3 + 9)/4 = 4.25
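The same arithmetic in Python: add up the values and divide by how many there are. `np.mean` gives the same result.

```python
import numpy as np

data = np.array([2, 3, 3, 9])

average = np.sum(data) / len(data)   # (2 + 3 + 3 + 9) / 4
print(average)                       # 4.25
print(np.mean(data))                 # 4.25
```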
(Demo)
Discussion Question
Are the medians of these two distributions the same or different? Are the means the same or different? If you say “different,” then say which one is bigger.
Comparing Mean and Median
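A small illustration using the data from the Average slide. The median is the middle of the sorted values, so it ignores how far out the largest value sits. The mean is pulled toward a long tail. The second array is a hypothetical change that makes the largest value much larger.

```python
import numpy as np

data = np.array([2, 3, 3, 9])
print(np.median(data), np.mean(data))            # 3.0 4.25

# Hypothetical: stretch the right tail by making the largest value much larger
stretched = np.array([2, 3, 3, 90])
print(np.median(stretched), np.mean(stretched))  # 3.0 24.5
```

The median does not move, but the mean jumps toward the tail. For a distribution with a long right tail, the mean is therefore larger than the median.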
Standard Deviation
Defining Variability
Plan A: “biggest value - smallest value”
Plan B:
(Demo)
How Far from the Average?
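The SD is the root mean square of the deviations from average. The sketch below computes it step by step for the data 2, 3, 3, 9, then checks the result against `np.std`. By default, `np.std` also divides by the number of values (`ddof=0`).

```python
import numpy as np

data = np.array([2, 3, 3, 9])

average = np.mean(data)                  # 4.25
deviations = data - average              # how far each value is from the average
squared_deviations = deviations ** 2     # square them so signs don't cancel
variance = np.mean(squared_deviations)   # mean squared deviation
sd = np.sqrt(variance)                   # root mean square of deviations

print(variance)       # 7.6875
print(sd)
print(np.std(data))   # same value as sd
```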
Why Use the SD?
There are two main reasons.
1. No matter what the shape of the distribution,
the bulk of the data are in the range “average ± a few SDs”
2. Coming up next time.
Chebyshev's Inequality
How Big are Most of the Values?
No matter what the shape of the distribution,
the bulk of the data are in the range “average ± a few SDs”
Chebyshev’s Inequality
No matter what the shape of the distribution,
the proportion of values in the range “average ± z SDs” is
at least 1 - 1/z²
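The bound can be written as a one-line function of z. It also works for z values that are not whole numbers. For z ≤ 1, the expression 1 - 1/z² is zero or negative, so the inequality gives no information there. The helper name `chebyshev_lower_bound` is hypothetical.

```python
def chebyshev_lower_bound(z):
    """At least this proportion of values lie in the range average ± z SDs."""
    if z <= 1:
        return 0.0  # 1 - 1/z**2 is 0 or negative here, so the bound says nothing
    return 1 - 1 / z**2

for z in [1.5, 2, 3, 4, 5]:
    print(f"average ± {z} SDs: at least {chebyshev_lower_bound(z):.2%}")
```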
Chebyshev’s Bounds
| Range | Proportion |
|---|---|
| average ± 2 SDs | at least 1 - 1/4 (75%) |
| average ± 3 SDs | at least 1 - 1/9 (88.888…%) |
| average ± 4 SDs | at least 1 - 1/16 (93.75%) |
| average ± 5 SDs | at least 1 - 1/25 (96%) |
No matter what the distribution looks like
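A quick empirical check, assuming hypothetical, strongly skewed data drawn from an exponential distribution, which is nothing like bell-shaped. Chebyshev's inequality holds exactly for any finite dataset when the SD is computed by dividing by the number of values. So each observed proportion should be at least the bound.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical skewed data: exponential values with a long right tail
data = rng.exponential(scale=5, size=10_000)

average = np.mean(data)
sd = np.std(data)   # divide by n, matching the SD definition

results = {}
for z in [2, 3, 4, 5]:
    observed = np.mean(np.abs(data - average) <= z * sd)  # proportion in the range
    bound = 1 - 1 / z**2                                   # Chebyshev's guarantee
    results[z] = (observed, bound)
    print(f"average ± {z} SDs: observed {observed:.4f}, at least {bound:.4f}")
```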
(Demo)
Standard Units
Standard Units
(Demo)
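Standard units measure how many SDs a value is above (positive) or below (negative) the average: z = (value - average)/SD. The helper name `standard_units` is hypothetical. After conversion, any list of values has average 0 and SD 1, up to rounding.

```python
import numpy as np

def standard_units(values):
    """Convert values to standard units: (value - average) / SD."""
    values = np.asarray(values, dtype=float)
    return (values - np.mean(values)) / np.std(values)

z = standard_units([2, 3, 3, 9])
print(np.round(z, 2))            # 9 is a little under 2 SDs above average
print(np.mean(z), np.std(z))     # approximately 0 and 1
```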
Discussion Question
Find whole numbers that are close to:
(Demo)
The SD and the Histogram
The SD and Bell-Shaped Curves
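If a histogram is bell-shaped, then the average is at the center, and the SD is the distance between the average and the points of inflection on either side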
(Demo)
Point of Inflection
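A numerical check of the inflection point on the standard normal curve, whose average is 0 and SD is 1. The curve bends downward near its center (negative second derivative) and upward in its tails (positive second derivative). The sketch locates the switch on the right half with finite differences via `np.gradient`. The grid and helper name are assumptions made for illustration.

```python
import numpy as np

def standard_normal_curve(z):
    """Height of the standard normal curve at z."""
    return np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)

z = np.linspace(0, 3, 30_001)          # fine grid on the right half of the curve
heights = standard_normal_curve(z)

# Finite-difference second derivative: < 0 where the curve bends down, > 0 where it bends up
slope = np.gradient(heights, z)
curvature = np.gradient(slope, z)

# First index where curvature switches from negative to positive
switch = np.where(np.diff(np.sign(curvature)) > 0)[0][0]
inflection_point = z[switch]
print(round(float(inflection_point), 3))   # close to 1, i.e. 1 SD from the average
```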
The Normal Distribution
The Standard Normal Curve
A beautiful formula that we won’t use at all: φ(z) = (1/√(2π)) e^(−z²/2), for −∞ < z < ∞
Bell Curve
Normal Proportions
How Big are Most of the Values?
No matter what the shape of the distribution,
the bulk of the data are in the range “average ± a few SDs”
If a histogram is bell-shaped, then almost all of the data are in the range
“average ± 3 SDs”
Bounds and Normal Approximations
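A side-by-side comparison of Chebyshev's guarantee, which holds for any distribution, and the proportion under the normal curve. For a standard normal variable Z, the proportion between -z and z equals erf(z/√2), which `math.erf` computes. Both helper names are hypothetical.

```python
import math

def chebyshev_lower_bound(z):
    """Guaranteed proportion in average ± z SDs, whatever the distribution."""
    return max(0.0, 1 - 1 / z**2)

def normal_proportion(z):
    """Proportion of the standard normal curve between -z and z."""
    return math.erf(z / math.sqrt(2))

for z in [1, 2, 3]:
    print(f"average ± {z} SDs: Chebyshev at least {chebyshev_lower_bound(z):.2%}, "
          f"normal curve about {normal_proportion(z):.2%}")
```

For bell-shaped data, the normal proportions (about 68%, 95%, and 99.7%) are much higher than the worst-case Chebyshev bounds.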
A “Central” Area
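One common question about a central area under the standard normal curve: how many SDs on either side of the center are needed to capture a given proportion, such as 95%? The curve is symmetric, so the leftover area splits equally between the two tails. This sketch uses the standard library's `statistics.NormalDist` (Python 3.8+). The 95% target is only an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def central_area(z):
    """Area under the standard normal curve between -z and z."""
    return standard_normal.cdf(z) - standard_normal.cdf(-z)

def z_for_central_area(area):
    """The z such that the given central area lies between -z and z."""
    tail = (1 - area) / 2                      # equal area left in each tail
    return standard_normal.inv_cdf(1 - tail)

print(round(central_area(2), 4))               # 0.9545
print(round(z_for_central_area(0.95), 2))      # 1.96
```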
Central Limit Theorem
Sample Averages
Central Limit Theorem
If the sample is large, and drawn at random with replacement,
Then, regardless of the distribution of the population,
the probability distribution of the sample sum
(or the sample average) is roughly normal
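A simulation sketch of the statement above. It assumes a hypothetical, skewed population (exponential values), a sample size of 400, and 5,000 repetitions. The population itself is far from normal: about 86% of it lies within 1 SD of its average. The sample averages, however, land close to the normal proportions of about 68% within 1 SD and 95% within 2 SDs.

```python
import numpy as np

rng = np.random.default_rng(2020)

# Hypothetical skewed population; its histogram is far from bell-shaped
population = rng.exponential(scale=10, size=100_000)

sample_size = 400      # a large random sample
repetitions = 5_000

# Draw many random samples with replacement and record each sample average
samples = rng.choice(population, size=(repetitions, sample_size), replace=True)
sample_averages = samples.mean(axis=1)

def proportion_within(values, z):
    """Proportion of values in the range average ± z SDs."""
    return np.mean(np.abs(values - np.mean(values)) <= z * np.std(values))

pop_within_1 = proportion_within(population, 1)
avg_within_1 = proportion_within(sample_averages, 1)
avg_within_2 = proportion_within(sample_averages, 2)
print("population within ±1 SD:", round(pop_within_1, 3))
print("sample averages within ±1 SD:", round(avg_within_1, 3))
print("sample averages within ±2 SDs:", round(avg_within_2, 3))
```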
(Demo)