Lecture 26
The Normal Curve
DATA 8
Fall 2017
Slides created by John DeNero (denero@berkeley.edu) and Ani Adhikari (adhikari@berkeley.edu)
Announcements
Questions for This Week
Standard Deviation (Review)
How Far from the Average?
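The SD is the root mean square of deviations from average. Computing it takes four steps: find the deviations from the average, square them, take the mean of the squares, then take the square root. Below is a minimal sketch in plain Python. The function name `sd` and the data values are made up for illustration. NumPy's `np.std`, with its default settings, computes the same quantity.

```python
import math

def sd(values):
    """SD as the root mean square of deviations from average."""
    average = sum(values) / len(values)
    deviations = [v - average for v in values]     # deviations from average
    squares = [d ** 2 for d in deviations]         # square them
    mean_square = sum(squares) / len(squares)      # mean of the squares
    return math.sqrt(mean_square)                  # root

values = [2, 3, 3, 9]  # hypothetical data
print(sd(values))
```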
Why Use the SD?
There are two main reasons:
1. No matter what the shape of the distribution, the bulk of the data are in the range “average ± a few SDs”.
2. Coming up later in this lecture ...
How Big are Most of the Values?
No matter what the shape of the distribution,
the bulk of the data are in the range “average ± a few SDs”
Chebyshev’s Inequality
No matter what the shape of the distribution,
the proportion of values in the range “average ± z SDs” is
at least 1 - 1/z²
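The bound depends only on z, so the Chebyshev table can be reproduced directly. Here is a small sketch. The helper name `chebyshev_bound` is our own choice and does not come from any library.

```python
def chebyshev_bound(z):
    """Chebyshev's lower bound on the proportion within average ± z SDs."""
    return 1 - 1 / z ** 2

for z in [2, 3, 4, 5]:
    print(f"average ± {z} SDs: at least {chebyshev_bound(z):.2%}")
```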
Chebyshev’s Bounds
Range | Proportion
average ± 2 SDs | at least 1 - 1/4 (75%)
average ± 3 SDs | at least 1 - 1/9 (88.888…%)
average ± 4 SDs | at least 1 - 1/16 (93.75%)
average ± 5 SDs | at least 1 - 1/25 (96%)
No matter what the distribution looks like
(Demo)
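One way to see that the bounds hold "no matter what the distribution looks like" is to check them on deliberately skewed data. The sketch below uses simulated exponential data, which is an assumption and not a course dataset. For the SD it uses the root mean square of deviations, which is what `statistics.pstdev` computes.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical, strongly skewed data (exponential), far from bell-shaped.
data = [random.expovariate(1.0) for _ in range(10_000)]
average = statistics.mean(data)
sd = statistics.pstdev(data)  # SD: root mean square of deviations from average

results = {}
for z in [2, 3, 4, 5]:
    inside = sum(1 for x in data if abs(x - average) <= z * sd)
    results[z] = inside / len(data)
    print(f"average ± {z} SDs: observed {results[z]:.4f}, "
          f"Chebyshev says at least {1 - 1 / z ** 2:.4f}")
```

The observed proportions are typically well above the bounds. Chebyshev gives a guarantee, not an estimate.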
Standard Units
(Demo)
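A value in standard units is the number of SDs it lies above the average: z = (value − average) / SD. A negative z means the value is below average. Here is a minimal sketch that assumes made-up ages. The helper `standard_units` is a name we chose for illustration.

```python
import statistics

def standard_units(values):
    """Convert each value to (value - average) / SD."""
    average = statistics.mean(values)
    sd = statistics.pstdev(values)  # SD as root mean square of deviations
    return [(v - average) / sd for v in values]

ages = [20, 22, 25, 29, 34]  # hypothetical data
z = standard_units(ages)
print([round(x, 2) for x in z])
```

Whatever the original data, the converted values always have average 0 and SD 1.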
Discussion Question
Find whole numbers that are close to:
(Demo)
The SD and the Histogram
The SD and Bell-Shaped Curves
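If a histogram is bell-shaped, then the average is at the center, and the SD is the distance between the average and the points of inflection on either side.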
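On a normal curve, the points of inflection sit one SD from the average. These are the points where the curve switches from bending downward to bending upward. The sketch below checks this numerically. It takes the standard normal curve φ(z) = e^(−z²/2)/√(2π) as the bell shape and estimates the curve's second derivative with central differences. The helper names and the step size h are arbitrary choices.

```python
import math

def phi(z):
    """Standard normal curve (average 0, SD 1)."""
    return math.exp(-z ** 2 / 2) / math.sqrt(2 * math.pi)

def second_derivative(f, x, h=1e-4):
    """Central-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

# The sign flips between 0.9 and 1.1 SDs (and symmetrically on the left).
for z in [-1.1, -0.9, 0.9, 1.1]:
    print(z, second_derivative(phi, z))
```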
Attendance
The Normal Distribution
The Standard Normal Curve
A beautiful formula that we won’t use at all:
$$\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}, \quad -\infty < z < \infty$$
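The lecture does not rely on the formula for the standard normal curve, but the formula is easy to evaluate. The sketch below codes it directly; the name `standard_normal_curve` is ours. It then checks numerically that the total area under the curve is 1. The check uses a midpoint sum over [−10, 10], since essentially all of the area lies in that range.

```python
import math

def standard_normal_curve(z):
    """Height of the standard normal curve at z."""
    return math.exp(-0.5 * z ** 2) / math.sqrt(2 * math.pi)

# Midpoint Riemann sum over [-10, 10]; the area outside is negligible.
n_steps = 20_000
width = 20 / n_steps
area = sum(standard_normal_curve(-10 + (i + 0.5) * width) * width
           for i in range(n_steps))
print(area)
```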
Bell Curve
Normal Proportions
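For a normal distribution, the proportion within average ± z SDs equals the area under the standard normal curve between −z and z. That area is erf(z/√2). The sketch below uses only Python's standard `math.erf`. With SciPy, `norm.cdf(z) - norm.cdf(-z)` from `scipy.stats` gives the same numbers.

```python
import math

def normal_central_proportion(z):
    """Proportion of a normal distribution within average ± z SDs."""
    return math.erf(z / math.sqrt(2))

for z in [1, 2, 3]:
    print(f"average ± {z} SD(s): about {normal_central_proportion(z):.2%}")
```

This reproduces the familiar figures of about 68%, 95%, and 99.73% for 1, 2, and 3 SDs.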
How Big are Most of the Values?
No matter what the shape of the distribution,
the bulk of the data are in the range “average ± a few SDs”
If a histogram is bell-shaped, then
almost all of the data are in the range “average ± 3 SDs”.
Bounds and Normal Approximations
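Setting the two side by side shows how conservative Chebyshev's bounds are when the distribution happens to be normal. The sketch below uses two formulas: Chebyshev's 1 − 1/z², which holds for every distribution, and erf(z/√2), which is exact only for a normal distribution.

```python
import math

rows = []
for z in [2, 3, 4, 5]:
    chebyshev = 1 - 1 / z ** 2            # guaranteed for any distribution
    normal = math.erf(z / math.sqrt(2))   # exact for a normal distribution
    rows.append((z, chebyshev, normal))
    print(f"average ± {z} SDs: Chebyshev at least {chebyshev:.4f}, "
          f"normal {normal:.6f}")
```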
A “Central” Area
(Demo)
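Going the other way, we can ask which z gives a particular central area under the standard normal curve. The two tails split the leftover area equally, so the answer is the point that has (1 + area)/2 of the total area to its left. The sketch below uses `statistics.NormalDist` from the standard library, which requires Python 3.8 or later. The helper name `central_z` is ours.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def central_z(area):
    """z such that the area between -z and z under the standard normal curve is `area`."""
    # Each tail gets (1 - area) / 2, so (1 + area) / 2 lies to the left of z.
    return standard_normal.inv_cdf((1 + area) / 2)

print(central_z(0.95))
```

For a central area of 95%, z comes out close to 1.96.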
Central Limit Theorem
Second Reason for Using the SD
If the sample is large and drawn at random with replacement,
then, regardless of the distribution of the population,
the probability distribution of the sample sum
(or of the sample average) is roughly bell-shaped.
(Demo)
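A simulation can illustrate the Central Limit Theorem. The sketch below assumes a made-up, strongly skewed population of 10,000 exponential values. It draws samples of size 100 at random with replacement using `random.choices`. It then checks one signature of a bell shape: roughly 95% of the sample averages should fall within 2 SDs of their center. The sample size, the number of repetitions, and the seed are arbitrary choices.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical, strongly skewed population: 10,000 exponential values.
population = [random.expovariate(1.0) for _ in range(10_000)]

def sample_average(n):
    """Average of a sample of size n drawn at random with replacement."""
    sample = random.choices(population, k=n)
    return sum(sample) / n

# Empirical distribution of 5,000 sample averages, each based on 100 draws.
averages = [sample_average(100) for _ in range(5_000)]
center = statistics.mean(averages)
spread = statistics.pstdev(averages)

# In a bell-shaped distribution, about 95% of values lie within center ± 2 SDs.
within_2 = sum(abs(a - center) <= 2 * spread for a in averages) / len(averages)
print(f"center {center:.3f}, SD {spread:.3f}, within ± 2 SDs: {within_2:.3f}")
```

Even though the population histogram is far from symmetric, a histogram of `averages` would look roughly bell-shaped.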