1 of 17

Lecture 25

The Normal Distribution

DATA 8

Spring 2017

Slides created by John DeNero (denero@berkeley.edu) and Ani Adhikari (adhikari@berkeley.edu)

2 of 17

Announcements

3 of 17

Standard Deviation (Review)

4 of 17

The Standard Deviation

  • Standard deviation (SD) measures roughly how far the data are from their average
  • SD = root mean square of deviations from average
    • Read right to left: (1) average, (2) deviations from average, (3) square, (4) mean, (5) root

  • SD has the same units as the data; hence OK to say “average plus or minus a few SDs”
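
The five steps in "root mean square of deviations from average" can be sketched in plain Python. This is only an illustration, not the course demo: the function name `standard_deviation` and the list `example_values` are made up here, and the data are hypothetical.

```python
import math

def standard_deviation(values):
    """Root mean square of deviations from average."""
    average = sum(values) / len(values)             # 1. average
    deviations = [v - average for v in values]      # 2. deviations from average
    squared = [d ** 2 for d in deviations]          # 3. square
    mean_square = sum(squared) / len(squared)       # 4. mean
    return math.sqrt(mean_square)                   # 5. root

# Hypothetical data, chosen only for illustration
example_values = [1, 2, 2, 10]
example_sd = standard_deviation(example_values)
print("average:", sum(example_values) / len(example_values))
print("SD:", example_sd)
```

The result is in the same units as `example_values`, which is why "average plus or minus a few SDs" is a meaningful range.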

(Demo)

5 of 17

Chebyshev's Inequality

6 of 17

How Big are Most of the Values?

Why Use Standard Deviation?

No matter what the shape of the distribution,

the bulk of the data are in the range “average ± a few SDs”

Chebyshev’s Inequality

No matter what the shape of the distribution,

the proportion of values in the range “average ± z SDs” is

at least 1 - 1/z²
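
One way to see the inequality at work is to check it on data that are far from bell-shaped. The sketch below uses a simulated, heavily skewed sample; the names `chebyshev_bound`, `proportion_within`, and `skewed` are assumptions made for this example, and the data are synthetic rather than taken from the lecture.

```python
import random
import statistics

def chebyshev_bound(z):
    """Chebyshev's lower bound on the proportion within z SDs of average."""
    return 1 - 1 / z ** 2

def proportion_within(values, z):
    """Observed proportion of values in the range average ± z SDs."""
    average = statistics.mean(values)
    sd = statistics.pstdev(values)   # SD as root mean square of deviations
    inside = [v for v in values if abs(v - average) <= z * sd]
    return len(inside) / len(values)

# Synthetic skewed data (long right tail), for illustration only
rng = random.Random(0)
skewed = [rng.expovariate(1.0) for _ in range(10_000)]

for z in (2, 3, 4, 5):
    print(z, "bound:", chebyshev_bound(z),
          "observed:", proportion_within(skewed, z))
```

The observed proportions are never below the bound; they are usually well above it, because the bound has to hold for every possible distribution.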

7 of 17

Chebyshev’s Bounds

Range              Proportion
average ± 2 SDs    at least 1 - 1/4 (75%)
average ± 3 SDs    at least 1 - 1/9 (88.888…%)
average ± 4 SDs    at least 1 - 1/16 (93.75%)
average ± 5 SDs    at least 1 - 1/25 (96%)

No matter what the distribution looks like

(Demo)

8 of 17

Standard Units

9 of 17

Standard Units

  • How many SDs above average?
  • z = (value - mean)/SD
    • Negative z: value below average
    • Positive z: value above average
    • z=0: value equal to average
  • When values are in standard units: average = 0, SD = 1
  • By Chebyshev, at least 96% of the values of z are between -5 and 5
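
The conversion to standard units might look like the following in plain Python. The function name `standard_units` and the `heights` list are hypothetical, introduced only for this sketch.

```python
import statistics

def standard_units(values):
    """Convert each value to z = (value - mean) / SD."""
    average = statistics.mean(values)
    sd = statistics.pstdev(values)   # root mean square of deviations
    return [(v - average) / sd for v in values]

# Hypothetical heights in inches, for illustration only
heights = [60, 62, 65, 70, 73]
z = standard_units(heights)

print("z values:", z)
print("average of z:", statistics.mean(z))   # approximately 0
print("SD of z:", statistics.pstdev(z))      # approximately 1
```

Values below the average of `heights` come out negative and values above it come out positive, matching the bullets above.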

(Demo)

10 of 17

Attendance

11 of 17

The Normal Distribution

12 of 17

The SD and the Histogram

  • Usually, it's not easy to estimate the SD by looking at a histogram

  • But if the histogram has a bell shape, then you can

(Demo)

13 of 17

The SD and Bell-Shaped Curves

If a histogram is bell-shaped, then

  • the average is at the center

  • the SD is the distance between the average and the points of inflection on either side
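
A numerical check of the inflection-point claim, assuming the bell shape is exactly the normal curve. The sketch finds where the curve switches between curving upward and curving downward by looking for sign changes in its second differences. The names `normal_density` and `inflection_points`, and the choice of average 10 and SD 2, are assumptions for this example.

```python
import math

def normal_density(x, average, sd):
    """Height of the normal curve with the given average and SD."""
    z = (x - average) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def inflection_points(average, sd, steps=8000):
    """Approximate the points where the curve's curvature changes sign."""
    width = 8 * sd / steps
    # Grid covering average ± 4 SDs
    xs = [average - 4 * sd + i * width for i in range(steps + 1)]
    ys = [normal_density(x, average, sd) for x in xs]
    # second[j] approximates the curvature at xs[j + 1]
    second = [ys[j] - 2 * ys[j + 1] + ys[j + 2] for j in range(steps - 1)]
    points = []
    for j in range(1, len(second)):
        if second[j - 1] * second[j] < 0:
            # Sign change between xs[j] and xs[j + 1]: report the midpoint
            points.append((xs[j] + xs[j + 1]) / 2)
    return points

points = inflection_points(average=10, sd=2)
print(points)   # two points, close to 10 - 2 and 10 + 2
```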

(Demo)

14 of 17

Normal Proportions

15 of 17

How Big are Most of the Values?

No matter what the shape of the distribution,

the bulk of the data are in the range “average ± a few SDs”

If a histogram is bell-shaped, then

  • Almost all of the data are in the range “average ± 3 SDs”
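
To put numbers on "almost all", the sketch below compares the proportion of a normal distribution within z SDs of the average against Chebyshev's bound for the same z. It assumes an exactly normal curve and uses the standard library's `math.erf`; the function names are made up for this example.

```python
import math

def normal_proportion_within(z):
    """Proportion of a normal distribution within z SDs of the average."""
    return math.erf(z / math.sqrt(2))

def chebyshev_bound(z):
    """Lower bound that holds for every distribution."""
    return 1 - 1 / z ** 2

for z in (1, 2, 3):
    normal_pct = round(100 * normal_proportion_within(z), 2)
    bound_pct = round(100 * chebyshev_bound(z), 2)
    print(f"average ± {z} SDs: normal {normal_pct}%, Chebyshev at least {bound_pct}%")
```

For z = 1, 2, 3 the normal proportions are about 68.27%, 95.45%, and 99.73%. Chebyshev's bound is much weaker (0%, 75%, about 88.89%) because it makes no assumption about shape.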

(Demo)

16 of 17

Bounds and Normal Approximations

(Demo)

17 of 17

Central Limit Theorem

If the sample is

  • large, and
  • drawn at random with replacement,

Then, regardless of the distribution of the population,

the probability distribution of the sample sum

(or of the sample average) is roughly bell-shaped
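
A small simulation can illustrate the statement. The sketch below builds a synthetic, strongly skewed population, draws many large random samples with replacement, and checks whether the sample averages follow the bell-shaped proportions. Everything here (the population, the sample size of 400, the 2,000 repetitions, and the names) is an assumption chosen for illustration.

```python
import random
import statistics

def sample_averages(population, sample_size, repetitions, rng):
    """Averages of many random samples drawn with replacement."""
    averages = []
    for _ in range(repetitions):
        sample = rng.choices(population, k=sample_size)  # with replacement
        averages.append(sum(sample) / sample_size)
    return averages

rng = random.Random(25)
# Synthetic skewed population: not bell-shaped at all
population = [rng.expovariate(1.0) for _ in range(5_000)]
averages = sample_averages(population, sample_size=400, repetitions=2_000, rng=rng)

center = statistics.mean(averages)
spread = statistics.pstdev(averages)
within_2 = sum(abs(a - center) <= 2 * spread for a in averages) / len(averages)
within_3 = sum(abs(a - center) <= 3 * spread for a in averages) / len(averages)

print("population average:", statistics.mean(population))
print("center of sample averages:", center)
print("proportion within 2 SDs:", within_2)   # roughly 0.95 if bell-shaped
print("proportion within 3 SDs:", within_3)   # roughly 0.997 if bell-shaped
```

Drawing a histogram of `averages` should show a rough bell shape even though the population itself has a long right tail.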

(Demo)