1 of 23

Lecture 26

The Normal Curve

DATA 8

Fall 2017

Slides created by John DeNero (denero@berkeley.edu) and Ani Adhikari (adhikari@berkeley.edu)

2 of 23

Announcements

3 of 23

Questions for This Week

  • How can we quantify natural concepts like “center” and “variability”?

  • Why do many of the empirical distributions that we generate come out bell shaped?

  • How is sample size related to the accuracy of an estimate?

4 of 23

Standard Deviation (Review)

5 of 23

How Far from the Average?

  • Standard deviation (SD) measures roughly how far the data are from their average

  • SD = root mean square of deviations from average

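(Order of computation, reading the definition right to left: 1 average, 2 deviations, 3 square, 4 mean, 5 root)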

  • SD has the same units as the data
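The definition can be read from right to left as a five-step recipe. The sketch below follows those steps using only the Python standard library. The function name `standard_deviation` and the list `example_values` are made up for illustration and are not from the lecture demo.

```python
import math

def standard_deviation(values):
    """SD = root mean square of deviations from average."""
    average = sum(values) / len(values)            # step 1: average
    deviations = [v - average for v in values]     # step 2: deviations from average
    squared = [d ** 2 for d in deviations]         # step 3: square
    mean_square = sum(squared) / len(squared)      # step 4: mean of the squares
    return math.sqrt(mean_square)                  # step 5: root

# Made-up data for illustration only.
example_values = [1, 2, 2, 10]
sd = standard_deviation(example_values)
print("SD:", sd)
```

Because the last step takes a square root of squared deviations, the result is in the same units as the data. Note that this divides by the number of values, not by one less than that number.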

6 of 23

Why Use the SD?

There are two main reasons.

  • The first reason:

No matter what the shape of the distribution,

the bulk of the data are in the range “average ± a few SDs”

  • The second reason:

Coming up later in this lecture ...

7 of 23

How Big are Most of the Values?

No matter what the shape of the distribution,

the bulk of the data are in the range “average ± a few SDs”

Chebyshev’s Inequality

No matter what the shape of the distribution,

the proportion of values in the range “average ± z SDs” is

at least 1 - 1/z²

8 of 23

Chebyshev’s Bounds

Range               Proportion
average ± 2 SDs     at least 1 - 1/4 (75%)
average ± 3 SDs     at least 1 - 1/9 (88.888…%)
average ± 4 SDs     at least 1 - 1/16 (93.75%)
average ± 5 SDs     at least 1 - 1/25 (96%)

No matter what the distribution looks like

(Demo)
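One way to see the bounds in action is to check them against data that is clearly not bell shaped. The sketch below is not the lecture demo. It assumes a hypothetical skewed data set (simulated exponential values), and the helper `proportion_within` is a made-up name.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical skewed data, generated for illustration only.
data = [random.expovariate(1.0) for _ in range(10_000)]

average = statistics.fmean(data)
sd = statistics.pstdev(data)  # SD that divides by the number of values

def proportion_within(values, center, spread, z):
    """Proportion of values in the range center ± z * spread."""
    inside = [v for v in values if abs(v - center) <= z * spread]
    return len(inside) / len(values)

for z in (2, 3, 4, 5):
    bound = 1 - 1 / z**2
    observed = proportion_within(data, average, sd, z)
    print(f"average ± {z} SDs: at least {bound:.4f}, observed {observed:.4f}")
```

The observed proportions can be much larger than the bounds. Chebyshev only guarantees a minimum, and that minimum holds for any list of numbers.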

9 of 23

Standard Units

10 of 23

Standard Units

  • How many SDs above average?
  • z = (value - mean)/SD
    • Negative z: value below average
    • Positive z: value above average
    • z = 0: value equal to average
  • When values are in standard units: average = 0, SD = 1
  • Chebyshev: At least 96% of the values of z are between -5 and 5

(Demo)
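The conversion to standard units is a one-line calculation per value. The sketch below uses the Python standard library. The function name `standard_units` and the list `ages` are assumptions made for illustration, not the data used in the lecture demo.

```python
import statistics

def standard_units(values):
    """Convert each value to z = (value - mean) / SD."""
    average = statistics.fmean(values)
    sd = statistics.pstdev(values)  # SD that divides by the number of values
    return [(v - average) / sd for v in values]

# Made-up ages for illustration only.
ages = [22, 25, 31, 38, 44]
z = standard_units(ages)

for age, z_value in zip(ages, z):
    print(f"age {age}: z = {z_value:.2f}")

# In standard units, the average is 0 and the SD is 1 (up to rounding).
print("average of z:", statistics.fmean(z))
print("SD of z:", statistics.pstdev(z))
```

Ages below the average of the list get negative z, and ages above it get positive z.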

11 of 23

Discussion Question

Find whole numbers that are close to:

  1. the average age

  2. the SD of the ages

(Demo)

12 of 23

The SD and the Histogram

  • Usually, it's not easy to estimate the SD by looking at a histogram.

  • But if the histogram has a bell shape, then you can.

13 of 23

The SD and Bell-Shaped Curves

If a histogram is bell-shaped, then

  • the average is at the center

  • the SD is the distance between the average and the points of inflection on either side

14 of 23

Attendance

15 of 23

The Normal Distribution

16 of 23

The Standard Normal Curve

A beautiful formula that we won’t use at all:

$$\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} z^2}, \quad -\infty < z < \infty$$

17 of 23

Bell Curve

18 of 23

Normal Proportions

19 of 23

How Big are Most of the Values?

No matter what the shape of the distribution,

the bulk of the data are in the range “average ± a few SDs”

If a histogram is bell-shaped, then

  • Almost all of the data are in the range “average ± 3 SDs”

20 of 23

Bounds and Normal Approximations
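Chebyshev's bounds hold for every distribution, while the normal curve gives approximations that apply only when the histogram is bell shaped. The sketch below compares the two for a few values of z. It relies on the standard fact that the area under the standard normal curve between -z and z equals erf(z/√2). The function names are made up for illustration.

```python
import math

def chebyshev_bound(z):
    """Lower bound on the proportion within z SDs, for any distribution."""
    return 1 - 1 / z**2

def normal_central_area(z):
    """Area under the standard normal curve between -z and z."""
    return math.erf(z / math.sqrt(2))

for z in (1, 2, 3):
    print(
        f"average ± {z} SDs: "
        f"all distributions at least {chebyshev_bound(z):.2%}, "
        f"normal about {normal_central_area(z):.2%}"
    )
```

For z = 1, 2, and 3 the normal areas are about 68%, 95%, and 99.73%, which is why almost all of a bell-shaped distribution lies within 3 SDs of the average. The corresponding Chebyshev bounds are much weaker because they have to cover every possible shape.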

21 of 23

A “Central” Area

(Demo)

22 of 23

Central Limit Theorem

23 of 23

Second Reason for Using the SD

If the sample is

  • large, and
  • drawn at random with replacement,

then, regardless of the distribution of the population, the probability distribution of the sample sum (or of the sample average) is roughly bell-shaped.

(Demo)
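A small simulation can make this statement concrete. The sketch below is not the lecture demo. It assumes a hypothetical skewed population (simulated exponential values), a sample size of 400, and 2,000 repetitions, all chosen only for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical skewed population, generated for illustration only.
population = [random.expovariate(1.0) for _ in range(10_000)]

def sample_average(values, sample_size):
    """Average of one random sample drawn with replacement."""
    sample = random.choices(values, k=sample_size)  # with replacement
    return statistics.fmean(sample)

sample_size = 400
averages = [sample_average(population, sample_size) for _ in range(2_000)]

center = statistics.fmean(averages)
spread = statistics.pstdev(averages)

def proportion_within(values, z):
    """Proportion of values within z SDs of their own average."""
    return sum(abs(v - center) <= z * spread for v in values) / len(values)

within_1 = proportion_within(averages, 1)
within_2 = proportion_within(averages, 2)
print("population average:", statistics.fmean(population))
print("average of sample averages:", center)
print("proportion within 1 SD:", within_1)
print("proportion within 2 SDs:", within_2)
```

If the sample averages are roughly bell shaped, the two proportions should land near the normal values of about 68% and 95%, even though the population itself is far from bell shaped.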