Lecture 23

The Normal Distribution

Summer 2021

Announcements

  • Tutor section sign-ups released! More info here
  • HW 08 released, due Friday, 7/23 at 11:59pm PT
  • Project 1 and Midterm Grades Released
    • Regrade Requests due Friday 7/23 for both
  • Zoom link for lecture will be changing starting tomorrow! Use this link instead: https://berkeley.zoom.us/j/92050640021
  • Midterm Walkthrough Session on Saturday at 12pm (noon) PT

Weekly Goals

  • Today
    • Center and Spread review
    • The bell shaped curve (Normal Distribution)
    • Central Limit Theorem
  • Friday
    • Sample Means
    • Designing Experiments

Variance & Standard Deviation

How Far from the Mean (Average)?

  • Standard deviation (SD) measures how far the data deviate from their average
  • Variance = Mean Square of Deviations from Average of Data
  • SD = Root Mean Square of Deviations from Average of Data (RMSDAD)

  • SD has the same units as the data
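The RMSDAD recipe can be read right to left as code. This is a minimal sketch in plain Python on a small made-up dataset; the function names are hypothetical. It divides by the number of values (the "population" SD), which matches the definition above and what `statistics.pstdev` computes.

```python
import math

def variance(values):
    """Mean of the squared deviations from the average."""
    average = sum(values) / len(values)
    deviations = [v - average for v in values]
    return sum(d ** 2 for d in deviations) / len(values)

def sd(values):
    """Root mean square of deviations from average (RMSDAD)."""
    return math.sqrt(variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values for illustration
print(variance(data))  # 4.0
print(sd(data))        # 2.0
```

Squaring the deviations puts the variance in squared units; taking the square root brings the SD back to the units of the data.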

Salient Feature of Standard Deviation

No matter what the shape of the distribution,

the bulk of the data are in the range “Mean ± A Few SDs”

Standard Units

Standard Units

  • How many SDs above average?
  • z = (value - Mean)/SD
    • Negative z: value below mean
    • Positive z: value above mean
    • z = 0: value equal to Mean
  • When values are in standard units: Mean = 0, SD = 1
  • Gives us a way to compare/understand data no matter what their original units are

(Demo)
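Converting to standard units is a one-line formula applied to every value. The sketch below is not the lecture demo: it uses hypothetical ages and a hypothetical helper name, and it shows that the converted values have mean 0 and SD 1.

```python
import statistics

def standard_units(values):
    """Convert each value to z = (value - mean) / SD."""
    mean = sum(values) / len(values)
    sd = statistics.pstdev(values)  # population SD: divides by n
    return [(v - mean) / sd for v in values]

ages = [20, 24, 27, 30, 34]  # hypothetical ages, not from the lecture demo
z = standard_units(ages)
print([round(v, 2) for v in z])  # [-1.45, -0.62, 0.0, 0.62, 1.45]
```

The value equal to the mean (27) maps to 0, values below the mean get negative z, and values above it get positive z.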

Discussion Question

Find whole numbers that are close to:

  1. the average age
  2. the SD of the ages

Average in standard units ≅ 0, so the average age ≅ 27

Average + 1 SD in standard units ≅ 1, so 27 + SD ≅ 33

SD ≅ 33 - 27 = 6

(Demo)
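The reasoning in this question inverts z = (value - Mean)/SD. A small sketch with hypothetical helper names: z = 0 lines up with the mean, and moving from z = 0 to z = 1 covers exactly one SD, so two readings off the axis recover both numbers.

```python
def value_from_standard_units(z, mean, sd):
    """Invert z = (value - mean) / SD to get back to the original units."""
    return mean + z * sd

def mean_and_sd_from_axis(value_at_z0, value_at_z1):
    """Recover mean and SD from two readings off a standard-units axis."""
    mean = value_at_z0              # z = 0 lines up with the mean
    sd = value_at_z1 - value_at_z0  # one SD up moves z from 0 to 1
    return mean, sd

mean, sd = mean_and_sd_from_axis(27, 33)        # the readings used in the question
print(mean, sd)                                 # 27 6
print(value_from_standard_units(-1, mean, sd))  # 21
```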

Chebyshev's Inequality

How Big are Most of the Values?

Chebyshev’s Inequality

No matter what the shape of the distribution,

the proportion of values (i.e., fraction of the population) in the range “Mean ± z SDs”

is at least 1 - 1/z²
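One way to see that the bound holds "no matter what the shape" is to check it on data that are clearly not bell-shaped. This sketch uses a made-up, right-skewed sample generated with Python's `random.Random.expovariate` (an arbitrary choice of skewed shape); the helper names are hypothetical.

```python
import random
import statistics

def chebyshev_bound(z):
    """Chebyshev's lower bound on the proportion within mean ± z SDs."""
    return max(0.0, 1 - 1 / z ** 2)  # the bound says nothing useful for z <= 1

def proportion_within(values, z):
    """Observed fraction of values in the range mean ± z SDs."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(abs(v - mean) <= z * sd for v in values) / len(values)

rng = random.Random(23)
skewed = [rng.expovariate(1.0) for _ in range(10_000)]  # right-skewed, not bell-shaped

for z in [2, 3, 4, 5]:
    print(z, round(chebyshev_bound(z), 4), round(proportion_within(skewed, z), 4))
```

The observed proportions are typically well above the bound; Chebyshev only promises a floor that works for every distribution.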

Chebyshev’s Bounds

Range            Proportion
Mean ± 2 SDs     at least 1 - 1/4 = 3/4 (75%)
Mean ± 3 SDs     at least 1 - 1/9 = 8/9 (88.89%)
Mean ± 4 SDs     at least 1 - 1/16 = 15/16 (93.75%)
Mean ± 5 SDs     at least 1 - 1/25 = 24/25 (96%)

No matter what the distribution looks like!

The SD and the Histogram

  • Usually, it's not easy to estimate the SD by looking at a histogram.

  • But if the histogram has a bell shape, then you can.

The SD and Bell-Shaped Curves

If a histogram is bell-shaped, then

  • the average is at the center

  • SD = distance between average and point of inflection on either side

(Demo: Maternal Heights)
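The inflection-point rule can be checked numerically: to the right of the average, a normal curve bends downward until one SD out, then bends upward. This sketch is not the maternal-heights demo. It uses an illustrative mean of 64 and SD of 2.5 (made-up numbers), estimates the second derivative with a central difference, and bisects for the sign change; all function names are hypothetical.

```python
import math

def normal_curve(x, mean, sd):
    """Height of the normal curve with the given mean and SD at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def curvature(x, mean, sd, h=1e-3):
    """Central-difference estimate of the curve's second derivative at x."""
    f = lambda t: normal_curve(t, mean, sd)
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

def inflection_to_right(mean, sd):
    """Bisect for where the curve switches from bending down to bending up."""
    lo, hi = mean, mean + 3 * sd  # curvature < 0 at lo, > 0 at hi
    for _ in range(60):
        mid = (lo + hi) / 2
        if curvature(mid, mean, sd) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

point = inflection_to_right(64, 2.5)  # illustrative mean and SD, not the demo data
print(round(point - 64, 3))           # 2.5, i.e. one SD from the average
```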

Point of Inflection

The Normal Distribution

(AKA, Bell-Shaped Curve)

(AKA, Gaussian Distribution)

The Standard Normal Curve

A beautiful formula that we won’t use at all:

φ(z) = (1/√(2π)) · e^(−z²/2),   −∞ < z < ∞

Bell Curve

Normal Proportions

How Big are Most of the Values?

No matter what the shape of the distribution,

the bulk of the data are in the range

“Mean ± A Few SDs”

If a histogram is bell-shaped, then

  • Almost all of the data are in the range “Mean ± 3 SDs”

Bounds and Normal Approximations

NOTE: If the probability distribution of our statistic is normal, then we don’t need to bootstrap.

Our approximate 95% confidence interval is simply: Mean ± 2 SDs (of that distribution).

Percent in Range   All Distributions (Chebyshev’s)   Normal Distribution
                   (at least 1 - 1/z²)               (approximately)
Mean ± 1 SD        1 - 1/1² = 0 (0%)                 68%
Mean ± 2 SDs       1 - 1/2² = 3/4 (75%)              95%
Mean ± 3 SDs       1 - 1/3² = 8/9 (88.89%)           99.73%
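The normal column of the table comes from areas under the standard normal curve, which Python's `math.erf` gives directly: the area between -z and z is erf(z/√2). A minimal sketch with hypothetical helper names, printing both columns side by side:

```python
import math

def normal_proportion_within(z):
    """Area under the standard normal curve between -z and z."""
    return math.erf(z / math.sqrt(2))

def chebyshev_bound(z):
    """Chebyshev's guarantee for any distribution (0 when z <= 1)."""
    return max(0.0, 1 - 1 / z ** 2)

for z in [1, 2, 3]:
    print(f"Mean ± {z} SD: Chebyshev at least {chebyshev_bound(z):.2%}, "
          f"normal about {normal_proportion_within(z):.2%}")
```

The exact normal areas are about 68.27%, 95.45%, and 99.73%; the table's 68% and 95% are these values rounded.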

A “Central” Area

Source: Statistics How To

Central Limit Theorem

Sample Means (Averages)

  • The Central Limit Theorem describes how the normal distribution (a bell-shaped curve) is connected to random sample averages.
  • We care about sample averages because they estimate population averages.

Central Limit Theorem

If the sample is

  • large, and
  • drawn at random with replacement,

then, regardless of the distribution of the population,

the probability distribution of the sample average

is roughly normal

(Demo)
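A simulation in the spirit of the demo: start from a population that is far from bell-shaped, draw many large random samples with replacement, and check whether the sample averages land within 1 and 2 SDs of their center about as often as a normal curve predicts (about 68% and 95%). The population, sample size, repetition count, seed, and helper names are all made up for illustration.

```python
import random
import statistics

def sample_averages(population, sample_size, repetitions, seed=0):
    """Averages of many random samples drawn with replacement."""
    rng = random.Random(seed)
    return [sum(rng.choices(population, k=sample_size)) / sample_size
            for _ in range(repetitions)]

def proportion_within(values, z):
    """Fraction of values within their own mean ± z SDs."""
    center = statistics.mean(values)
    spread = statistics.pstdev(values)
    return sum(abs(v - center) <= z * spread for v in values) / len(values)

# Made-up, strongly skewed population: mostly 0s, a few 10s.
population = [0] * 70 + [1] * 20 + [10] * 10

averages = sample_averages(population, sample_size=100, repetitions=5000)
print(round(proportion_within(averages, 1), 3))  # roughly 0.68 if bell-shaped
print(round(proportion_within(averages, 2), 3))  # roughly 0.95 if bell-shaped
```

The population itself is heavily skewed, yet the distribution of its sample averages is close to normal, which is the point of the theorem.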