Lecture 24

Center and Spread

DATA 8

Spring 2017

Slides created by John DeNero (denero@berkeley.edu) and Ani Adhikari (adhikari@berkeley.edu)

Confidence Interval Tests

95% Confidence Interval

  • Interval of estimates of a parameter
  • Based on random sampling
  • 95% is called the confidence level
    • Could be any percent between 0 and 100
    • A higher confidence level means a wider interval
  • The confidence is in the process that generated the interval:
    • It generates a “good” interval about 95% of the time.

(Demo)
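
The demo code is not reproduced in these slides. As a minimal sketch, assuming the interval is built with the bootstrap percentile method (an assumption; the slides only say it is based on random sampling), the function below resamples a sample with replacement and takes the middle `level`% of the resampled means. The name `bootstrap_ci`, its parameters, and the sample data are hypothetical and chosen only for illustration.

```python
import numpy as np

def bootstrap_ci(sample, level=95, repetitions=2000, seed=0):
    """Percentile bootstrap interval for the population mean (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    means = np.empty(repetitions)
    for i in range(repetitions):
        # Resample with replacement, same size as the original sample
        resample = rng.choice(sample, size=len(sample), replace=True)
        means[i] = resample.mean()
    # Cut off equal tails so that the middle `level` percent remains
    tail = (100 - level) / 2
    left, right = np.percentile(means, [tail, 100 - tail])
    return left, right

# Hypothetical sample, made up for illustration only
sample = np.random.default_rng(42).normal(loc=50, scale=10, size=200)

ci95 = bootstrap_ci(sample, level=95)
ci80 = bootstrap_ci(sample, level=80)
print("95% interval:", ci95)
print("80% interval:", ci80)
```

Because both calls use the same seed, the two intervals come from the same resampled means, so the 95% interval contains the 80% one. This is the "higher level, wider interval" point from the slide.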

Using a CI for Testing

  • Null hypothesis: Population mean = x
  • Alternative hypothesis: Population mean ≠ x
  • Cutoff for P-value: p%
  • Method:
    • Construct a (100-p)% confidence interval for the population mean
    • If x is not in the interval, reject the null
    • If x is in the interval, can’t reject the null
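
The method above can be sketched in code. This is an illustration under assumptions, not the course's implementation: the interval is built with a bootstrap percentile approach, and the function name `ci_test`, its parameters, and the sample are hypothetical.

```python
import numpy as np

def ci_test(sample, null_mean, p_cutoff=5, repetitions=2000, seed=0):
    """Test 'population mean = null_mean' using a (100 - p_cutoff)% interval."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    # Bootstrap the sample mean (assumed method for building the interval)
    means = np.array([
        rng.choice(sample, size=len(sample), replace=True).mean()
        for _ in range(repetitions)
    ])
    # Keep the middle (100 - p_cutoff) percent of the resampled means
    left, right = np.percentile(means, [p_cutoff / 2, 100 - p_cutoff / 2])
    # Reject the null only if the hypothesized value falls outside the interval
    reject = not (left <= null_mean <= right)
    return reject, (left, right)

# Hypothetical sample, made up for illustration only
sample = np.random.default_rng(42).normal(loc=50, scale=10, size=200)

for x in (50, 60):
    reject, interval = ci_test(sample, null_mean=x, p_cutoff=5)
    verdict = "reject the null" if reject else "can't reject the null"
    print(f"x = {x}: interval = {interval}, {verdict}")
```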

Average

The Average

Data: 2, 3, 3, 9

Average = (2+3+3+9)/4 = 4.25

  • Need not be a value in the collection
  • Need not be an integer even if the data are integers
  • Somewhere between min and max, but not necessarily halfway in between
  • Same units as the data
  • Smoothing operator: collect all the contributions in one big pot, then split evenly
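
The properties above can be checked on the slide's own data. The snippet below is a small sketch using NumPy; the variable names are chosen here for illustration.

```python
import numpy as np

data = np.array([2, 3, 3, 9])

# Average: add up all the values, then divide by how many there are
average = data.sum() / len(data)
print(average)  # 4.25

# Not one of the values in this collection, and not an integer
print(average in data)  # False

# Between min and max, but not halfway between them
print(data.min() < average < data.max())  # True
halfway = (data.min() + data.max()) / 2
print(halfway)  # 5.5

# Smoothing: give every entry an equal share; the total is unchanged
smoothed = np.full(len(data), average)
print(smoothed.sum() == data.sum())  # True
```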

Discussion Question

Which is bigger?

(a) mean

(b) median

Properties of the Mean

  • Balance point of the histogram
  • Not necessarily the “halfway point” of the data; in general, the mean is not the median
  • Exception: if the distribution is symmetric about a point, then that point is both the average and the median
  • If the histogram is skewed, then the mean is pulled away from the median in the direction of the tail
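
A small numerical illustration of these properties, using three made-up collections (the data are hypothetical and chosen so the arithmetic is easy to follow):

```python
import numpy as np

symmetric = np.array([1, 2, 3, 4, 5])      # symmetric about 3
right_tail = np.array([1, 2, 3, 4, 50])    # long tail to the right
left_tail = np.array([-44, 2, 3, 4, 5])    # long tail to the left

for name, values in [("symmetric", symmetric),
                     ("right tail", right_tail),
                     ("left tail", left_tail)]:
    print(name, "mean =", np.mean(values), "median =", np.median(values))

# Balance point: deviations from the mean cancel out exactly
deviations = right_tail - np.mean(right_tail)
print(deviations.sum())  # 0.0
```

In the symmetric collection the mean and median are both 3. In the other two, the median stays at 3 while the mean moves to 12 and to -6, following the tail.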

Standard Deviation

Defining Variability

Plan A: “biggest value - smallest value”

  • Doesn’t provide information about the shape of the distribution

Plan B:

  • Measure variability around the mean
  • Need to figure out a way to quantify this

(Demo)
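
The demo is not included in these slides. As a rough sketch of why Plan A falls short and why Plan B needs some care, the example below uses two made-up collections with the same range but different shapes. The data and variable names are hypothetical.

```python
import numpy as np

# Two hypothetical collections with the same min, max, and mean
narrow = np.array([0, 5, 5, 5, 5, 5, 10])   # most values sit at the mean
wide = np.array([0, 0, 0, 5, 10, 10, 10])   # most values sit at the extremes

# Plan A: the range cannot tell these two apart
range_narrow = narrow.max() - narrow.min()
range_wide = wide.max() - wide.min()
print(range_narrow, range_wide)  # 10 10

# Plan B, first attempt: average the deviations from the mean.
# Positive and negative deviations cancel, so this is always 0.
dev_narrow = narrow - narrow.mean()
dev_wide = wide - wide.mean()
print(dev_narrow.mean(), dev_wide.mean())  # 0.0 0.0

# The signs have to be removed before averaging, for example by squaring
print((dev_narrow ** 2).mean() < (dev_wide ** 2).mean())  # True
```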

How Far from the Average?

  • Standard deviation (SD) measures roughly how far the data are from their average

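  • SD = root mean square of deviations from average
    • Computed in five steps, reading the phrase right to left: (1) average, (2) deviations from average, (3) square, (4) mean, (5) root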

  • SD has the same units as the data; hence OK to say “average plus or minus a few SDs”
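
The definition "root mean square of deviations from average" can be worked through one operation at a time. The sketch below applies it to the collection 2, 3, 3, 9 and compares the result with NumPy's `np.std`, which by default divides by the number of values, matching this definition. The variable names are chosen here for illustration.

```python
import numpy as np

data = np.array([2, 3, 3, 9])

average = data.mean()                 # 4.25
deviations = data - average           # [-2.25, -1.25, -1.25, 4.75]
squared = deviations ** 2             # [5.0625, 1.5625, 1.5625, 22.5625]
mean_square = squared.mean()          # 7.6875
sd = np.sqrt(mean_square)             # about 2.77

print(sd)
print(np.isclose(sd, np.std(data)))   # True: np.std uses the same definition

# "Average plus or minus a few SDs" is in the same units as the data
print(average - 2 * sd, average + 2 * sd)
```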