Lecture 24

Center and Spread

DATA 8

Spring 2017

Slides created by John DeNero (denero@berkeley.edu) and Ani Adhikari (adhikari@berkeley.edu)

Announcements

Confidence Interval Tests

95% Confidence Interval

Interval of estimates of a parameter
Based on random sampling
95% is called the confidence level

The level could be any percent between 0 and 100
A higher confidence level means a wider interval

The confidence is in the process that generated the interval:

It generates a “good” interval about 95% of the time.

(Demo)
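A minimal sketch of how such an interval might be generated, assuming the bootstrap percentile method (the slide only says "based on random sampling", so the resampling scheme is an assumption). The sample data and the names `percentile` and `bootstrap_mean_interval` are hypothetical.

```python
import math
import random
import statistics


def percentile(values, pct):
    """Smallest value that is at least as large as pct% of the values."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def bootstrap_mean_interval(sample, level=95, repetitions=2000, seed=0):
    """Approximate confidence interval for the population mean.

    Resamples the original sample with replacement, records the mean of
    each resample, and keeps the middle `level` percent of those means.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means = []
    for _ in range(repetitions):
        resample = rng.choices(sample, k=len(sample))
        means.append(statistics.fmean(resample))
    tail = (100 - level) / 2
    return percentile(means, tail), percentile(means, 100 - tail)


# Made-up sample, for illustration only.
sample = [2, 3, 3, 9, 4, 6, 5, 7, 3, 8]

left_95, right_95 = bootstrap_mean_interval(sample, level=95)
left_80, right_80 = bootstrap_mean_interval(sample, level=80)

print("95% interval:", (left_95, right_95))
print("80% interval:", (left_80, right_80))
```

With the same resampled means, the 95% interval contains the 80% interval, which illustrates why a higher confidence level gives a wider interval.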

Using a CI for Testing

Null hypothesis: Population mean = x
Alternative hypothesis: Population mean ≠ x
Cutoff for P-value: p%
Method:

Construct a (100-p)% confidence interval for the population mean
If x is not in the interval, reject the null
If x is in the interval, can’t reject the null
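The decision rule above can be sketched as a small function. The function names and the interval endpoints below are hypothetical; in practice the interval would come from whatever procedure was used to construct the confidence interval.

```python
def confidence_level_for_cutoff(p_cutoff_percent):
    """Confidence level (in percent) that matches a P-value cutoff of p%."""
    return 100 - p_cutoff_percent


def ci_test(interval, null_value):
    """Apply the slide's rule: reject the null only if x is outside the interval."""
    left, right = interval
    if left <= null_value <= right:
        return "can't reject the null"
    return "reject the null"


# A 5% cutoff calls for a 95% confidence interval.
level = confidence_level_for_cutoff(5)

# Hypothetical 95% interval for the population mean.
interval = (3.6, 6.4)

print(level)                 # 95
print(ci_test(interval, 5))  # can't reject the null
print(ci_test(interval, 7))  # reject the null
```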

Attendance

bit.ly/data8here

Average

The Average

Data: 2, 3, 3, 9
Average = (2+3+3+9)/4 = 4.25

Need not be a value in the collection
Need not be an integer even if the data are integers
Somewhere between min and max, but not necessarily halfway in between
Same units as the data
Smoothing operator: collect all the contributions in one big pot, then split evenly
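The properties listed above can be checked on the slide's own data. This is only an illustrative sketch; the variable names are arbitrary.

```python
data = [2, 3, 3, 9]

# "One big pot, then split evenly": add everything up, divide by the count.
average = sum(data) / len(data)
print(average)  # 4.25

# Need not be a value in the collection, nor an integer.
print(average in data)  # False

# Between min and max, but not necessarily halfway in between.
midpoint = (min(data) + max(data)) / 2
print(min(data) <= average <= max(data))  # True
print(midpoint)  # 5.5

# Giving every entry the average reproduces the original total.
print(average * len(data) == sum(data))  # True
```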

Discussion Question

Which is bigger?

(a) mean

(b) median

Properties of the Mean

Balance point of the histogram
Not the “halfway point” of the data: in general, the mean is not the median
If the distribution is symmetric about a point, then that point is both the average and the median
If the histogram is skewed, then the mean is pulled away from the median in the direction of the tail
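A short sketch of these properties, using three made-up data sets (the values are chosen only for illustration) and Python's standard `statistics` module.

```python
import statistics

symmetric = [1, 2, 3, 4, 5]       # symmetric about 3
right_tail = [1, 2, 3, 4, 50]     # long tail to the right
left_tail = [1, 47, 48, 49, 50]   # long tail to the left

for name, values in [("symmetric", symmetric),
                     ("right tail", right_tail),
                     ("left tail", left_tail)]:
    mean = statistics.mean(values)
    median = statistics.median(values)
    print(name, "mean:", mean, "median:", median)

# Balance point: deviations above and below the mean cancel out.
right_mean = statistics.mean(right_tail)
balance = sum(x - right_mean for x in right_tail)
print(balance)
```

In the symmetric set the mean and median agree; in the other two, the mean sits on the tail side of the median.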

Standard Deviation

Defining Variability

Plan A: “biggest value - smallest value”

Doesn’t provide information about the shape of the distribution

Plan B:

Measure variability around the mean
Need to figure out a way to quantify this

(Demo)
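A small sketch of why Plan A falls short and why Plan B needs care. The two data sets and the helper names are hypothetical, chosen so that both sets have the same range and the same mean.

```python
def value_range(data):
    """Plan A: biggest value minus smallest value."""
    return max(data) - min(data)


def deviations_from_mean(data):
    """Plan B starting point: how far each value is from the average."""
    average = sum(data) / len(data)
    return [x - average for x in data]


clustered = [0, 5, 5, 5, 5, 10]      # most values sit at the mean
spread_out = [0, 0, 0, 10, 10, 10]   # every value is far from the mean

# Plan A cannot tell the two apart.
print(value_range(clustered), value_range(spread_out))  # 10 10

# The deviations do differ...
print(deviations_from_mean(clustered))
print(deviations_from_mean(spread_out))

# ...but simply averaging them is no use: they always cancel out.
print(sum(deviations_from_mean(clustered)))   # 0.0
print(sum(deviations_from_mean(spread_out)))  # 0.0
```

Because positive and negative deviations cancel, some other way of combining them is needed to quantify variability around the mean.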

How Far from the Average?

Standard deviation (SD) measures roughly how far the data are from their average

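SD = root mean square of deviations from average

Steps, reading the formula right to left: (1) average, (2) deviations from average, (3) square, (4) mean, (5) root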

SD has the same units as the data, so it makes sense to say “average plus or minus a few SDs”
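A minimal sketch of the SD computation, working from the inside of the definition outward. The data are the four values from the earlier average example; the cross-check against `statistics.pstdev` (the population SD, which divides by the number of values) is included only as a sanity check.

```python
import math
import statistics

data = [2, 3, 3, 9]

# 1. Average of the data.
average = sum(data) / len(data)

# 2. Deviations from the average.
deviations = [x - average for x in data]

# 3. Square each deviation, so negatives no longer cancel positives.
squared = [d ** 2 for d in deviations]

# 4. Mean of the squared deviations.
mean_square = sum(squared) / len(squared)

# 5. Root, which brings the result back to the units of the data.
sd = math.sqrt(mean_square)

print(average)      # 4.25
print(deviations)   # [-2.25, -1.25, -1.25, 4.75]
print(mean_square)  # 7.6875
print(round(sd, 2)) # 2.77

# "Average plus or minus a few SDs": here 2 SDs cover all the data.
low, high = average - 2 * sd, average + 2 * sd
print(all(low <= x <= high for x in data))  # True
```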