1 of 13

Lecture 29

Correlation

DATA 8

Fall 2023

2 of 13

Announcements

  • Homework 9 is due Wednesday at 11pm
  • Project 2 checkpoint due this Friday
    • Final deadline on November 10th
    • Get started early and come to OH/Project Party!
  • My OH are today: 5-7pm @ FSM

3 of 13

Weekly Goals

  • Today
    • A measure of linear association
  • Wednesday
    • Predicting one numerical variable based on another
    • The regression line
  • Friday
    • The “best” linear predictor
    • The method of least squares

4 of 13

Prediction

5 of 13

Guessing the Future

  • Based on incomplete information

  • One way of making predictions:
    • To predict an outcome for an individual,
    • find others who are like that individual
    • and whose outcomes you know.
    • Use those outcomes as the basis of your prediction.

(Demo)
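
A minimal sketch of the "find others who are like that individual" idea above, in plain Python. The data, the function name `predict_from_neighbors`, and the `window` parameter are all made up for illustration; this is not the in-class demo.

```python
from statistics import mean

def predict_from_neighbors(x_new, xs, ys, window=1.0):
    """Predict y for x_new by averaging the known outcomes of similar individuals.

    'Similar' is an assumption here: anyone whose x is within `window` of x_new.
    """
    nearby = [y for x, y in zip(xs, ys) if abs(x - x_new) <= window]
    if not nearby:
        raise ValueError("no similar individuals within the window")
    return mean(nearby)

# Hypothetical (x, y) pairs, e.g. a parent's height and a child's height in inches.
parent_heights = [64, 65, 66, 67, 68, 69, 70]
child_heights = [65, 66, 66, 68, 68, 70, 71]

# Individuals with x in [66, 68] are used; their outcomes are 66, 68, 68.
prediction = predict_from_neighbors(67, parent_heights, child_heights, window=1)
print(prediction)
```

A wider window uses more individuals but ones who are less like the person being predicted; a narrower window does the opposite.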

6 of 13

Association

7 of 13

Two Numerical Variables

  • Trend
    • Positive association
    • Negative association
  • Pattern
    • Any discernible “shape” in the scatter
    • Linear
    • Non-linear

Visualize, then quantify

(Demo)

8 of 13

Correlation Coefficient

9 of 13

The Correlation Coefficient r

  • Measures linear association
  • Based on standard units
  • -1 ≤ r ≤ 1
    • r = 1: scatter is a perfect straight line sloping up
    • r = -1: scatter is a perfect straight line sloping down
  • r = 0: No linear association; uncorrelated

(Demo)
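
A rough check of the properties listed above, using made-up numbers. The helper `correlation` is a hypothetical name written for this sketch (it averages products of standard units, using the population SD); it is not a library function.

```python
from statistics import mean, pstdev

def correlation(xs, ys):
    """r as the average product of the two variables in standard units."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return mean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])

x = [1, 2, 3, 4, 5]
r_up = correlation(x, [2 * v + 3 for v in x])       # points on a line sloping up
r_down = correlation(x, [10 - 0.5 * v for v in x])  # points on a line sloping down

# Hypothetical heights and weights: because r is based on standard units,
# converting inches to centimeters should leave r unchanged.
heights_in = [60, 62, 65, 68, 70]
weights_lb = [110, 120, 140, 150, 170]
heights_cm = [2.54 * h for h in heights_in]
r_inches = correlation(heights_in, weights_lb)
r_cm = correlation(heights_cm, weights_lb)

print(round(r_up, 6), round(r_down, 6))
print(abs(r_inches - r_cm) < 1e-9)
```

The slope of the line does not matter for r; only its direction and how tightly the points cluster around it.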

10 of 13

Definition of r

Correlation Coefficient (r) = average of the product of (x in standard units) and (y in standard units)

Measures how clustered the scatter is around a straight line
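
The definition can be sketched step by step in plain Python. The function names `standard_units` and `correlation` and the data are chosen for this example, and the sketch assumes the population SD (dividing by n) when converting to standard units.

```python
from statistics import mean, pstdev

def standard_units(values):
    """Convert values to standard units: (value - mean) / SD."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]

def correlation(xs, ys):
    """Average of the products of x in standard units and y in standard units."""
    x_su = standard_units(xs)
    y_su = standard_units(ys)
    products = [a * b for a, b in zip(x_su, y_su)]
    return mean(products)

# Small made-up data set.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = correlation(x, y)
print(round(r, 4))
```

Since the product a * b is the same as b * a, swapping the roles of x and y gives the same r.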

11 of 13

Care in Interpretation

12 of 13

Watch Out For ...

  • False conclusions of causation
  • Nonlinearity
  • Outliers
  • Ecological Correlations

(Demo)
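
Two of the cautions above, nonlinearity and outliers, can be illustrated with small made-up data sets. The helper `correlation` is a hypothetical name defined just for this sketch; causation and ecological correlations are not covered here.

```python
from statistics import mean, pstdev

def correlation(xs, ys):
    """r as the average product of the two variables in standard units."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return mean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])

# Nonlinearity: y is completely determined by x (y = x squared),
# yet r is about 0 because the relationship is not linear.
x_curve = [-2, -1, 0, 1, 2]
y_curve = [v * v for v in x_curve]
r_curve = correlation(x_curve, y_curve)

# Outliers: five points on a line sloping down, then one extreme point added.
x_line = [1, 2, 3, 4, 5]
y_line = [5, 4, 3, 2, 1]
r_without_outlier = correlation(x_line, y_line)
r_with_outlier = correlation(x_line + [20], y_line + [20])

print(round(r_curve, 6))
print(round(r_without_outlier, 6), round(r_with_outlier, 3))
```

In this example a single added point flips the sign of r, which is one reason to look at the scatter before relying on the number.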

13 of 13

Chocolate and Nobel Prizes