Lecture 30
Linear Regression
DATA 8
Spring 2022
Announcements
Correlation Coefficient
The Correlation Coefficient r
(Figure: example scatter plots with r = 0, 0.2, 0.5, 0.8, 0.99, and -0.5)
Definition of r
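Correlation Coefficient (r) = average of the product of (x in standard units) and (y in standard units)
$$r = \frac{1}{n}\sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{\mathrm{SD}_x}\right)\left(\frac{y_i - \bar{y}}{\mathrm{SD}_y}\right)$$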
Measures how clustered the scatter is around a straight line
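A minimal sketch of the definition above, assuming Python with NumPy (the lecture's own demo may use different tools). The helper names `standard_units` and `correlation` and the small data arrays are hypothetical. The SD is the population SD (`np.std` with its default `ddof=0`), which is what makes the plain average of the products equal r exactly.

```python
import numpy as np

def standard_units(values):
    """Convert an array to standard units: (value - mean) / SD."""
    values = np.asarray(values, dtype=float)
    return (values - np.mean(values)) / np.std(values)  # np.std uses ddof=0 by default

def correlation(x, y):
    """r = average of the product of x and y, both in standard units."""
    return np.mean(standard_units(x) * standard_units(y))

# Hypothetical data, for illustration only
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

print(correlation(x, y))
print(np.corrcoef(x, y)[0, 1])  # NumPy's built-in Pearson correlation, for comparison
```

Both lines print the same value; a perfectly increasing line such as `y = 2 * x + 3` gives r = 1, and `y = -x` gives r = -1.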
Care in Interpretation
Watch Out For ...
(Demo)
Discussion question
True or False?
If the correlation of x and y is close to 0, then knowing one cannot help us predict the other.
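One way to probe the statement above, sketched in Python with NumPy on hypothetical data: here y is completely determined by x, but the relationship is a parabola rather than a line.

```python
import numpy as np

# Hypothetical example: y depends on x exactly, but not linearly.
x = np.arange(-5, 6)   # -5, -4, ..., 5 (symmetric around 0)
y = x ** 2             # a perfect parabola

r = np.corrcoef(x, y)[0, 1]
print(r)  # 0, up to floating-point rounding

# Yet knowing x predicts y perfectly: just square it.
print(y[x == 3])
```

r is 0 because r only measures linear association; a correlation near 0 rules out a straight-line pattern, not every kind of association.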
Chocolate and Nobel Prizes
https://www.biostat.jhsph.edu/courses/bio621/misc/Chocolate%20consumption%20cognitive%20function%20and%20nobel%20laurates%20(NEJM).pdf
Prediction
Predicting Heights
(Figure: scatter plot of child's (adult) height against the average of the parents' heights)
Approach to Prediction
(Figure: scatter plot of child's (adult) height against the average of the parents' heights)
Predicted Heights
(Figure: predicted heights; x-axis: average of parents' heights, y-axis: child's (adult) height)
Nearest Neighbor Regression
A method for prediction:
For each x value, the prediction is the average of the y values of the points whose x values are close to it (its nearby group).
The graph of these predictions is the “graph of averages”.
If the association between x and y is linear, then points in the graph of averages tend to fall on a line.
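A minimal sketch of nearest neighbor regression, assuming Python with NumPy. The function name `nn_predict`, the `half_width` parameter that defines "nearby" (x within `half_width` of the given value), and the data are all hypothetical; the lecture's heights demo may define the nearby group differently.

```python
import numpy as np

def nn_predict(x, y, new_x, half_width):
    """Predict y at new_x as the average of the y values whose x is
    within half_width of new_x (the "nearby group")."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    nearby = np.abs(x - new_x) <= half_width
    if not np.any(nearby):
        return np.nan  # no points nearby, so no prediction
    return np.mean(y[nearby])

# Hypothetical data (not the heights data from the lecture)
x = np.array([60, 62, 64, 64, 66, 68, 70, 72])
y = np.array([61, 63, 63, 65, 66, 67, 70, 71])

# The "graph of averages": one prediction for each x value
graph_of_averages = [nn_predict(x, y, value, half_width=1) for value in x]
print(graph_of_averages)
```

A larger `half_width` averages over more points, which smooths the graph of averages but can blur real changes in y across x.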
Where is the prediction line?
(Figure: scatter plot with r = 0.99)
Where is the prediction line?
(Figure: scatter plot with r = 0.0)
(Demo)
Linear Regression
A statement about x and y pairs, measured in standard units (so 0 is the average of each variable)
On average, y deviates from 0 less than x deviates from 0
Not true for all points: it is a statement about averages
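Regression line (in standard units): estimate of y = r × (the given x), where r is the correlation
$$\hat{y}_{\text{su}} = r \cdot x_{\text{su}}$$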
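A minimal sketch of using the regression line for prediction, assuming Python with NumPy and hypothetical data: convert the given x to standard units, multiply by r, then convert back to y's original units. The name `regression_predict` is illustrative. As a cross-check it compares against `np.polyfit(x, y, 1)`, NumPy's least-squares line, which coincides with the regression line (slope = r × SD_y / SD_x, intercept = mean_y - slope × mean_x).

```python
import numpy as np

def regression_predict(x, y, new_x):
    """Predict y at new_x with the regression line.

    In standard units the estimate is r * (new_x in standard units);
    the result is then converted back to y's original units.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.mean(((x - x.mean()) / x.std()) * ((y - y.mean()) / y.std()))
    new_x_su = (new_x - x.mean()) / x.std()
    predicted_y_su = r * new_x_su
    return predicted_y_su * y.std() + y.mean()

# Hypothetical data, for illustration only
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])
print(regression_predict(x, y, 6))

# Cross-check with the least-squares line (coefficients: slope, then intercept)
slope, intercept = np.polyfit(x, y, 1)
print(slope * 6 + intercept)
```

Because |r| ≤ 1, the estimate in standard units is never farther from 0 than the given x, which is the "on average, y deviates from 0 less than x" statement above.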