Lecture 31
Linear Regression
DATA 8
Fall 2020
Regression roadmap
Correlation (Review)
The Correlation Coefficient r
r = 0
r = 0.2
r = 0.5
r = 0.8
r = 0.99
r = -0.5
Definition of r
Correlation Coefficient (r) = average of the product of (x in standard units) and (y in standard units)
Measures how clustered the scatter is around a straight line
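The definition above can be sketched in a few lines of Python. This is a minimal illustration using NumPy rather than any particular course library; the helper names `standard_units` and `correlation` and the sample arrays are assumptions made up for this example.

```python
import numpy as np

def standard_units(values):
    """Convert values to standard units: (value - average) / SD."""
    values = np.asarray(values, dtype=float)
    return (values - np.mean(values)) / np.std(values)

def correlation(x, y):
    """r = average of the product of x and y, both in standard units."""
    return np.mean(standard_units(x) * standard_units(y))

# Hypothetical data, invented for illustration only
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

r = correlation(x, y)
print(r)
```

Because both variables are converted to standard units first, r has no units and does not change if x or y is rescaled to different units of measurement.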
Discussion Question
For each pair, which one will have a higher* value of r?
a)
b)
c)
d)
* here, “higher” means “bigger on the number line”
Care in Interpretation
Watch Out For ...
(Demo)
Chocolate and Nobel Prizes
Discussion Question
True or False?
Prediction
Galton's Heights
Nearest Neighbor Regression
A method for prediction:
For each x value, the prediction is the average of the y values in its nearby group.
The graph of these predictions is the “graph of averages”.
If the association between x and y is linear, then points in the graph of averages tend to fall on a line.
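One way to sketch this prediction method in Python is shown below. The function name `nn_prediction`, the `window` parameter that defines "nearby", and the sample data are all assumptions for illustration; the slides do not specify how wide the nearby group should be.

```python
import numpy as np

def nn_prediction(x_values, y_values, new_x, window=0.5):
    """Predict y at new_x as the average y of points whose x is within `window` of new_x."""
    x_values = np.asarray(x_values, dtype=float)
    y_values = np.asarray(y_values, dtype=float)
    nearby = np.abs(x_values - new_x) <= window
    if not nearby.any():
        return float("nan")  # no points in the nearby group
    return float(np.mean(y_values[nearby]))

# Hypothetical (x, y) pairs, invented for illustration only
x = np.array([64, 64.5, 65, 68, 68.2, 72])
y = np.array([65, 66, 67, 68, 70, 71])

# One prediction per x value: plotted against x, these form the graph of averages
graph_of_averages = [nn_prediction(x, y, xi) for xi in x]
print(graph_of_averages)
```

The choice of `window` is a judgment call: a wider window averages over more points and gives smoother predictions, while a narrower one follows the data more closely.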
Where is the prediction line?
r = 0.99
Where is the prediction line?
r = 0.0
(Demo)
Linear Regression
A statement about x and y pairs
When both are measured in standard units, y deviates from 0 less, on average, than x deviates from 0
Not true for all points — a statement about averages
Regression Line
Correlation
Slope & Intercept
Regression Line Equation
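In original units, the regression line has this equation:
(estimate of y - average of y) / SD of y = r * (given x - average of x) / SD of x
The left side is the estimated y in standard units; the fraction on the right is x in standard units
Lines can be expressed by slope & intercept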
Regression Line
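Standard Units: the line passes through (0, 0); a run of 1 gives a rise of r
Original Units: the line passes through (Average x, Average y); a run of SD x gives a rise of r * SD y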
Slope and Intercept
estimate of y = slope * x + intercept
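Since the regression line in original units rises r * SD y for every run of SD x and passes through the point of averages, its slope is r * SD y / SD x and its intercept is average y - slope * average x. A minimal Python sketch follows; the function names and the sample data are assumptions for illustration, and NumPy is used in place of any course-specific library.

```python
import numpy as np

def correlation(x, y):
    """r: average of the product of x and y in standard units."""
    x_su = (x - np.mean(x)) / np.std(x)
    y_su = (y - np.mean(y)) / np.std(y)
    return np.mean(x_su * y_su)

def slope(x, y):
    """Slope of the regression line: r * SD y / SD x."""
    return correlation(x, y) * np.std(y) / np.std(x)

def intercept(x, y):
    """Intercept: the line passes through (average x, average y)."""
    return np.mean(y) - slope(x, y) * np.mean(x)

# Hypothetical data, invented for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

a = slope(x, y)
b = intercept(x, y)
estimate_at_6 = a * 6 + b  # estimate of y = slope * x + intercept
print(a, b, estimate_at_6)
```

The slope carries the units of y per unit of x, and the intercept carries the units of y.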
(Demo)
Discussion Question
Suppose we use linear regression to predict candy prices (in dollars) from sugar content (in grams). What are the units of each of the following?
Discussion Question
A course has a midterm (average 70; standard deviation 10) and a really hard final (average 50; standard deviation 12)
If the scatter diagram comparing midterm & final scores for students has an oval shape with correlation 0.75, then...
What do you expect the average final score would be for students who scored 90 on the midterm?
How about 60 on the midterm?
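One way to work through this question is to convert the midterm score to standard units, multiply by r, and convert back to final-exam points. The sketch below does that arithmetic; the function name `predict_final` is an assumption for illustration, and the default values are the ones stated in the question.

```python
def predict_final(midterm, r=0.75, mid_avg=70, mid_sd=10, final_avg=50, final_sd=12):
    """Regression estimate of the final score for a given midterm score."""
    midterm_su = (midterm - mid_avg) / mid_sd   # midterm in standard units
    final_su = r * midterm_su                   # estimated final in standard units
    return final_su * final_sd + final_avg      # back to original units

print(predict_final(90))  # 2 SDs above average on the midterm -> 68.0
print(predict_final(60))  # 1 SD below average on the midterm -> 41.0
```

In both cases the estimated final is closer to its average, in standard units, than the midterm was: 1.5 SDs above instead of 2, and 0.75 SDs below instead of 1.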
(Demo)