Lecture 33
Residuals
DATA 8
Fall 2020
Regression roadmap
Errors and Residuals
Error in Estimation
Residuals
= observed y - regression estimate of y
= observed y - height of regression line at x
= vertical distance between the point and the best line
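A minimal NumPy sketch of this definition. It assumes x and y are NumPy arrays, that SDs are computed with np.std (dividing by n), and that the regression line has slope r × SD of y / SD of x and passes through the point of averages. The helper names standard_units, correlation, fitted_values and residuals are hypothetical and chosen for illustration only. The data are made up.

```python
import numpy as np

def standard_units(a):
    """Convert an array to standard units (average 0, SD 1)."""
    return (a - np.mean(a)) / np.std(a)

def correlation(x, y):
    """r: the average of the products of x and y in standard units."""
    return np.mean(standard_units(x) * standard_units(y))

def fitted_values(x, y):
    """Height of the regression line at each x."""
    r = correlation(x, y)
    slope = r * np.std(y) / np.std(x)
    intercept = np.mean(y) - slope * np.mean(x)
    return slope * x + intercept

def residuals(x, y):
    """Observed y minus the regression estimate of y."""
    return y - fitted_values(x, y)

# Small made-up data set, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 4.0])
print(residuals(x, y))
```

For these points r = 0.8 and the regression line is 0.8x + 1.5, so the residuals come out to about -0.3, -0.1, 1.1 and -0.7.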
(Demo)
Regression Diagnostics
Example: Dugongs
(Demo)
Residual Plot
A scatter diagram of the residuals against the predictor x
(Demo)
Properties of residuals
(Demo)
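A quick numerical check of two properties that hold for every least-squares regression line: the residuals average to 0, and they are uncorrelated with x. The data below are synthetic and deliberately curved, so a residual plot would still show a pattern even though the correlation between x and the residuals is 0. The helper name residuals is hypothetical; np.corrcoef gives the same r as averaging products in standard units.

```python
import numpy as np

def residuals(x, y):
    """Residuals from the regression line of y on x (hypothetical helper)."""
    r = np.corrcoef(x, y)[0, 1]
    slope = r * np.std(y) / np.std(x)
    intercept = np.mean(y) - slope * np.mean(x)
    return y - (slope * x + intercept)

# Synthetic, deliberately nonlinear data
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = x ** 2 + rng.normal(0, 5, size=200)

res = residuals(x, y)
print(np.mean(res))               # essentially 0 (floating-point noise)
print(np.corrcoef(x, res)[0, 1])  # essentially 0 (floating-point noise)
```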
Discussion Questions
How would we adjust our regression line…
A Measure of Clustering
Correlation, Revisited
(Demo)
$$\frac{\text{SD of fitted values}}{\text{SD of } y} = |r|$$
Variance = mean square of the deviations from average
$$\frac{\text{Variance of fitted values}}{\text{Variance of } y} = r^2$$
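A sketch checking both ratios on synthetic data with a negative association, which shows why the SD ratio is |r| rather than r. The helper fitted_values is hypothetical, and SDs and variances use np.std and np.var (dividing by n).

```python
import numpy as np

def fitted_values(x, y):
    """Height of the regression line at each x (hypothetical helper)."""
    r = np.corrcoef(x, y)[0, 1]
    slope = r * np.std(y) / np.std(x)
    intercept = np.mean(y) - slope * np.mean(x)
    return slope * x + intercept

# Synthetic data with a negative association, so r < 0
rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = -2 * x + rng.normal(size=500)

r = np.corrcoef(x, y)[0, 1]
fitted = fitted_values(x, y)
print(abs(r), np.std(fitted) / np.std(y))  # the two numbers agree
print(r ** 2, np.var(fitted) / np.var(y))  # the two numbers agree
```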
A Variance Decomposition
By definition,
y = fitted values + residuals
Tempting (but wrong) to think that:
SD(y) = SD(fitted values) + SD(residuals)
But it is true that:
Var(y) = Var(fitted values) + Var(residuals)
(a result of the Pythagorean theorem!)
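In the Pythagorean picture, SD(y) plays the role of the hypotenuse of a right triangle whose legs are SD(fitted values) and SD(residuals); this works because the fitted values and residuals are uncorrelated. The sketch below, on synthetic data, checks that the variances add while the SDs do not (the sum of the SDs overshoots SD(y)).

```python
import numpy as np

# Synthetic data for illustration only
rng = np.random.default_rng(2)
x = rng.normal(size=300)
y = 3 * x + rng.normal(scale=2, size=300)

# Regression line built from r
r = np.corrcoef(x, y)[0, 1]
slope = r * np.std(y) / np.std(x)
intercept = np.mean(y) - slope * np.mean(x)
fitted = slope * x + intercept
res = y - fitted

print(np.var(y), np.var(fitted) + np.var(res))  # equal, up to rounding
print(np.std(y), np.std(fitted) + np.std(res))  # the sum is larger
```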
Var(y) = Var(fitted values) + Var(residuals)
Dividing both sides by Var(y):
$$\frac{\text{Var(fitted values)}}{\text{Var}(y)} = r^2 \qquad \frac{\text{Var(residuals)}}{\text{Var}(y)} = 1 - r^2$$
Residual Average and SD
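The average of the residuals is always 0.
$$\frac{\text{Var(residuals)}}{\text{Var}(y)} = 1 - r^2, \quad \text{so} \quad \text{SD of residuals} = \sqrt{1 - r^2} \times \text{SD of } y$$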
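Taking square roots of the variance ratio gives SD of residuals = sqrt(1 - r²) × SD of y. The sketch below (hypothetical helper sd_of_residuals, synthetic data) compares that formula with the SD of residuals computed directly.

```python
import numpy as np

def sd_of_residuals(r, sd_y):
    """SD of residuals from the regression of y on x: sqrt(1 - r**2) * SD of y."""
    return np.sqrt(1 - r ** 2) * sd_y

# Synthetic data for illustration only
rng = np.random.default_rng(3)
x = rng.normal(size=400)
y = 0.5 * x + rng.normal(size=400)

r = np.corrcoef(x, y)[0, 1]
slope = r * np.std(y) / np.std(x)
intercept = np.mean(y) - slope * np.mean(x)
res = y - (slope * x + intercept)

print(np.std(res), sd_of_residuals(r, np.std(y)))  # the two numbers agree
```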
(Demo)
Discussion Question 1
Midterm: Average 70, SD 10
Final: Average 60, SD 15
r = 0.6
Fill in the blank:
The SD of the residuals is _______.
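A worked sketch of the arithmetic, using the formula SD of residuals = sqrt(1 - r²) × SD of y with the final score as y. Only r and the SD of the final are needed; the midterm average and SD do not enter.

```python
import math

r = 0.6          # correlation between midterm and final
sd_final = 15    # SD of the variable being predicted

# sqrt(1 - 0.36) = 0.8, and 0.8 * 15 = 12
sd_residuals = math.sqrt(1 - r ** 2) * sd_final
print(sd_residuals)  # about 12
```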
Discussion Question 2
Midterm: Average 70, SD 10
Final: Average 60, SD 15
r = 0.6
Fill in the blank:
For at least 75% of the students, the regression estimate of final score based on midterm score will be correct to within ___________ points.
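A sketch of one way to reason about this question with Chebyshev's inequality: at least 1 - 1/k² of any list of numbers lies within k SDs of its average. The residuals have average 0 and, from the previous question, SD sqrt(1 - r²) × 15, so solving 1 - 1/k² = 0.75 for k gives the number of points.

```python
import math

r = 0.6
sd_final = 15
sd_residuals = math.sqrt(1 - r ** 2) * sd_final  # about 12

# Chebyshev: at least 1 - 1/k**2 of the residuals are within k SDs of their average (0)
target = 0.75
k = 1 / math.sqrt(1 - target)  # solves 1 - 1/k**2 = 0.75, so k = 2

# Within k SDs of 0 means the estimate is off by at most k * SD points
print(k * sd_residuals)  # about 24
```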