Chapter 12 Linear Regression and Correlation
OPENSTAX STATISTICS
Access for free at https://openstax.org/books/introductory-statistics-2e/pages/1-introduction
Objectives
By the end of this chapter, the student should be able to:
Introduction
Section 12.1
LINEAR EQUATIONS
Linear Equations
A linear equation has the form y = a + bx, where a and b are constant numbers.
Examples of Linear Equations
Example
Example - Answer
Example
Example - Answers
Slope and Y-Intercept of a Linear Equation
Example
Example - Answers
Section 12.2
SCATTER PLOTS
Scatter Plots
Scatter Plots, cont.
Some Scatter Plots
Remember, all the correlation coefficient tells us is whether or not the data are linearly related. In panel (d) the variables obviously have some type of very specific relationship to each other, but the correlation coefficient is zero, indicating no linear relationship exists.
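The point about a zero correlation coefficient hiding a strong non-linear relationship can be checked numerically. The sketch below is an illustration only: the data are hypothetical (a symmetric parabola, not necessarily the shape shown in panel (d)), and pearson_r is a helper name introduced here.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data: y is completely determined by x, but not linearly.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys_curve = [x ** 2 for x in xs]      # symmetric parabola
ys_line = [2 * x + 1 for x in xs]    # exact straight line, for contrast

r_curve = pearson_r(xs, ys_curve)
r_line = pearson_r(xs, ys_line)
print(f"parabola: r = {r_curve:.4f}")
print(f"line:     r = {r_line:.4f}")
```

For the parabola, r comes out as zero even though y is an exact function of x, while the straight line gives r = 1.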
Some More Scatter Plots
Section 12.3
THE REGRESSION EQUATION
Regression Line
Reminder about The Regression Line
Example
x (third exam score) | y (final exam score) |
65 | 175 |
67 | 133 |
71 | 185 |
71 | 163 |
66 | 126 |
75 | 198 |
67 | 153 |
70 | 163 |
71 | 159 |
69 | 151 |
69 | 159 |
Example - Answers
Least Squares Method
The symbol ŷ is read "y hat" and is the estimated value of y: the value of y obtained from the regression line. It is generally not equal to the observed y in the data.
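As a rough illustration of the least squares method and of the difference between ŷ and an observed y, the sketch below fits a line to the third exam / final exam table using the standard formulas b = Sxy / Sxx and a = ȳ - b·x̄. The variable and function names are chosen here for illustration and are not from the slides.

```python
# (x, y) pairs from the table: third exam score, final exam score.
data = [(65, 175), (67, 133), (71, 185), (71, 163), (66, 126), (75, 198),
        (67, 153), (70, 163), (71, 159), (69, 151), (69, 159)]

n = len(data)
x_bar = sum(x for x, _ in data) / n
y_bar = sum(y for _, y in data) / n

# Sums of cross-products and squared deviations about the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in data)
s_xx = sum((x - x_bar) ** 2 for x, _ in data)

b = s_xy / s_xx          # slope of the least-squares line
a = y_bar - b * x_bar    # y-intercept of the least-squares line

def y_hat(x):
    """Estimated y for a given x, read off the fitted line."""
    return a + b * x

# Compare the estimate with the observed value for the first data point.
x0, y0 = data[0]
residual = y0 - y_hat(x0)
print(f"y-hat = {a:.2f} + {b:.2f}x")
print(f"x = {x0}: observed y = {y0}, y-hat = {y_hat(x0):.2f}, residual = {residual:.2f}")
```

The gap between the observed y and ŷ for a data point is its residual; the least-squares line is the one that makes the sum of the squared residuals as small as possible.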
The Correlation Coefficient r
The Correlation Coefficient r, cont.
The Correlation Coefficient r, cont.
The Correlation Coefficient r, cont.
Interpreting the Intercept and Slope
The Coefficient of Determination
Example
Example - Answers
The line of best fit is: ŷ = –173.51 + 4.83x
Interpretation of r² in the context of this example:
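One way to read r² is as the share of the variation in y that the regression line accounts for. The sketch below assumes the third exam / final exam data from the earlier table and computes r² as 1 - SSE/SST; the names sse, sst, and r_squared are illustrative only.

```python
# (x, y) pairs: third exam score, final exam score.
data = [(65, 175), (67, 133), (71, 185), (71, 163), (66, 126), (75, 198),
        (67, 153), (70, 163), (71, 159), (69, 151), (69, 159)]

n = len(data)
x_bar = sum(x for x, _ in data) / n
y_bar = sum(y for _, y in data) / n

# Least-squares slope and intercept.
b = sum((x - x_bar) * (y - y_bar) for x, y in data) / sum((x - x_bar) ** 2 for x, _ in data)
a = y_bar - b * x_bar

sse = sum((y - (a + b * x)) ** 2 for x, y in data)  # variation left unexplained by the line
sst = sum((y - y_bar) ** 2 for _, y in data)        # total variation in y

r_squared = 1 - sse / sst
print(f"r^2 = {r_squared:.4f}")
```

For these data the value comes out near 0.44, so roughly 44% of the variation in final exam scores is explained by the variation in third exam scores, using the best-fit line.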
Example
Example - Answers
Section 12.4
TESTING THE SIGNIFICANCE OF THE CORRELATION COEFFICIENT
Testing the Significance of the Correlation Coefficient
Performing the Hypothesis Test
Note:
Performing the Hypothesis Test, cont.
DRAWING A CONCLUSION:
Test Statistic
This is a t-statistic and operates in the same way as other t-tests. Calculate the t-value and compare it with the critical value from the t-table at the appropriate degrees of freedom (n - 2) and the level of confidence you wish to maintain. If the calculated t-value falls in the tail, reject the null hypothesis that there is no linear relationship between the two variables. If the calculated t-value is NOT in the tail, we cannot reject the null hypothesis that there is no linear relationship between the two variables.
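For reference, the usual form of the test statistic for the correlation coefficient is written out below, where r is the sample correlation coefficient and n is the number of data pairs. The formula itself did not survive on the slide, so treat this as the standard textbook form rather than a copy of the original.

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad df = n - 2
```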
Shorthand for Testing the Significance of r
If |r| ≥ 2/√n, then the correlation between the two variables demonstrates that a linear relationship exists and is statistically significant at approximately the 0.05 level of significance. As the formula indicates, the correlation required for significance decreases as the sample size increases.
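To see how the required correlation shrinks as the sample grows, the sketch below tabulates the threshold for a few sample sizes. It assumes the shorthand rule takes the commonly quoted form |r| ≥ 2/√n; shorthand_threshold is a hypothetical helper name.

```python
import math

def shorthand_threshold(n):
    """Approximate |r| needed for significance at about the 0.05 level.

    Assumes the rule of thumb |r| >= 2 / sqrt(n).
    """
    return 2 / math.sqrt(n)

for n in (10, 30, 100):
    print(f"n = {n:3d}: |r| must be at least {shorthand_threshold(n):.4f}")
```

The rule is only an approximation; the t-test with n - 2 degrees of freedom remains the reference procedure.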
Misuse of Correlation Coefficients
Example
Example with the Final Exam Example
Consider the third exam / final exam example from earlier. The line of best fit is: ŷ = –173.51 + 4.83x with r = 0.6631 and there are n = 11 data points. Can the regression line be used for prediction? Given a third-exam score (x value), can we use the line to predict the final exam score (predicted y value)?
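One way to answer this is to run the t-test on r. The sketch below uses r = 0.6631 and n = 11 from the example; the critical value 2.262 is the two-tailed t-table entry for 9 degrees of freedom at the 0.05 level and is hard-coded here as an assumption rather than computed.

```python
import math

r = 0.6631   # correlation coefficient from the example
n = 11       # number of data points
df = n - 2   # degrees of freedom for the test

# Test statistic for H0: no linear relationship (population correlation is zero).
t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

# Two-tailed critical value for df = 9 at alpha = 0.05, read from a standard t-table.
T_CRITICAL = 2.262

significant = abs(t) > T_CRITICAL
print(f"t = {t:.3f}, df = {df}, significant at 0.05: {significant}")
```

Because the calculated t-value lies beyond the critical value, the null hypothesis is rejected and the correlation is significant, which supports using the line for prediction, at least for x values within the range of the observed third exam scores.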
A Few More Examples
Suppose you computed the following correlation coefficients.
Sections 12.5 and 12.6
PREDICTION AND OUTLIERS
Outliers
Identifying Outliers
How does the outlier affect the best fit line?
Example
Example - Answers
Identifying Outliers With Technology
Example
Example - Answers
Example
Example, cont.
ŷ = a + bx.
Example - Answers