1 of 19

Coefficient of Determination, R²

2 of 19

Objective 1

  • Compute and Interpret the Coefficient of Determination

3 of 19

R² is the coefficient of determination. For the least-squares regression line it is, literally, the linear correlation coefficient, r, squared.

4 of 19

The coefficient of determination, R², measures the proportion of total variation in the response variable that is explained by the least-squares regression line.

The coefficient of determination is a number between 0 and 1, inclusive. That is, 0 ≤ R² ≤ 1.

If R² = 0, the line has no explanatory value.

If R² = 1, the line explains 100% of the variation in the response variable.

5 of 19

The data to the right are based on a study of rock drilling. The researchers wanted to determine whether the time it takes to dry drill a distance of 5 feet in rock increases with the depth at which the drilling begins. So, the depth (in feet) at which drilling begins is the predictor variable, x, and the time (in minutes) to drill 5 feet is the response variable, y.

6 of 19

Sample Statistics

         Mean     St. Dev.
Depth    126.2    52.2
Time     6.99     0.781

Correlation: 0.773

Regression Analysis

The regression equation is still

y = 5.53 + 0.0116 * x

Or, since x is Depth and y is Time:

Time = 5.53 + 0.0116 * Depth

7 of 19

Suppose we were asked to predict the time to drill an additional 5 feet, but we did not know the current depth of the drill. What would be our best “guess”?

Sample Statistics

         Mean     St. Dev.
Depth    126.2    52.2
Time     6.99     0.781

Correlation: 0.773

8 of 19

Suppose we were asked to predict the time to drill an additional 5 feet, but we did not know the current depth of the drill. What would be our best “guess”?

ANSWER:

We would just use the mean of all the available data, the mean time to drill an additional 5 feet: 6.99 minutes

9 of 19

Now suppose that we are asked to use our regression equation to predict the time to drill an additional 5 feet when the current depth of the drill is 160 feet.

ANSWER:

Time = 5.53 + 0.0116 * 160 = 7.386, or about 7.39 minutes.

Our “guess” increased from 6.99 minutes to 7.39 minutes based on the knowledge that drill depth is positively associated with drill time.
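The two "guesses" can be reproduced with a short Python sketch. The coefficients and the mean come from the slides; the function name predict_time is only an illustrative choice.

```python
# Coefficients taken from the regression equation on the slides.
INTERCEPT = 5.53   # minutes
SLOPE = 0.0116     # additional minutes per foot of starting depth

# Best guess when the depth is unknown: the mean drilling time.
MEAN_TIME = 6.99   # minutes


def predict_time(depth_ft: float) -> float:
    """Predicted time (minutes) to drill 5 feet starting at depth_ft."""
    return INTERCEPT + SLOPE * depth_ft


prediction = predict_time(160)
print(f"Guess without depth: {MEAN_TIME:.2f} minutes")
print(f"Guess at 160 feet:   {prediction:.2f} minutes")
```

Because 160 feet is deeper than the mean depth of 126.2 feet and the slope is positive, the prediction lands above the mean time.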

10 of 19

11 of 19

The difference between the observed value of the response variable and the mean value of the response variable is called the total deviation and is equal to $y - \bar{y}$.

The difference between the predicted value of the response variable and the mean value of the response variable is called the explained deviation and is equal to $\hat{y} - \bar{y}$.

The difference between the observed value of the response variable and the predicted value of the response variable is called the unexplained deviation and is equal to $y - \hat{y}$.
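As a rough illustration of the three deviations, the sketch below evaluates them for a single point. The regression coefficients and mean time are from the slides, but the observed time of 7.90 minutes at 160 feet is a hypothetical value chosen only for the example, not a point from the study.

```python
MEAN_TIME = 6.99                  # mean of the response variable, from the slides
INTERCEPT, SLOPE = 5.53, 0.0116   # regression equation from the slides


def deviations(depth_ft, observed_time):
    """Return (total, explained, unexplained) deviation for one observation."""
    predicted = INTERCEPT + SLOPE * depth_ft
    total = observed_time - MEAN_TIME        # observed minus mean
    explained = predicted - MEAN_TIME        # predicted minus mean
    unexplained = observed_time - predicted  # observed minus predicted
    return total, explained, unexplained


# Hypothetical observation: 7.90 minutes at a depth of 160 feet.
total, explained, unexplained = deviations(160, 7.90)
print(f"total = {total:.3f}")
print(f"explained = {explained:.3f}")
print(f"unexplained = {unexplained:.3f}")
print(f"explained + unexplained = {explained + unexplained:.3f}")
```

Whatever observed value is used, the explained and unexplained deviations add up to the total deviation, because the predicted value cancels in the sum.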

12 of 19

Total Deviation = Unexplained Deviation + Explained Deviation

13 of 19

Total Deviation = Unexplained Deviation + Explained Deviation

We want this measure not just for one point but for every point in the regression analysis. Squaring each deviation and summing over all the points gives the corresponding variations, therefore:

14 of 19

Total Variation = Unexplained Variation + Explained Variation

$$1 = \frac{\text{Unexplained Variation}}{\text{Total Variation}} + \frac{\text{Explained Variation}}{\text{Total Variation}}$$

$$R^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} = 1 - \frac{\text{Unexplained Variation}}{\text{Total Variation}}$$
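The decomposition can be checked numerically. The sketch below fits a least-squares line to a tiny made-up data set (the five points are illustrative only, not the drilling data) and computes each variation as a sum of squared deviations; the helper names are assumptions for the example.

```python
def fit_least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def variations(xs, ys):
    """Return (total, explained, unexplained) variation for the fitted line."""
    intercept, slope = fit_least_squares(xs, ys)
    y_bar = sum(ys) / len(ys)
    predicted = [intercept + slope * x for x in xs]
    total = sum((y - y_bar) ** 2 for y in ys)
    explained = sum((p - y_bar) ** 2 for p in predicted)
    unexplained = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    return total, explained, unexplained


# Made-up illustrative data, not from the drilling study.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

total, explained, unexplained = variations(xs, ys)
r_squared = explained / total
print(f"total = {total:.2f}, explained = {explained:.2f}, unexplained = {unexplained:.2f}")
print(f"R^2 = {r_squared:.2f}")  # same value as 1 - unexplained / total
```

For these five points the total variation is 6, of which 3.6 is explained and 2.4 is unexplained, so R² is 0.6 by either form of the formula.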

15 of 19

To determine R² for the linear regression model, simply square the value of the linear correlation coefficient.

Squaring the linear correlation coefficient to obtain the coefficient of determination works only for the least-squares linear regression model.
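One way to see why the shortcut is tied to the least-squares line is to compare r² with the quantity 1 − Unexplained Variation / Total Variation for two different lines through the same points. The data below are made up for illustration, and the second line (y = 1 + x) is an arbitrary choice that is deliberately not the least-squares fit.

```python
import math

# Made-up illustrative data, not from the drilling study.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)   # total variation
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

r = sxy / math.sqrt(sxx * syy)            # linear correlation coefficient


def one_minus_ratio(intercept, slope):
    """1 - (unexplained variation / total variation) for the given line."""
    unexplained = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - unexplained / syy


# Least-squares line.
ls_slope = sxy / sxx
ls_intercept = y_bar - ls_slope * x_bar
ls_value = one_minus_ratio(ls_intercept, ls_slope)

# Arbitrary line that is not the least-squares fit.
other_value = one_minus_ratio(1.0, 1.0)

print(f"r^2 = {r ** 2:.4f}")
print(f"least-squares line: {ls_value:.4f}")
print(f"line y = 1 + x:     {other_value:.4f}")
```

Only the least-squares line reproduces r²; the other line leaves more variation unexplained, so squaring r would overstate how well it fits.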

16 of 19

EXAMPLE Determining the Coefficient of Determination

Find and interpret the coefficient of determination for the drilling data.

Because the linear correlation coefficient, r, is 0.773, we have that

R² = 0.773² = 0.5975 = 59.75%.

So, 59.75% of the variability in drilling time is explained by the least-squares regression line.

17 of 19

Draw a scatter diagram for each of these data sets. For each data set, the variance of y is 17.49.

18 of 19

[Scatter diagrams: Data Set A, Data Set B, Data Set C]

Data Set A: 99.99% of the variability in y is explained by the least-squares regression line

Data Set B: 94.7% of the variability in y is explained by the least-squares regression line

Data Set C: 9.4% of the variability in y is explained by the least-squares regression line

19 of 19

General regression equation formula

Slope of the regression

y-intercept of the regression

Coefficient of determination

Correlation coefficient
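The formulas for these quantities did not survive in this text, so the sketch below assumes the standard summary-statistic forms for simple linear regression: slope = r * (s_y / s_x), intercept = mean of y minus slope times mean of x, and R² = r². It plugs in the sample statistics from the drilling example; the variable names are illustrative.

```python
# Sample statistics from the drilling example on the slides.
mean_depth, sd_depth = 126.2, 52.2   # predictor variable, x (feet)
mean_time, sd_time = 6.99, 0.781     # response variable, y (minutes)
r = 0.773                            # linear correlation coefficient

# Assumed standard formulas for the least-squares regression line.
slope = r * sd_time / sd_depth
intercept = mean_time - slope * mean_depth
r_squared = r ** 2

print(f"slope     = {slope:.4f}")
print(f"intercept = {intercept:.2f}")
print(f"R^2       = {r_squared:.4f}")
```

Rounded to the precision used on the slides, these values agree with the regression equation Time = 5.53 + 0.0116 Depth and with R² = 59.75%.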