1 of 52

Applied Statistics for Business (MAS202)

2 of 52

CHAPTER XIII: Simple linear regression

  1. Simple linear regression model

Example: Which data pairs are dependent, and which are independent?

  1. Air temperature - Altitude
  2. Age of people - Age of dogs
  3. Height of students - MAS202 final scores
  4. Look at the scatter plot

3 of 52

CHAPTER XIII: Simple linear regression

  • Simple linear regression model

Relationship between two variables

→ A linear relationship is the simplest efficient model for explaining the dependence of Y on X.

[Figure: three scatter plots, labeled "Linear correlation", "Nonlinear correlation", and "Undetermined"]

4 of 52

CHAPTER XIII: Simple linear regression

  • Simple linear regression model

AIM:

We use simple linear regression analysis to:

  • Establish a linear relationship between 2 random variables (data pair)
  • Predict the values of a dependent variable based on a given independent variable

5 of 52

CHAPTER XIII: Simple linear regression

  • Simple linear regression model

SAMPLE LINEAR REGRESSION LINE (PREDICTION LINE)

X: independent variable; Y: dependent variable

The linear regression line:

$$\hat{Y}_i = b_0 + b_1 X_i$$

where

  • b₁: the regression slope
  • b₀: the regression intercept
  • Ŷᵢ: the estimated (predicted) Y value for observation Xᵢ

6 of 52

CHAPTER XIII: Simple linear regression

  • Simple linear regression model

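How to compute b₀ and b₁?

Assume (X₁, Y₁), (X₂, Y₂), ..., (Xₙ, Yₙ) is a sample of paired data with sample means $\bar{X}$ and $\bar{Y}$.

The sample linear regression line is

$$\hat{Y} = b_0 + b_1 X$$

where:

$$b_1 = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(X_i-\bar{X})^2}, \qquad b_0 = \bar{Y} - b_1\bar{X}$$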
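A minimal sketch of the least-squares computation, assuming the usual formulas: the slope is the sum of cross-deviations divided by the sum of squared X deviations, and the intercept is Ȳ − b₁X̄. The function name `fit_line` and the small data set are illustrative only, not part of the course material.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (b0, b1) of the least-squares line Y-hat = b0 + b1 * X."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations and sum of squared X deviations
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data lying exactly on the line Y = 1 + 2X
b0, b1 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)
```

Because the illustrative points lie exactly on a line, the fitted intercept and slope recover that line.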

7 of 52

CHAPTER XIII: Simple linear regression

  • Simple linear regression model

Example 1: Does studying hard imply high grades?

The following table shows the grades of 5 students and their number of study hours per week.

  1. Which is the dependent variable, and which is the independent variable?
  2. Compute the estimated linear regression line for the sample.
  3. Predict the grade of a student who spends 20 hours/week studying.

Grades (/10)   Study hours
4              10
5              8
8              12
9              10
7              15
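One way to work through Example 1 numerically, taking study hours as X (independent) and grades as Y (dependent). This is a sketch using the least-squares formulas; the variable names are illustrative.

```python
hours = [10, 8, 12, 10, 15]   # X: study hours per week
grades = [4, 5, 8, 9, 7]      # Y: grades out of 10

n = len(hours)
x_bar = sum(hours) / n        # 11.0
y_bar = sum(grades) / n       # 6.6

ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, grades))
ss_x = sum((x - x_bar) ** 2 for x in hours)

b1 = ss_xy / ss_x             # slope
b0 = y_bar - b1 * x_bar       # intercept
predicted_20 = b0 + b1 * 20   # prediction at X = 20 hours

print(f"Y-hat = {b0:.4f} + {b1:.4f} X")
print(f"Predicted grade at 20 hours: {predicted_20:.2f}")
```

This gives approximately Ŷ = 3.4571 + 0.2857X and a predicted grade of about 9.17. Note that 20 hours lies outside the observed range of 8 to 15 hours, so the prediction is an extrapolation.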

8 of 52

CHAPTER XIII: Simple linear regression

  • Simple linear regression model

Exercise:

9 of 52

CHAPTER XIII: Simple linear regression

  • Simple linear regression model

NOTE:

  • The intercept b₀ represents the estimated value of Y when X = 0.
  • The slope b₁ represents the expected change in Y per unit change in X.
  • The predicted values have the same mean as Y: $\frac{1}{n}\sum_{i=1}^{n}\hat{Y}_i = \bar{Y}$

Example: Re-consider Example 1:
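For the Example 1 data (study hours as X, grades as Y), these interpretations can be checked with a short sketch. The variable names are illustrative.

```python
hours = [10, 8, 12, 10, 15]   # X
grades = [4, 5, 8, 9, 7]      # Y

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(grades) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, grades))
      / sum((x - x_bar) ** 2 for x in hours))
b0 = y_bar - b1 * x_bar

# b0: estimated grade when X = 0 hours
grade_at_zero = b0 + b1 * 0

# b1: change in the estimated grade for one extra study hour
change_per_hour = (b0 + b1 * 11) - (b0 + b1 * 10)

# The predicted values have the same mean as the observed Y values
predictions = [b0 + b1 * x for x in hours]
mean_prediction = sum(predictions) / n

print(f"b0 = {grade_at_zero:.4f}, b1 = {change_per_hour:.4f}")
print(f"mean of predictions = {mean_prediction:.4f}, mean of Y = {y_bar:.4f}")
```

Here b₀ ≈ 3.46 is the estimated grade with no study hours, and each additional hour per week raises the estimated grade by about 0.29 points.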

10 of 52

CHAPTER XIII: Simple linear regression

  • Simple linear regression model

EXCEL package:

[Excel regression output with the coefficients b₀ and b₁ marked]

11 of 52

CHAPTER XIII: Simple linear regression

  • Simple linear regression model

QUESTION: Are the predicted values correct? How can we estimate the errors?

→ Consider the MEASURES OF VARIATION!

12 of 52

CHAPTER XIII: Simple linear regression

2. Measures of Variation SUMS OF SQUARES

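Assume (X₁, Y₁), (X₂, Y₂), ..., (Xₙ, Yₙ) is a sample of paired data with sample means $\bar{X}$ and $\bar{Y}$.

Total sum of squares (total variation):

$$SST = \sum_{i=1}^{n}(Y_i-\bar{Y})^2$$

Measures the variation of the Yᵢ values around their mean $\bar{Y}$.

Regression sum of squares (explained variation):

$$SSR = \sum_{i=1}^{n}(\hat{Y}_i-\bar{Y})^2$$

Variation attributed to the relationship between X and Y.

Error sum of squares (unexplained variation):

$$SSE = \sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2$$

Variation in Y attributed to factors other than X.

The three sums satisfy SST = SSR + SSE.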
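The three sums of squares can be sketched for the Example 1 data (study hours as X, grades as Y) as follows: SST sums squared deviations of Y from its mean, SSR sums squared deviations of the predictions from that mean, and SSE sums squared prediction errors. The variable names are illustrative.

```python
hours = [10, 8, 12, 10, 15]   # X
grades = [4, 5, 8, 9, 7]      # Y

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(grades) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, grades))
      / sum((x - x_bar) ** 2 for x in hours))
b0 = y_bar - b1 * x_bar
predictions = [b0 + b1 * x for x in hours]

sst = sum((y - y_bar) ** 2 for y in grades)                      # total variation
ssr = sum((y_hat - y_bar) ** 2 for y_hat in predictions)         # explained variation
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(grades, predictions))  # unexplained variation

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
```

For this sample the values are about SST = 17.2, SSR = 2.2857 and SSE = 14.9143, which add up as SST = SSR + SSE.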

13 of 52

CHAPTER XIII: Simple linear regression

2. Measures of Variation SUMS OF SQUARES

14 of 52

CHAPTER XIII: Simple linear regression

2. Measures of Variation SUMS OF SQUARES

EXCEL package: SST = 17.2; SSR ≈ 2.29; SSE ≈ 14.91

Question: What percentage of the total variation can be explained by the variable X?

Answer: SSR/SST × 100% ≈ 13.29%

15 of 52

CHAPTER XIII: Simple linear regression

2. Measures of Variation COEFFICIENT OF DETERMINATION & CORRELATION

  • The regression line gives good predictions when SSE is small.
  • To test whether the regression model is weak or strong, use the coefficient of determination:

$$r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$$

NOTE:

- 0 ≤ r² ≤ 1
- r² measures the proportion of the total variation of Y that can be explained by the factor X

16 of 52

CHAPTER XIII: Simple linear regression

2. Measures of Variation COEFFICIENT OF DETERMINATION & CORRELATION

  • If r² is close to 1 → strong linear relationship between X and Y
  • If r² is close to 0 → weak linear relationship between X and Y

17 of 52

CHAPTER XIII: Simple linear regression

2. Measures of Variation COEFFICIENT OF DETERMINATION & CORRELATION

Example: Re-consider Example 1, find the coefficient of determination. Explain the result.
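A sketch of this computation for the Example 1 data (study hours as X, grades as Y), taking r² as the ratio SSR/SST. As a cross-check, it also computes the sample correlation r directly from the deviations; its square should equal r². The variable names are illustrative.

```python
from math import sqrt

hours = [10, 8, 12, 10, 15]   # X
grades = [4, 5, 8, 9, 7]      # Y

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(grades) / n
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, grades))
ss_x = sum((x - x_bar) ** 2 for x in hours)
sst = sum((y - y_bar) ** 2 for y in grades)

b1 = ss_xy / ss_x
b0 = y_bar - b1 * x_bar
ssr = sum((b0 + b1 * x - y_bar) ** 2 for x in hours)

r_squared = ssr / sst              # coefficient of determination
r = ss_xy / sqrt(ss_x * sst)       # sample correlation coefficient

print(f"r^2 = {r_squared:.4f}, r = {r:.4f}")
```

The result is r² ≈ 0.1329, so only about 13.3% of the variation in grades is explained by study hours in this sample. That is close to 0, which indicates a weak linear relationship.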

18 of 52

CHAPTER XIII: Simple linear regression

2. Measures of Variation COEFFICIENT OF DETERMINATION & CORRELATION

EXCEL package:

19 of 52

CHAPTER XIII: Simple linear regression

2. Measures of Variation COEFFICIENT OF DETERMINATION & CORRELATION

EXCEL package:

20 of 52

CHAPTER XIII: Simple linear regression

3. The residual analysis

Assume that (X₁, Y₁), (X₂, Y₂), ..., (Xₙ, Yₙ) is a sample of n paired observations.

The residual of observation i is the difference between the observed and the predicted value:

$$e_i = Y_i - \hat{Y}_i$$

→ The residuals are e₁, e₂, …, eₙ

NOTE:

21 of 52

CHAPTER XIII: Simple linear regression

3. The residual analysis

EXCEL package:

Grades (/10)   Study hours
4              10
5              8
8              12
9              10
7              15
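The residuals for this data set can also be computed directly. This is a sketch that takes each residual as the observed grade minus the grade predicted by the fitted line, with study hours as X. The variable names are illustrative.

```python
hours = [10, 8, 12, 10, 15]   # X
grades = [4, 5, 8, 9, 7]      # Y

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(grades) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, grades))
      / sum((x - x_bar) ** 2 for x in hours))
b0 = y_bar - b1 * x_bar

# Residual e_i = observed Y_i minus predicted Y-hat_i
residuals = [y - (b0 + b1 * x) for x, y in zip(hours, grades)]

for x, y, e in zip(hours, grades, residuals):
    print(f"X = {x:2d}  Y = {y}  predicted = {y - e:.4f}  residual = {e:+.4f}")
print(f"sum of residuals = {sum(residuals):.6f}")
```

For a least-squares line with an intercept, the residuals sum to zero up to floating-point rounding.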

22 of 52

CHAPTER XIII: Simple linear regression

3. The residual analysis

4 ASSUMPTIONS ON THE RESIDUALS: L.I.N.E

  1. Linearity: the relationship between the variables is linear. Nonlinear relationships are not discussed here.
  2. Independence of errors: the random residuals e₁, e₂, …, eₙ must be independent of each other.
  3. Normality of errors: each random residual eᵢ must be normally distributed, for all i.
  4. Equal variance: the random residuals eᵢ must have the same constant variance.

23 of 52

CHAPTER XIII: Simple linear regression

3. The residual analysis

NOTE: EXCEL can create a residual plot and a normal probability plot for the residuals of a data set.

  • When the points of the residual plot scatter around a horizontal straight line with no visible pattern → it is reasonable to use the linear model.
  • In a normal probability plot, normally distributed errors fall approximately on a straight line.
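Outside Excel, the coordinates behind both plots can be sketched as below for the Example 1 data (study hours as X, grades as Y). The plotting positions (i − 0.5)/n used for the normal quantiles are one common convention and an assumption here; Excel may use a different one. The variable names are illustrative.

```python
from statistics import NormalDist

hours = [10, 8, 12, 10, 15]   # X
grades = [4, 5, 8, 9, 7]      # Y

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(grades) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, grades))
      / sum((x - x_bar) ** 2 for x in hours))
b0 = y_bar - b1 * x_bar
residuals = [y - (b0 + b1 * x) for x, y in zip(hours, grades)]

# Residual plot: X on the horizontal axis, residual on the vertical axis
residual_plot = list(zip(hours, residuals))

# Normal probability plot: sorted residuals against standard normal quantiles.
# Assumption: plotting positions (i - 0.5) / n.
std_normal = NormalDist()
quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
normal_plot = list(zip(quantiles, sorted(residuals)))

for q, e in normal_plot:
    print(f"normal quantile = {q:7.3f}  residual = {e:7.3f}")
```

With only five observations, neither plot is very informative; the sketch only shows how the points are constructed.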

24 of 52

CHAPTER XIII: Simple linear regression

4 . Hypothesis Testing for the slope & correlation

The linear regression line for the population is unknown:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

The random error is

$$\varepsilon_i = Y_i - (\beta_0 + \beta_1 X_i)$$

25 of 52

CHAPTER XIII: Simple linear regression

4 . Hypothesis Testing for the slope & correlation

Note: The random error ε has the estimated (sample) standard deviation

$$S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2}{n-2}}$$

called the standard error of the estimate.
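A sketch of the standard error of the estimate for the Example 1 data (study hours as X, grades as Y), computed as the square root of SSE divided by n − 2. The variable names are illustrative.

```python
from math import sqrt

hours = [10, 8, 12, 10, 15]   # X
grades = [4, 5, 8, 9, 7]      # Y

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(grades) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, grades))
      / sum((x - x_bar) ** 2 for x in hours))
b0 = y_bar - b1 * x_bar

# SSE: sum of squared residuals
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(hours, grades))

# Standard error of the estimate, with n - 2 degrees of freedom
s_yx = sqrt(sse / (n - 2))

print(f"SSE = {sse:.4f}, S_YX = {s_yx:.4f}")
```

For this sample S_YX ≈ 2.23, measured in grade points.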

26 of 52

CHAPTER XIII: Simple linear regression

4 . Hypothesis Testing for the slope & correlation

EXCEL package:

27 of 52

CHAPTER XIII: Simple linear regression

4 . Hypothesis Testing for the slope & correlation

Two ways to test the slope β₁:

t test with df = n − 2

F test (only for zero slope)

28 of 52

CHAPTER XIII: Simple linear regression

4 . Hypothesis Testing for the slope & correlation

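First way: t test, df = n − 2

The estimated standard deviation (standard error) of the slope estimate b₁ is:

$$S_{b_1} = \frac{S_{YX}}{\sqrt{SSX}}, \qquad SSX = \sum_{i=1}^{n}(X_i-\bar{X})^2$$

where S_YX is the standard error of the estimate.

The test statistic for a hypothesized slope value β₁ is:

$$t_{STAT} = \frac{b_1 - \beta_1}{S_{b_1}}$$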

29 of 52

CHAPTER XIII: Simple linear regression

4 . Hypothesis Testing for the slope & correlation

First way: t- test, df= n-2

30 of 52

CHAPTER XIII: Simple linear regression

4 . Hypothesis Testing for the slope & correlation

Example: Test whether β₁ = 0.2 at the 5% significance level.
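A sketch of this test, assuming the example refers to the Example 1 data (study hours as X, grades as Y) and a two-tailed alternative β₁ ≠ 0.2. The critical value 3.1824 is taken from a t table for df = 3 and α = 0.05 (two-tailed) and is hard-coded. The variable names are illustrative.

```python
from math import sqrt

hours = [10, 8, 12, 10, 15]   # X
grades = [4, 5, 8, 9, 7]      # Y

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(grades) / n
ss_x = sum((x - x_bar) ** 2 for x in hours)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, grades)) / ss_x
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(hours, grades))
s_yx = sqrt(sse / (n - 2))        # standard error of the estimate
s_b1 = s_yx / sqrt(ss_x)          # standard error of the slope

beta1_null = 0.2                  # hypothesized slope under H0
t_stat = (b1 - beta1_null) / s_b1

t_critical = 3.1824               # t table value, df = n - 2 = 3, two-tailed alpha = 0.05
reject_h0 = abs(t_stat) > t_critical

print(f"S_b1 = {s_b1:.4f}, t_STAT = {t_stat:.4f}, reject H0: {reject_h0}")
```

Under these assumptions t_STAT ≈ 0.20, well inside ±3.1824, so H₀: β₁ = 0.2 is not rejected at the 5% level.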

31 of 52

CHAPTER XIII: Simple linear regression

4 . Hypothesis Testing for the slope & correlation

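Second way: F test for zero slope

H₀: β₁ = 0 versus H₁: β₁ ≠ 0

$$F_{STAT} = \frac{MSR}{MSE}, \qquad MSR = \frac{SSR}{1}, \qquad MSE = \frac{SSE}{n-2}$$

Reject H₀ when F_STAT > F_α (with 1 and n − 2 degrees of freedom).

  • If H₀ is rejected → the linear relationship is significant.
  • If H₀ is not rejected → the linear relationship is not significant.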

32 of 52

CHAPTER XIII: Simple linear regression

4 . Hypothesis Testing for the slope & correlation

EXCEL package

33 of 52

CHAPTER XIII: Simple linear regression

4 . Hypothesis Testing for the slope & correlation

Example: Use the F test to test for a significant linear relationship. Use α = 0.05.
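A sketch of the F test, assuming the example refers to the Example 1 data (study hours as X, grades as Y). The statistic is MSR/MSE with MSR = SSR/1 and MSE = SSE/(n − 2). The critical value 10.128 is taken from an F table for 1 and 3 degrees of freedom at α = 0.05 and is hard-coded. The variable names are illustrative.

```python
hours = [10, 8, 12, 10, 15]   # X
grades = [4, 5, 8, 9, 7]      # Y

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(grades) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, grades))
      / sum((x - x_bar) ** 2 for x in hours))
b0 = y_bar - b1 * x_bar
predictions = [b0 + b1 * x for x in hours]

ssr = sum((y_hat - y_bar) ** 2 for y_hat in predictions)
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(grades, predictions))

msr = ssr / 1                 # regression mean square (1 independent variable)
mse = sse / (n - 2)           # error mean square
f_stat = msr / mse

f_critical = 10.128           # F table value, df = (1, 3), alpha = 0.05
reject_h0 = f_stat > f_critical

print(f"F_STAT = {f_stat:.4f}, reject H0: {reject_h0}")
```

Under these assumptions F_STAT ≈ 0.46 < 10.128, so H₀: β₁ = 0 is not rejected and the linear relationship is not significant in this small sample.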

34 of 52

CHAPTER XIII: Simple linear regression

4 . Hypothesis Testing for the slope & correlation

t test for zero correlation coefficient (or significant regression model)

ρ: population correlation coefficient

r: sample correlation coefficient, the estimate of ρ

35 of 52

CHAPTER XIII: Simple linear regression

4 . Hypothesis Testing for the slope & correlation

T- test for zero correlation Coefficient (or significant regression model)

Example 1: You want to explore the relationship between the grades students receive on their first two exams. For a sample of 25 students, you find a correlation of 0.45.

What is your conclusion when testing H₀: ρ = 0 versus H₁: ρ ≠ 0 at significance level α = 0.05?
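A sketch of this test using the standard statistic t = r·√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. The critical value 2.0687 is taken from a t table for df = 23 and α = 0.05 (two-tailed) and is hard-coded. The variable names are illustrative.

```python
from math import sqrt

n = 25        # sample size
r = 0.45      # sample correlation between the two exam grades

# t statistic for H0: rho = 0, with df = n - 2
t_stat = r * sqrt(n - 2) / sqrt(1 - r ** 2)

t_critical = 2.0687   # t table value, df = 23, two-tailed alpha = 0.05
reject_h0 = abs(t_stat) > t_critical

print(f"t_STAT = {t_stat:.4f}, reject H0: {reject_h0}")
```

Here t_STAT ≈ 2.42 > 2.0687, so H₀ is rejected: at the 5% level there is evidence of a linear relationship between the grades on the two exams.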

36 of 52

CHAPTER XIII: Simple linear regression

QUIZ

[Questions 1 to 14: slide images, no text recovered]