1 of 36

Linear Regression

Business Analytics

Lecture # 13

2 of 36

Topics to Be Covered

  1. Linear Regression Equation
  2. Slope Formula
  3. Estimated Values
  4. Standard Error

© 2016 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use.

3 of 36

Types of Probabilistic Models

4 of 36

Regression Models

  • Relationship between one dependent variable and explanatory variable(s)
  • Uses an equation to express the relationship
      • Numerical Dependent (Response) Variable
      • 1 or More Numerical or Categorical Independent (Explanatory) Variables
  • Used Mainly for Prediction & Estimation

5 of 36

Regression

  • Regression: Prediction of one variable from knowledge of one or more other variables.
  • Linear regression fits a straight line to the data that, for any value of x, gives the best prediction of y.

6 of 36

Example

  • Managerial decisions are often based on the relationship between two or more variables. For example, after considering the relationship between advertising expenditures and sales, a marketing manager might attempt to predict sales for a given level of advertising expenditures.
  • In another case, a public utility might use the relationship between the daily high temperature and the demand for electricity to predict electricity usage on the basis of next month’s anticipated daily high temperatures.
  • Sometimes a manager will rely on intuition to judge how two variables are related. However, if data can be obtained, a statistical procedure called regression analysis can be used to develop an equation showing how the variables are related.

7 of 36

  • In regression terminology, the variable being predicted is called the dependent variable, or response.
  • The variables being used to predict the value of the dependent variable are called the independent variables, or predictor variables.

8 of 36

  • Simple linear regression: the relationship between one dependent variable (denoted by y) and one independent variable (denoted by x) is approximated by a straight line.
  • Multiple linear regression: the relationship between a dependent variable (y) and two or more independent variables (x1, x2, …, xq) is approximated by a linear equation.

9 of 36

What’s Slope?

A slope of 2 means that every 1-unit change in X yields a 2-unit change in Y.
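To make the slope interpretation concrete, here is a minimal Python sketch using an assumed line y = 2x + 1 (the intercept of 1 is an arbitrary illustrative choice). Stepping x up by one unit raises y by exactly the slope, 2, no matter where you start.

```python
def predict(x: float, slope: float = 2.0, intercept: float = 1.0) -> float:
    """Value on the straight line y = slope * x + intercept (illustrative values)."""
    return slope * x + intercept

# A 1-unit increase in x changes y by exactly the slope, at any starting x.
changes = [predict(x + 1) - predict(x) for x in (0, 3, 10, 250)]
print(changes)  # [2.0, 2.0, 2.0, 2.0]
```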

10 of 36

Model Specification is Based on Theory

  1. Theory of Field (e.g., Epidemiology)
  2. Mathematical Theory
  3. Previous Research
  4. ‘Common Sense’

11 of 36

Types of Regression Models

  • Simple (1 Explanatory Variable)
      • Linear
      • Non-Linear
  • Multiple (2+ Explanatory Variables)
      • Linear
      • Non-Linear

12 of 36

Regression Modeling Steps

  1. Hypothesize Deterministic Component
      • Estimate Unknown Parameters
  2. Specify Probability Distribution of Random Error Term
      • Estimate Standard Deviation of Error
  3. Evaluate the Fitted Model
  4. Use Model for Prediction & Estimation
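The modeling steps can be sketched in Python on a small made-up data set (the values below are hypothetical, not from the lecture). The sketch assumes the standard closed-form least squares estimates for step 1 and the usual estimate of the error standard deviation, s = √(SSE / (n − 2)), for step 2. It is an illustration under those assumptions, not a complete modeling workflow.

```python
import math

# Hypothetical (x, y) sample, made up purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

def fit_line(xs, ys):
    """Step 1: least squares estimates (b0, b1) of the deterministic component."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = fit_line(xs, ys)

# Step 2: estimate the standard deviation of the random error term from the
# residuals, using n - 2 degrees of freedom (two parameters were estimated).
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
sse = sum(e ** 2 for e in residuals)
s = math.sqrt(sse / (len(xs) - 2))

# Step 3 (evaluating the fitted model) is only hinted at here:
# the residuals and s are the raw material for that evaluation.

# Step 4: use the fitted model to predict y at a new x value.
x_new = 7.0
y_hat_new = b0 + b1 * x_new
print(f"b0={b0:.3f}, b1={b1:.3f}, s={s:.3f}, y_hat(7)={y_hat_new:.3f}")
```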

13 of 36

Best Fit Line, Minimising Sum of Squared Errors

  • General form of a straight line: y = mx + c
  • Here, ŷ = bx + a, where:
    • ŷ : predicted value of y
    • b: slope of regression line
    • a: intercept

Residual error (ε): the difference between the observed and predicted values of y, i.e. ε = y − ŷ.

The best-fit line (the values of b and a) is the one that minimises the sum of squared errors: $SS_{error} = \sum (y - \hat{y})^2$

[Figure: the fitted line ŷ = bx + a with residuals marked. Legend: ε = residual; yᵢ = observed value; ŷ = predicted value.]
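The claim that the least squares line minimises SS_error can be checked numerically. The Python sketch below uses a small hypothetical data set, computes b and a with the standard closed-form least squares formulas, and then confirms that nudging either coefficient away from those values increases Σ(y − ŷ)².

```python
# Hypothetical data, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.5, 3.1, 4.4, 6.2, 7.4]

def sse(b, a):
    """Sum of squared residuals, sum of (y - y_hat)^2, for the line y_hat = b*x + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

# Closed-form least squares solution for slope b and intercept a.
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

best = sse(b, a)
# Moving either coefficient away from the least squares values increases SS_error.
for db, da in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.2), (0.0, -0.2), (0.05, -0.1)]:
    assert sse(b + db, a + da) > best
print(f"b={b:.3f}, a={a:.3f}, SS_error={best:.4f}")
```

Because SS_error is a quadratic bowl in (b, a), any move away from the least squares point raises it; the loop simply spot-checks a few directions.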

14 of 36

  • Butler Trucking Company is an independent trucking company in Southern California.
  • A major portion of Butler’s business involves deliveries throughout its local area.
  • To develop better work schedules, the managers want to estimate the total daily travel times for their drivers.
  • The managers believe that the total daily travel times (denoted by y) are closely related to the number of miles traveled in making the daily deliveries (denoted by x).
  • Using regression analysis, we can develop an equation showing how the dependent variable y is related to the independent variable x.

15 of 36

  • In the Butler Trucking Company example, a simple linear regression model hypothesizes that the travel time of a driving assignment (y) is linearly related to the miles traveled (x) as follows:

$y = \beta_0 + \beta_1 x + \varepsilon$    (7.1)

β0 and β1 are population parameters that describe the y-intercept and slope of the line relating y and x, and ε is the error term.

16 of 36

  • The values of the population parameters β0 and β1 are not known and must be estimated using sample data. Sample statistics (denoted b0 and b1) are computed as estimates of the population parameters β0 and β1. Substituting the values of the sample statistics b0 and b1 for β0 and β1 in equation (7.1) and dropping the error term, we obtain the estimated regression equation for simple linear regression:

$\hat{y} = b_0 + b_1 x$

17 of 36

18 of 36

Linear Equations

19 of 36

20 of 36

  • The graph of the estimated simple linear regression equation is called the estimated regression line; b0 is the estimated y-intercept, and b1 is the estimated slope.

21 of 36

22 of 36

Population & Sample Regression Models

[Figure: the unknown relationship in the population, and a random sample drawn from that population.]
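The population/sample distinction can be simulated. In the Python sketch below, the "population" line uses made-up parameter values β0 = 1.0 and β1 = 0.07 with normally distributed errors (all assumptions chosen for illustration). Each random sample yields its own estimates b0 and b1, which scatter around the true values and typically get closer as the sample grows.

```python
import random

# Hypothetical "true" population parameters; in practice these are unknown.
BETA0, BETA1, SIGMA = 1.0, 0.07, 1.0

def draw_sample(n, rng):
    """Draw n observations from y = BETA0 + BETA1*x + eps, eps ~ Normal(0, SIGMA).

    x is drawn uniformly between 50 and 100, an arbitrary illustrative choice.
    """
    xs = [rng.uniform(50, 100) for _ in range(n)]
    ys = [BETA0 + BETA1 * x + rng.gauss(0, SIGMA) for x in xs]
    return xs, ys

def least_squares(xs, ys):
    """Sample statistics (b0, b1) that estimate the population parameters."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    return y_bar - b1 * x_bar, b1

rng = random.Random(42)  # fixed seed so the run is reproducible
for n in (10, 200):
    b0, b1 = least_squares(*draw_sample(n, rng))
    print(f"n={n:>3}: b0={b0:.3f} (beta0={BETA0}), b1={b1:.4f} (beta1={BETA1})")
```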

23 of 36

24 of 36

Least Squares Method

  • The least squares method is a procedure for using sample data to find the estimated regression equation.
  • To illustrate the least squares method, suppose data were collected from a sample of 10 Butler Trucking Company driving assignments.

25 of 36

  • In addition, for these data, the relationship between the travel time and miles traveled appears to be approximated by a straight line; indeed, a positive linear relationship is indicated between x and y.
  • We therefore choose the simple linear regression model to represent this relationship.
  • Given that choice, our next task is to use the sample data in Table 7.1 to determine the values of b0 and b1 in the estimated simple linear regression equation.
  • For the ith driving assignment, the estimated regression equation provides $\hat{y}_i = b_0 + b_1 x_i$, the estimated travel time for that assignment.
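The least squares method chooses b0 and b1 to minimise the sum of squared residuals over all n sampled assignments. Setting the partial derivatives with respect to b0 and b1 to zero gives the standard closed-form solution below (the textbook result; the intermediate derivation steps are omitted), where x̄ and ȳ are the sample means:

```latex
\min_{b_0,\,b_1} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
  = \min_{b_0,\,b_1} \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2

b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
```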

26 of 36

27 of 36

28 of 36

  • Figure 7.3 is a scatter chart of the data in Table 7.1. Miles traveled is shown on the horizontal axis, and travel time (in hours) is shown on the vertical axis.
  • Scatter charts for regression analysis are constructed with the independent variable x on the horizontal axis and the dependent variable y on the vertical axis.
  • The scatter chart enables us to observe the data graphically and to draw preliminary conclusions about the possible relationship between the variables.

29 of 36

30 of 36

31 of 36

  • The least squares estimates are a slope of b1 = 0.0678 and a y-intercept of b0 = 1.2739.
  • The estimated simple linear regression equation is therefore $\hat{y} = 1.2739 + 0.0678x_1$, where $x_1$ denotes miles traveled.
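The coefficients above can be reproduced with the least squares formulas in a few lines of Python. Table 7.1 itself is not reproduced in these slides, so the ten (miles, hours) pairs below are an assumption: they are the values usually printed for this textbook example, and they do reproduce b1 = 0.0678 and b0 = 1.2739.

```python
# ASSUMED Table 7.1 data (10 Butler Trucking driving assignments);
# not shown in the slides, but these values reproduce the slide's coefficients.
miles = [100, 50, 100, 100, 50, 80, 75, 65, 90, 90]          # x: miles traveled
hours = [9.3, 4.8, 8.9, 6.5, 4.2, 6.2, 7.4, 6.0, 7.6, 6.1]    # y: travel time (hours)

n = len(miles)
x_bar = sum(miles) / n
y_bar = sum(hours) / n

# Least squares slope: sum of cross-deviations over sum of squared x-deviations.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(miles, hours))
s_xx = sum((x - x_bar) ** 2 for x in miles)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar  # intercept makes the line pass through (x_bar, y_bar)

print(f"b1 = {b1:.4f}, b0 = {b0:.4f}")  # b1 = 0.0678, b0 = 1.2739
```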

32 of 36

  • For the Butler Trucking Company model, we therefore estimate that, if the length of a driving assignment were 1 mile longer, the mean travel time for that driving assignment would be 0.0678 hour (or approximately 4 minutes) longer.
  • The y-intercept b0 is the estimated value of the dependent variable y when the independent variable x is equal to 0.
  • For the Butler Trucking Company model, we estimate that if the driving distance for a driving assignment were 0 miles, the mean travel time would be 1.2739 hours (approximately 76 minutes).
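The unit conversions in these interpretations are simple arithmetic. This Python sketch (the function name predicted_hours is an illustrative choice) converts the slope and intercept from hours to minutes and confirms that lengthening an assignment by one mile raises the estimated mean travel time by b1.

```python
b0, b1 = 1.2739, 0.0678  # Butler Trucking estimates (hours; hours per mile)

def predicted_hours(miles: float) -> float:
    """Estimated mean travel time (hours) for a driving assignment of the given length."""
    return b0 + b1 * miles

print(f"slope: {b1 * 60:.2f} extra minutes per extra mile")  # slope: 4.07 extra minutes per extra mile
print(f"intercept: {b0 * 60:.1f} minutes at 0 miles")         # intercept: 76.4 minutes at 0 miles
# One extra mile raises the prediction by exactly b1, wherever we start.
print(round(predicted_hours(81) - predicted_hours(80), 4))    # 0.0678
```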

33 of 36

34 of 36

35 of 36

36 of 36

Thank You!
