Multiple Regression
1
PADP 8130 - Week 2
Slides directly adapted from Introductory Econometrics: A Modern Approach, 6th Edition (Wooldridge)
Multiple Regression Analysis: Estimation
Definition of the multiple linear regression model:
2
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u$

$y$: dependent variable, explained variable, response variable, ...
$x_1, \ldots, x_k$: independent variables, explanatory variables, regressors, ...
$u$: error term, disturbance, unobservables, ...
$\beta_0$: intercept
$\beta_1, \ldots, \beta_k$: slope parameters

The model explains variable $y$ in terms of variables $x_1, x_2, \ldots, x_k$.
Multiple Regression Analysis: Example 1
3
Wage equation:

$wage = \beta_0 + \beta_1\, educ + \beta_2\, exper + u$

$wage$: hourly wage
$educ$: years of education
$exper$: years of labor market experience
$u$: all other factors ...

$\beta_1$ now measures the effect of education explicitly holding experience fixed.
Multiple Regression Analysis: Example 2
4
$avgscore = \beta_0 + \beta_1\, expend + \beta_2\, avginc + u$

$avgscore$: average standardized test score of the school
$expend$: per-student spending at this school
$avginc$: average family income of students at this school
$u$: other factors
Multiple Regression Analysis: Example 3
5
$cons = \beta_0 + \beta_1\, inc + \beta_2\, inc^2 + u$

$cons$: family consumption
$inc$: family income
$inc^2$: family income squared
$u$: other factors

By how much does consumption increase if income is increased by one unit? The answer depends on how much income is already there.
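A short worked step (not on the original slide, but implied by the quadratic specification): the marginal effect of income is

$\dfrac{\partial\, cons}{\partial\, inc} = \beta_1 + 2\beta_2\, inc, \qquad \Delta cons \approx (\beta_1 + 2\beta_2\, inc)\,\Delta inc$

so the effect of one more unit of income is larger or smaller depending on the income level already reached.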
Multiple Regression Analysis: Example 4
6
$\log(salary) = \beta_0 + \beta_1 \log(sales) + \beta_2\, ceoten + \beta_3\, ceoten^2 + u$

$\log(salary)$: log of CEO salary
$\log(sales)$: log of the firm's sales
$ceoten,\ ceoten^2$: quadratic function of the CEO's tenure with the firm
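A brief worked interpretation (a standard property of log-log specifications, stated here as a reminder rather than slide content): $\beta_1$ is the elasticity of salary with respect to sales,

$\dfrac{\partial \log(salary)}{\partial \log(sales)} = \beta_1, \qquad \%\Delta salary \approx \beta_1 \cdot \%\Delta sales$ (holding tenure fixed).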
Multiple Regression Analysis: Estimation
OLS Estimation of the multiple regression model
7
Choose estimates $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ that minimize the sum of squared residuals:

$\min_{b_0, b_1, \ldots, b_k} \sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_{i1} - \cdots - b_k x_{ik}\right)^2$

The minimization will be carried out by computer.
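As a minimal numerical sketch (simulated data; all variable names and numbers are hypothetical; numpy only), the same minimization can be carried out as follows:

import numpy as np

# Hypothetical data: n observations, two explanatory variables
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(12, 2, n)            # e.g. years of education
x2 = rng.normal(10, 5, n)            # e.g. years of experience
u = rng.normal(0, 1, n)              # unobserved error
y = 1.0 + 0.5 * x1 + 0.2 * x2 + u    # population model

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones(n), x1, x2])

# OLS: choose b0, b1, b2 to minimize the sum of squared residuals
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)                      # estimates of (beta0, beta1, beta2)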
Multiple Regression Analysis: Estimation
8
Interpretation of the multiple regression model: by how much does the dependent variable change if the j-th independent variable is increased by one unit, holding all other independent variables and the error term constant? This is the ceteris paribus interpretation of the coefficient $\beta_j$.
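In symbols (the usual statement of this interpretation):

$\Delta y = \beta_j\, \Delta x_j$ when $x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_k$ and $u$ are held fixed, i.e. $\partial y / \partial x_j = \beta_j$.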
Multiple Regression Analysis: Estimation
9
Example: Determinants of college GPA

$colGPA = \beta_0 + \beta_1\, hsGPA + \beta_2\, ACT + u$

$colGPA$: grade point average at college
$hsGPA$: high school grade point average
$ACT$: achievement test score
Multiple Regression Analysis: Estimation
10
Fitted or predicted values: $\hat y_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik}$
Residuals: $\hat u_i = y_i - \hat y_i$

Algebraic properties of OLS regression:
Deviations from the regression line (the residuals) sum up to zero: $\sum_{i=1}^{n} \hat u_i = 0$
The sample covariance between the residuals and each regressor is zero: $\sum_{i=1}^{n} x_{ij} \hat u_i = 0$ for all $j$
The sample averages of $y$ and of the regressors lie on the regression line: $\bar y = \hat\beta_0 + \hat\beta_1 \bar x_1 + \cdots + \hat\beta_k \bar x_k$
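A small numerical check of these three properties (simulated data; numpy only; a sketch, not slide material):

import numpy as np

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat                                    # fitted values
u_hat = y - y_hat                                       # residuals

print(np.isclose(u_hat.sum(), 0.0))                     # residuals sum to zero
print(np.allclose(X[:, 1:].T @ u_hat, 0.0))             # zero sample covariance with each regressor
print(np.isclose(y.mean(), X.mean(axis=0) @ beta_hat))  # means lie on the regression line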
Multiple Regression Analysis: Estimation
12
$R^2 = SSE/SST = 1 - SSR/SST$: the fraction of the total sample variation in $y$ (SST) that is explained by the regression (SSE is the explained and SSR the residual sum of squares).

Notice that R-squared can only increase if another explanatory variable is added to the regression.

R-squared is equal to the squared correlation coefficient between the actual and the predicted value of the dependent variable.
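A quick numerical illustration of the second point (simulated data; numpy only; a sketch rather than slide content):

import numpy as np

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([0.5, 1.0, 0.3]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
SSR = np.sum((y - y_hat) ** 2)           # residual sum of squares
SST = np.sum((y - y.mean()) ** 2)        # total sum of squares

print(1 - SSR / SST)                     # R-squared from the definition
print(np.corrcoef(y, y_hat)[0, 1] ** 2)  # squared correlation of actual and fitted values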
Multiple Regression Analysis: Estimation
13
Example: Explaining arrest records

$narr86 = \beta_0 + \beta_1\, pcnv + \beta_2\, ptime86 + \beta_3\, qemp86 + u$

$narr86$: number of times arrested in 1986
$pcnv$: proportion of prior arrests that led to a conviction
$ptime86$: months spent in prison in 1986
$qemp86$: number of quarters employed in 1986
Multiple Regression Analysis: Estimation
15
Adding the average sentence in prior convictions ($avgsen$) as an additional regressor: R-squared increases only slightly.
Multiple Regression Analysis: Estimation
16
Assumption MLR.1 (Linear in parameters): in the population, the relationship between $y$ and the explanatory variables is linear,
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u$

Assumption MLR.2 (Random sampling): the data are a random sample drawn from the population,
$\{(x_{i1}, x_{i2}, \ldots, x_{ik}, y_i): i = 1, \ldots, n\}$,
so each data point follows the population equation,
$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + u_i$
Multiple Regression Analysis: Estimation
17
Assumption MLR.3 (No perfect collinearity): “In the sample (and therefore in the population), none of the independent variables is constant and there are no exact linear relationships among the independent variables.”
Multiple Regression Analysis: Estimation
18
In a small sample, avginc may accidentally be an exact multiple of expend; it will not be possible to disentangle their separate effects because there is exact covariation
Either shareA or shareB will have to be dropped from the regression because there is an exact linear relationship between them: shareA + shareB = 1
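A minimal numeric sketch of the second example (hypothetical data; numpy only): with shareA + shareB = 1 the design matrix is rank deficient, so the separate effects cannot be identified.

import numpy as np

rng = np.random.default_rng(3)
n = 30
shareA = rng.uniform(0.2, 0.8, n)
shareB = 1 - shareA                              # exact linear relationship

X = np.column_stack([np.ones(n), shareA, shareB])
print(np.linalg.matrix_rank(X))                  # 2, not 3: MLR.3 fails

X_drop = np.column_stack([np.ones(n), shareA])   # dropping shareB restores full column rank
print(np.linalg.matrix_rank(X_drop))             # 2 = number of columns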
Multiple Regression Analysis: Estimation
19
Assumption MLR.4 (Zero conditional mean): $E(u \mid x_1, x_2, \ldots, x_k) = 0$. The values of the explanatory variables must contain no information about the mean of the unobserved factors.

In the average test score example: if $avginc$ were not included in the regression, it would end up in the error term; it would then be hard to defend that $expend$ is uncorrelated with the error.
Multiple Regression Analysis: Estimation
20
Unbiasedness of OLS: under assumptions MLR.1 – MLR.4, the OLS estimators are unbiased for the population parameters, $E(\hat\beta_j) = \beta_j$ for $j = 0, 1, \ldots, k$. Unbiasedness is a property of the estimation procedure in repeated samples, not of any single estimate.
Multiple Regression Analysis: Estimation
21
Including irrelevant variables in a regression model:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$, where $\beta_3 = 0$ in the population.
No problem, because $E(\hat\beta_3) = 0 = \beta_3$ (the estimator is still unbiased).
However, including irrelevant variables may increase the sampling variance.

Omitting relevant variables: the simple case.
True model (contains $x_1$ and $x_2$): $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$
Estimated model ($x_2$ is omitted): $\tilde y = \tilde\beta_0 + \tilde\beta_1 x_1$
Multiple Regression Analysis: Estimation
22
Omitted variable bias: if $x_1$ and $x_2$ are correlated, assume a linear regression relationship between them,
$x_2 = \delta_0 + \delta_1 x_1 + v$
where $v$ is an error term. Substituting into the true model gives
$y = (\beta_0 + \beta_2\delta_0) + (\beta_1 + \beta_2\delta_1)\, x_1 + (\beta_2 v + u)$

If $y$ is only regressed on $x_1$, $(\beta_0 + \beta_2\delta_0)$ will be the estimated intercept, $(\beta_1 + \beta_2\delta_1)$ will be the estimated slope on $x_1$, and $(\beta_2 v + u)$ acts as the error term. All estimated coefficients will therefore be biased.
Multiple Regression Analysis: Estimation
23
Example: omitting ability in a wage equation,
$wage = \beta_0 + \beta_1\, educ + \beta_2\, abil + u$, with $abil = \delta_0 + \delta_1\, educ + v$.
Here $\beta_2$ and $\delta_1$ will both be positive.

The return to education will therefore be overestimated, because $E(\tilde\beta_1) = \beta_1 + \beta_2\delta_1 > \beta_1$. It will look as if people with many years of education earn very high wages, but this is partly due to the fact that people with more education are also more able on average.
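A simulation sketch of this bias (all numbers, variable names, and the data-generating process are hypothetical, chosen only to illustrate the direction of the bias):

import numpy as np

rng = np.random.default_rng(4)
n = 100_000
abil = rng.normal(size=n)
educ = 12 + 2 * abil + rng.normal(size=n)            # education and ability positively correlated
logwage = 1.0 + 0.06 * educ + 0.10 * abil + rng.normal(scale=0.5, size=n)

X_long = np.column_stack([np.ones(n), educ, abil])   # correctly specified model
X_short = np.column_stack([np.ones(n), educ])        # ability omitted
b_long, *_ = np.linalg.lstsq(X_long, logwage, rcond=None)
b_short, *_ = np.linalg.lstsq(X_short, logwage, rcond=None)

print(b_long[1])   # close to the true return of 0.06
print(b_short[1])  # roughly 0.06 + 0.10 * 0.4 = 0.10: regressing abil on educ
                   # gives a slope of about 2 / (2**2 + 1) = 0.4 in this setup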
Multiple Regression Analysis: Estimation
24
Omitted variable bias: more general cases.
True model (contains $x_1$, $x_2$, and $x_3$): $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$
Estimated model ($x_3$ is omitted): $\tilde y = \tilde\beta_0 + \tilde\beta_1 x_1 + \tilde\beta_2 x_2$

In general, no simple statements about the direction of the bias are possible. Example: omitting ability in a wage equation that also contains experience,
$wage = \beta_0 + \beta_1\, educ + \beta_2\, exper + \beta_3\, abil + u$.
If $exper$ is approximately uncorrelated with $educ$ and $abil$, then the direction of the omitted variable bias in the coefficient on $educ$ can be analyzed as in the simple two-variable case.
Multiple Regression Analysis: Estimation
25
Assumption MLR.5 (Homoskedasticity): $Var(u \mid x_1, \ldots, x_k) = \sigma^2$. The values of the explanatory variables must contain no information about the variance of the unobserved factors.

Shorthand notation: $Var(u \mid \mathbf{x}) = \sigma^2$, with $\mathbf{x} = (x_1, \ldots, x_k)$; all explanatory variables are collected in a random vector.

This assumption may also be hard to justify in many cases.
Multiple Regression Analysis: Estimation
26
Sampling variances of the OLS slope estimators. Under assumptions MLR.1 – MLR.5:

$Var(\hat\beta_j) = \dfrac{\sigma^2}{SST_j (1 - R_j^2)}, \qquad j = 1, \ldots, k$

$\sigma^2$: variance of the error term
$SST_j = \sum_{i=1}^{n} (x_{ij} - \bar x_j)^2$: total sample variation in explanatory variable $x_j$
$R_j^2$: R-squared from a regression of explanatory variable $x_j$ on all other independent variables (including a constant)
Multiple Regression Analysis: Estimation
27
The formula shows the three components of the OLS sampling variances: (1) the error variance $\sigma^2$ (a larger error variance increases the sampling variance), (2) the total sample variation $SST_j$ (more variation in $x_j$ decreases the sampling variance), and (3) the linear relationships among the independent variables (a larger $R_j^2$ increases the sampling variance).
Multiple Regression Analysis: Estimation
28
Linear relationships among the independent variables: regress $x_j$ on all other independent variables (including a constant). The better $x_j$ can be linearly explained by the other independent variables, the higher the R-squared of this regression, $R_j^2$, and hence the higher the sampling variance of $\hat\beta_j$. The problem of almost linearly dependent explanatory variables is called multicollinearity.
Multiple Regression Analysis: Estimation
29
An example for multicollinearity:

$avgscore = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$

$avgscore$: average standardized test score of the school
$x_1$: expenditures for teachers
$x_2$: expenditures for instructional materials
$x_3$: other expenditures
The different expenditure categories will be strongly correlated because if a school has a lot of resources it will spend a lot on everything.
It will be hard to estimate the differential effects of different expenditure categories because all expenditures are either high or low. For precise estimates of the differential effects, one would need information about situations where expenditure categories change differentially.
As a consequence, sampling variance of the estimated effects will be large.
Multiple Regression Analysis: Estimation
31
The factor $1/(1 - R_j^2)$ in the sampling variance of $\hat\beta_j$ is called the variance inflation factor ($VIF_j$). As an (arbitrary) rule of thumb, the variance inflation factor should not be larger than 10.
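A sketch of how the variance inflation factor can be computed by hand (simulated, strongly correlated regressors; numpy only; the helper function vif is hypothetical, not part of any library):

import numpy as np

def vif(X, j):
    """Variance inflation factor 1/(1 - R_j^2) for column j of X (X without the constant)."""
    n = X.shape[0]
    others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
    xj = X[:, j]
    coef, *_ = np.linalg.lstsq(others, xj, rcond=None)
    resid = xj - others @ coef
    r2_j = 1 - resid @ resid / np.sum((xj - xj.mean()) ** 2)
    return 1 / (1 - r2_j)

rng = np.random.default_rng(5)
n = 500
base = rng.normal(size=n)                                # common component
X = np.column_stack([base + 0.1 * rng.normal(size=n),    # three strongly correlated
                     base + 0.1 * rng.normal(size=n),    # expenditure-like regressors
                     base + 0.1 * rng.normal(size=n)])
print([round(vif(X, j), 1) for j in range(3)])           # far above the rule-of-thumb value of 10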
Multiple Regression Analysis: Estimation
32
Variances in misspecified models: the choice of whether to include a particular variable in a regression can be analyzed as a trade-off between bias and variance.

True population model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$
Estimated model 1 (includes $x_2$): $\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2$
Estimated model 2 ($x_2$ is omitted): $\tilde y = \tilde\beta_0 + \tilde\beta_1 x_1$
Multiple Regression Analysis: Estimation
33
Conditional on $x_1$ and $x_2$, the variance of $\tilde\beta_1$ in model 2 is always smaller than the variance of $\hat\beta_1$ in model 1.

Case 1 ($\beta_2 = 0$): conclusion: do not include irrelevant regressors.
Case 2 ($\beta_2 \neq 0$): trade off bias and variance; caution: the bias will not vanish even in large samples.
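In symbols, using the variance formula from above (homoskedasticity assumed; $\tilde\beta_1$ from model 2, $\hat\beta_1$ from model 1):

$Var(\tilde\beta_1) = \dfrac{\sigma^2}{SST_1} \;\le\; \dfrac{\sigma^2}{SST_1 (1 - R_1^2)} = Var(\hat\beta_1)$,

with strict inequality whenever $x_1$ and $x_2$ are correlated in the sample ($R_1^2 > 0$).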
Multiple Regression Analysis: Estimation
34
Estimating the error variance: an unbiased estimate of the error variance can be obtained by dividing the sum of squared residuals by the number of observations minus the number of estimated regression coefficients. The number of observations minus the number of estimated parameters is also called the degrees of freedom.

The n squared residuals in the sum are not completely independent but are related through the k+1 equations that define the first-order conditions of the minimization problem.
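In symbols (the standard degrees-of-freedom-corrected estimator):

$\hat\sigma^2 = \dfrac{1}{n - k - 1}\sum_{i=1}^{n}\hat u_i^2 = \dfrac{SSR}{n - k - 1}, \qquad df = n - (k + 1)$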
Multiple Regression Analysis: Estimation
35
Estimation of the sampling variances of the OLS estimators:

$Var(\hat\beta_j) = \dfrac{\sigma^2}{SST_j (1 - R_j^2)}$: the true sampling variation of the estimated $\beta_j$

$\widehat{Var}(\hat\beta_j) = \dfrac{\hat\sigma^2}{SST_j (1 - R_j^2)}$: the estimated sampling variation of the estimated $\beta_j$, obtained by plugging in $\hat\sigma^2$ for the unknown $\sigma^2$

The square root of the estimated variance, $se(\hat\beta_j) = \sqrt{\widehat{Var}(\hat\beta_j)}$, is the standard error of $\hat\beta_j$.
Multiple Regression Analysis: Estimation
36
Efficiency of OLS: the Gauss-Markov theorem. Under assumptions MLR.1 – MLR.5, OLS is unbiased, but there may be many other unbiased estimators. Which one has the smallest variance? To answer this question, one usually restricts attention to linear estimators, i.e. estimators that are linear in the dependent variable,

$\tilde\beta_j = \sum_{i=1}^{n} w_{ij}\, y_i$

where the weights $w_{ij}$ may be an arbitrary function of the sample values of all the explanatory variables; the OLS estimator can be shown to be of this form.
Multiple Regression Analysis: Estimation
37
Theorem (Gauss-Markov): under assumptions MLR.1 – MLR.5, the OLS estimators are the best linear unbiased estimators (BLUEs) of the regression coefficients, i.e.

$Var(\hat\beta_j) \le Var(\tilde\beta_j)$ for all $\tilde\beta_j = \sum_{i=1}^{n} w_{ij}\, y_i$ for which $E(\tilde\beta_j) = \beta_j$, $j = 0, 1, \ldots, k$.

OLS is only the best estimator if MLR.1 – MLR.5 hold; if there is heteroskedasticity, for example, there are better estimators.