1 of 24

Path analysis

In a triangular system, Y₂ affects Y₁ but Y₁ does not affect Y₂

Y₁ = a₀ + a₁ Y₂ + a₂ X + u     (1)
Y₂ = b₀ + b₁ X + b₂ Z + v      (2)

Some researchers estimate such a system using “path analysis” rather than instrumental variables estimation

2 of 24

SEM (sem)

In Stata, a path analysis system can be estimated using the following syntax:
sem (Y1 <- Y2 X) (Y2 <- X Z)
The <- shows the assumed direction of causality
Here it is assumed that X and Z directly affect Y2, but Z does not directly affect Y1
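A minimal sketch of the syntax on simulated data, assuming hypothetical variables y1, y2, x and z in place of Y₁, Y₂, X and Z, and hypothetical parameter values (a₀ = 2, a₁ = 0.3, a₂ = 0.4, b₀ = 1, b₁ = 0.5, b₂ = 0.8). The errors are drawn independently, so sem's default assumption of uncorrelated errors holds and the estimates should land close to the true values.

```stata
* Hypothetical simulated data for the triangular system (1)-(2)
clear
set seed 12345
set obs 5000
generate x = rnormal()
generate z = rnormal()
* Independent errors, so cov(u, v) = 0 holds by construction
generate u = rnormal()
generate v = rnormal()
* Equation (2): Y2 depends on X and Z
generate y2 = 1 + 0.5*x + 0.8*z + v
* Equation (1): Y1 depends on Y2 and X, but not directly on Z
generate y1 = 2 + 0.3*y2 + 0.4*x + u
* Path analysis: sem fixes the error covariance at zero unless told otherwise
sem (y1 <- y2 x) (y2 <- x z)
display "a1 = " %6.3f _b[y1:y2] "   b2 = " %6.3f _b[y2:z]
```

After sem, coefficients are referenced as _b[equation:variable], for example _b[y1:y2] for a₁.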

3 of 24

Variable Definitions for Examples

lnaf = natural log of audit fees
lnta = natural log of total assets
lnsales = natural log of sales
ln_age = natural log of firm age
listed = an indicator variable for publicly listed firms

4 of 24

Example #1

lnaf = a₀ + a₁ lnta + a₂ lnsales + u
lnta = b₀ + b₁ lnsales + b₂ ln_age + v

Note the exclusion restriction on ln_age
If this system is estimated using IV, it would be “just-identified” because the lnaf equation has one endogenous regressor (lnta) and one exclusion restriction (ln_age)
The error terms are assumed uncorrelated

5 of 24

Code to Estimate Example #1 (Both OLS and SEM regressions)

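use "D:\Phd\Fees1.dta", clear
* Firm age and its natural log
gen age = year - incorporationyear
gen ln_age = ln(age)
* Indicator for publicly listed firms (company types 2, 3 and 5)
gen listed = inlist(companytype, 2, 3, 5)
* Count missing values so that every model uses the same sample
egen miss = rowmiss(lnaf lnta lnsales ln_age listed)
* Equation-by-equation OLS
regress lnaf lnta lnsales if miss==0
regress lnta lnsales ln_age if miss==0
* Path analysis (errors uncorrelated by default)
sem (lnaf <- lnta lnsales) (lnta <- lnsales ln_age) if miss==0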

8 of 24

Example #1

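Note that the OLS coefficients are exactly the same as the SEM coefficients (the standard errors differ slightly because sem estimates the error variances by maximum likelihood, dividing by N rather than N - k)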
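The equality can be checked without the fee data. In this hypothetical sketch, simulated variables y1, y2, x and z stand in for lnaf, lnta, lnsales and ln_age, and the parameter values are assumptions. It stores each coefficient from equation-by-equation OLS and from sem; any gap should be on the order of the optimizer's convergence tolerance.

```stata
* Hypothetical stand-ins: y1 ~ lnaf, y2 ~ lnta, x ~ lnsales, z ~ ln_age
clear
set seed 101
set obs 5000
generate x = rnormal()
generate z = rnormal()
generate y2 = 1 + 0.5*x + 0.8*z + rnormal()
generate y1 = 2 + 0.3*y2 + 0.4*x + rnormal()
* Equation-by-equation OLS
quietly regress y1 y2 x
scalar a1_ols = _b[y2]
scalar a2_ols = _b[x]
quietly regress y2 x z
scalar b2_ols = _b[z]
* Path analysis with the default uncorrelated errors
quietly sem (y1 <- y2 x) (y2 <- x z)
scalar a1_sem = _b[y1:y2]
scalar a2_sem = _b[y1:x]
scalar b2_sem = _b[y2:z]
display "a1: OLS " %9.6f scalar(a1_ols) "   SEM " %9.6f scalar(a1_sem)
display "a2: OLS " %9.6f scalar(a2_ols) "   SEM " %9.6f scalar(a2_sem)
display "b2: OLS " %9.6f scalar(b2_ols) "   SEM " %9.6f scalar(b2_sem)
```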

9 of 24

Example #2

When the errors are assumed uncorrelated, the OLS and SEM coefficients remain identical even if the system is over-identified

lnaf = a₀ + a₁ lnta + a₂ lnsales + u
lnta = b₀ + b₁ lnsales + b₂ ln_age + b₃ listed + v

Note the two exclusion restrictions on ln_age and listed
This system is “over-identified” because the lnaf equation has one endogenous regressor and two exclusion restrictions

regress lnaf lnta lnsales if miss==0
regress lnta lnsales ln_age listed if miss==0
sem (lnaf <- lnta lnsales) (lnta <- lnsales ln_age listed) if miss==0
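A hypothetical check of the over-identified case on simulated data: y1, y2, x, z1 and z2 stand in for lnaf, lnta, lnsales, ln_age and listed (z2 is a 0/1 draw loosely mimicking the listed indicator), and all parameter values are assumptions. The two extra exclusion restrictions do not change the result, because the uncorrelated-errors assumption still lets the likelihood split into two separate OLS problems.

```stata
* Hypothetical stand-ins: y1 ~ lnaf, y2 ~ lnta, x ~ lnsales, z1 ~ ln_age, z2 ~ listed
clear
set seed 202
set obs 5000
generate x = rnormal()
generate z1 = rnormal()
* Binary variable, loosely mimicking the listed indicator
generate z2 = runiform() < 0.3
generate y2 = 1 + 0.5*x + 0.8*z1 + 0.6*z2 + rnormal()
generate y1 = 2 + 0.3*y2 + 0.4*x + rnormal()
* OLS for the lnaf-type equation
quietly regress y1 y2 x
scalar a1_ols = _b[y2]
* Over-identified path model, errors still uncorrelated
quietly sem (y1 <- y2 x) (y2 <- x z1 z2)
scalar a1_sem = _b[y1:y2]
display "a1: OLS " %9.6f scalar(a1_ols) "   SEM " %9.6f scalar(a1_sem)
```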

11 of 24

Example #3

lnaf = a₀ + a₁ lnta + a₂ lnsales + u
lnta = b₀ + b₁ lnsales + v

This system is “under-identified” in IV estimation because the lnaf equation has one endogenous regressor (lnta) but no exclusion restrictions.

However, it can still be estimated using path analysis (regress or sem) if one assumes uncorrelated errors (cov(u, v) = 0).
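A short identification count makes the role of the zero-covariance assumption explicit. The symbols σ_u², σ_v² and σ_uv (the variances and covariance of u and v) are introduced here for illustration. Substituting the lnta equation into the lnaf equation gives the reduced form below. The data pin down seven numbers: two intercepts, two slopes on lnsales, and the variances and covariance of the two reduced-form errors. The structural model has eight unknowns (a₀, a₁, a₂, b₀, b₁, σ_u², σ_v², σ_uv), so it is under-identified. Fixing σ_uv = 0 leaves seven, and a₁ then equals the covariance of the reduced-form errors divided by the variance of v, whose sample analog is the OLS coefficient on lnta after partialling out lnsales.

```latex
\begin{aligned}
\text{lnta} &= b_0 + b_1\,\text{lnsales} + v \\
\text{lnaf} &= (a_0 + a_1 b_0) + (a_2 + a_1 b_1)\,\text{lnsales} + (u + a_1 v) \\
\operatorname{Cov}(u + a_1 v,\; v) &= \sigma_{uv} + a_1 \sigma_v^2 \\
\sigma_{uv} = 0 \;\Rightarrow\; a_1 &= \frac{\operatorname{Cov}(u + a_1 v,\; v)}{\operatorname{Var}(v)},
\qquad a_2 = (a_2 + a_1 b_1) - a_1 b_1
\end{aligned}
```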

12 of 24

Example #3

lnaf = a₀ + a₁ lnta + a₂ lnsales + u
lnta = b₀ + b₁ lnsales + v

Assuming uncorrelated errors is equivalent to assuming away the endogeneity problem

Thus, SEM gives the same coefficient estimates as OLS

regress lnaf lnta lnsales if miss==0
regress lnta lnsales if miss==0
sem (lnaf <- lnta lnsales) (lnta <- lnsales) if miss==0
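To see what "assuming away the endogeneity problem" means in practice, this hypothetical sketch simulates the Example #3 structure with errors that are in fact correlated (corr(u, v) = 0.6, an assumption), using stand-ins y1, y2 and x for lnaf, lnta and lnsales. sem with its default zero error covariance still returns the OLS coefficient, and both land well away from the true a₁ = 0.3.

```stata
* Hypothetical stand-ins: y1 ~ lnaf, y2 ~ lnta, x ~ lnsales
clear
set seed 303
set obs 5000
generate x = rnormal()
* Correlated errors: corr(u, v) = 0.6 by construction
generate eps1 = rnormal()
generate eps2 = rnormal()
generate u = eps1
generate v = 0.6*eps1 + 0.8*eps2
generate y2 = 1 + 0.5*x + v
* True a1 = 0.3
generate y1 = 2 + 0.3*y2 + 0.4*x + u
quietly regress y1 y2 x
scalar a1_ols = _b[y2]
* Default sem: error covariance fixed at zero, so y2 is treated as exogenous
quietly sem (y1 <- y2 x) (y2 <- x)
scalar a1_sem = _b[y1:y2]
display "true a1 = 0.300   OLS = " %6.3f scalar(a1_ols) "   SEM = " %6.3f scalar(a1_sem)
```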

14 of 24

Summary

In each of the previous examples, the researcher assumed uncorrelated errors.

When assuming uncorrelated errors, the estimated coefficients from path analysis are identical to OLS (see previous examples).

Two alternative scenarios arise when the researcher instead allows the errors to be correlated.

15 of 24

Alternative Scenario #1

1) The researcher allows correlated errors in a just-identified system.

The estimated coefficients from path analysis are identical to the coefficients from IV estimation

* lnsales is an included exogenous regressor, so only ln_age is listed as an excluded instrument
ivregress 2sls lnaf lnsales (lnta = ln_age) if miss==0
sem (lnaf <- lnta lnsales) (lnta <- lnsales ln_age) if miss==0, cov(e.lnaf*e.lnta)
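A hypothetical simulation of the just-identified case with correlated errors (corr(u, v) = 0.6 and all parameter values are assumptions; y1, y2, x and z stand in for lnaf, lnta, lnsales and ln_age). sem with cov(e.y1*e.y2) should reproduce the 2SLS coefficient up to the optimizer's tolerance, while OLS is pulled away from the true a₁ = 0.3.

```stata
* Hypothetical stand-ins: y1 ~ lnaf, y2 ~ lnta, x ~ lnsales, z ~ ln_age
clear
set seed 404
set obs 5000
generate x = rnormal()
generate z = rnormal()
generate eps1 = rnormal()
generate eps2 = rnormal()
* corr(u, v) = 0.6, so y2 is endogenous in the y1 equation
generate u = eps1
generate v = 0.6*eps1 + 0.8*eps2
generate y2 = 1 + 0.5*x + 0.8*z + v
generate y1 = 2 + 0.3*y2 + 0.4*x + u
* 2SLS: x is an included instrument, z the single excluded one
quietly ivregress 2sls y1 x (y2 = z)
scalar a1_iv = _b[y2]
* Path analysis with the error covariance freed
quietly sem (y1 <- y2 x) (y2 <- x z), cov(e.y1*e.y2)
scalar a1_sem = _b[y1:y2]
* OLS for contrast
quietly regress y1 y2 x
scalar a1_ols = _b[y2]
display "true a1 = 0.300  2SLS = " %6.3f scalar(a1_iv) "  SEM = " %6.3f scalar(a1_sem) "  OLS = " %6.3f scalar(a1_ols)
```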

18 of 24

Alternative Scenario #2

2) The researcher allows correlated errors in an over-identified system.

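The estimated coefficients from path analysis (FIML) differ from 2SLS and GMM, and the IV estimates also differ depending on whether the researcher uses 2SLS, LIML or GMM. Because the lnta equation here includes every exogenous variable (lnsales, ln_age and listed), it is an unrestricted reduced form, so the sem (FIML) coefficients should coincide with LIML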

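* FIML path analysis with correlated errors
sem (lnaf <- lnta lnsales) (lnta <- lnsales ln_age listed) if miss==0, cov(e.lnaf*e.lnta)
* IV estimators; lnsales is an included exogenous regressor, so it acts as its own instrument
ivregress 2sls lnaf lnsales (lnta = ln_age listed) if miss==0
ivregress liml lnaf lnsales (lnta = ln_age listed) if miss==0
ivregress gmm lnaf lnsales (lnta = ln_age listed) if miss==0
* First-stage (reduced-form) regression of lnta on all exogenous variables
regress lnta lnsales ln_age listed if miss==0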
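A hypothetical simulation of the over-identified case with correlated errors. The stand-ins y1, y2, x, z1 and z2 play the roles of lnaf, lnta, lnsales, ln_age and listed, and corr(u, v) = 0.6 and the other parameter values are assumptions. Because the y2 equation contains every exogenous variable, theory predicts that sem (FIML) matches LIML up to the optimizer's tolerance, while 2SLS and two-step GMM generally give slightly different values.

```stata
* Hypothetical stand-ins: y1 ~ lnaf, y2 ~ lnta, x ~ lnsales, z1 ~ ln_age, z2 ~ listed
clear
set seed 505
set obs 5000
generate x = rnormal()
generate z1 = rnormal()
generate z2 = runiform() < 0.3
generate eps1 = rnormal()
generate eps2 = rnormal()
generate u = eps1
generate v = 0.6*eps1 + 0.8*eps2
* The y2 equation contains every exogenous variable (an unrestricted reduced form)
generate y2 = 1 + 0.5*x + 0.8*z1 + 0.6*z2 + v
generate y1 = 2 + 0.3*y2 + 0.4*x + u
quietly sem (y1 <- y2 x) (y2 <- x z1 z2), cov(e.y1*e.y2)
scalar a1_fiml = _b[y1:y2]
quietly ivregress 2sls y1 x (y2 = z1 z2)
scalar a1_2sls = _b[y2]
quietly ivregress liml y1 x (y2 = z1 z2)
scalar a1_liml = _b[y2]
quietly ivregress gmm y1 x (y2 = z1 z2)
scalar a1_gmm = _b[y2]
* Print the four estimates of a1 side by side
foreach m in fiml 2sls liml gmm {
    display "`m'" _col(8) %9.6f scalar(a1_`m')
}
```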

21 of 24

Over-identified systems

The previous example shows that, in an over-identified system with correlated errors, the coefficients can differ across 2SLS, LIML, FIML and GMM
Why is that?

22 of 24

LIML vs FIML

LIML and FIML are both maximum likelihood estimators.
Under LIML each equation in the system is estimated individually, whereas under FIML the equations are estimated jointly.
Thus, in LIML, the restrictions imposed on the other equations in the system are not used when estimating the coefficients of the equation with the endogenous mediator.
In FIML, those restrictions are taken into account when estimating the system.
Each additional exclusion restriction in the other equations gives FIML more information from which to generate the coefficient estimates.
Consequently, FIML estimates generally differ from, and (when the restrictions are valid) are asymptotically more efficient than, the coefficient estimates from LIML. When the other equations carry no restrictions, as with the lnta equation in Alternative Scenario #2, which includes every exogenous variable, FIML and LIML coincide.
Despite this advantage, some researchers prefer LIML to FIML because the median LIML estimate is close to unbiased even when the chosen instruments (here ln_age and listed) are weak.
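A hypothetical sketch of when FIML and LIML part ways. The system, the variable w, and all parameter values are assumptions chosen for illustration: w enters the y1 equation but is excluded from the y2 equation, so the y2 equation carries its own restriction. LIML for the y1 equation ignores that restriction; sem with correlated errors (FIML) imposes it. As a control, freeing w in the y2 equation removes the restriction and should bring FIML back to LIML.

```stata
* Hypothetical system in which the y2 equation carries its own exclusion restriction
clear
set seed 606
set obs 5000
generate x = rnormal()
generate z = rnormal()
* w is correlated with z in this simulation
generate w = 0.5*z + rnormal()
generate eps1 = rnormal()
generate eps2 = rnormal()
generate u = eps1
generate v = 0.6*eps1 + 0.8*eps2
* w enters the y1 equation but is excluded from the y2 equation
generate y2 = 1 + 0.5*x + 0.8*z + v
generate y1 = 2 + 0.3*y2 + 0.4*x + 0.5*w + u
* LIML: instruments x, w, z; the y2 equation's restriction is not used
quietly ivregress liml y1 x w (y2 = z)
scalar a1_liml = _b[y2]
* FIML: also imposes that w is absent from the y2 equation
quietly sem (y1 <- y2 x w) (y2 <- x z), cov(e.y1*e.y2)
scalar a1_fiml = _b[y1:y2]
* Control: y2 equation left unrestricted, so FIML should reproduce LIML
quietly sem (y1 <- y2 x w) (y2 <- x z w), cov(e.y1*e.y2)
scalar a1_fiml_u = _b[y1:y2]
display "LIML " %9.6f scalar(a1_liml) "  FIML " %9.6f scalar(a1_fiml) "  FIML, unrestricted y2 " %9.6f scalar(a1_fiml_u)
```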

23 of 24

2SLS vs LIML

24 of 24

GMM

GMM is a separate class of estimator based on moment functions.
In just-identified systems, GMM uses the same moment conditions as 2SLS, and there are exactly as many conditions as coefficients, so the conditions can be solved exactly and the weighting matrix has no effect.
Therefore, in just-identified systems, the GMM coefficients are identical to 2SLS (as well as LIML and FIML).
In over-identified systems, the moment conditions cannot all be satisfied at once, so GMM weights them using a matrix that must itself be estimated. Consequently, the GMM coefficients generally differ from those of the other estimation methods (2SLS, LIML, FIML) when the system of equations is over-identified.
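A hypothetical sketch of both cases on simulated data. The variables, the parameter values, and the heteroskedastic error (its spread grows with |z1|, so the estimated weighting matrix matters) are all assumptions. With one excluded instrument, GMM and 2SLS agree to floating-point precision; with two, they generally differ.

```stata
* Hypothetical data with heteroskedastic errors
clear
set seed 707
set obs 5000
generate x = rnormal()
generate z1 = rnormal()
generate z2 = rnormal()
generate eps1 = rnormal()
generate eps2 = rnormal()
* Error variance of u depends on z1 (heteroskedasticity)
generate u = eps1*(1 + 0.5*abs(z1))
generate v = 0.6*eps1 + 0.8*eps2
generate y2 = 1 + 0.5*x + 0.8*z1 + 0.6*z2 + v
generate y1 = 2 + 0.3*y2 + 0.4*x + u
* Just-identified (z1 only): GMM and 2SLS solve the same moment equations
quietly ivregress 2sls y1 x (y2 = z1)
scalar ji_2sls = _b[y2]
quietly ivregress gmm y1 x (y2 = z1)
scalar ji_gmm = _b[y2]
* Over-identified (z1 and z2): the estimated weighting matrix changes the answer
quietly ivregress 2sls y1 x (y2 = z1 z2)
scalar oi_2sls = _b[y2]
quietly ivregress gmm y1 x (y2 = z1 z2)
scalar oi_gmm = _b[y2]
display "just-identified: 2SLS " %9.6f scalar(ji_2sls) "  GMM " %9.6f scalar(ji_gmm)
display "over-identified: 2SLS " %9.6f scalar(oi_2sls) "  GMM " %9.6f scalar(oi_gmm)
```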
