1 of 24

Path analysis

  • In a triangular system, Y2 affects Y1 but Y1 does not affect Y2
    • Y1 = a0 + a1 Y2 + a2 X + u (1)
    • Y2 = b0 + b1 X + b2 Z + v (2)
  • Some researchers estimate such a system using “path analysis” rather than instrumental variables estimation
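The difference between the two approaches can be illustrated with a small simulation. The sketch below is written in Python rather than Stata so that it runs on its own, and every number in it (coefficients, sample size, seed, error correlation) is a hypothetical choice, not taken from the slides. It draws data from equations (1) and (2) and estimates equation (1) by OLS, once with uncorrelated errors and once with corr(u, v) = 0.6.

```python
import random

def ols(y, columns):
    """OLS coefficients of y on the given columns (pass a column of ones for the intercept)."""
    k = len(columns)
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    a = [[sum(ci * cj for ci, cj in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(ci * yi for ci, yi in zip(columns[i], y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def ols_a1(rho, n=20000, seed=1):
    """Simulate the triangular system and return the OLS estimate of a1 in equation (1)."""
    rng = random.Random(seed)
    a0, a1, a2 = 1.0, 0.5, 0.3   # hypothetical coefficients of equation (1)
    b0, b1, b2 = 0.2, 0.8, 1.0   # hypothetical coefficients of equation (2)
    ones, X, Y1, Y2 = [], [], [], []
    for _ in range(n):
        x, z, v, e = (rng.gauss(0, 1) for _ in range(4))
        u = rho * v + (1 - rho ** 2) ** 0.5 * e   # corr(u, v) = rho
        y2 = b0 + b1 * x + b2 * z + v
        y1 = a0 + a1 * y2 + a2 * x + u
        ones.append(1.0); X.append(x); Y2.append(y2); Y1.append(y1)
    # Equation-by-equation OLS of (1): Y1 on a constant, Y2 and X (Z is excluded).
    return ols(Y1, [ones, Y2, X])[1]

a1_uncorrelated = ols_a1(rho=0.0)   # should be close to the true a1 = 0.5
a1_correlated = ols_a1(rho=0.6)     # should be pushed away from 0.5
print(a1_uncorrelated, a1_correlated)
```

Under these assumed values the large-sample bias of OLS is cov(u, v) / (b2² var(Z) + var(v)) = 0.6 / 2 = 0.3, so the second estimate should settle near 0.8 instead of 0.5.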


2 of 24

SEM (sem)

  • In Stata, a path analysis system can be estimated using the following syntax:
  • sem (Y1 <- Y2 X) (Y2 <- X Z)
  • The <- shows the assumed direction of causality
  • Here it is assumed that X and Z directly affect Y2, but Z does not directly affect Y1


3 of 24

Variable Definitions for Examples

  • lnaf = natural log of audit fees
  • lnta = natural log of total assets
  • lnsales = natural log of sales
  • ln_age = natural log of firm age
  • listed = an indicator variable for publicly listed firms


4 of 24

Example #1

  • lnaf = a0 + a1 lnta + a2 lnsales + u
  • lnta = b0 + b1 lnsales + b2 ln_age + v

  • Note the exclusion restriction on ln_age
  • If this system were estimated using IV, it would be “just-identified” because the lnaf equation has one endogenous regressor (lnta) and one exclusion restriction (ln_age)
  • The error terms are assumed uncorrelated
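The counting rule behind “just-identified” can be written down explicitly. The sketch below, in Python, compares the number of endogenous regressors in an equation with the number of exclusion restrictions (excluded instruments). The function name `order_condition` is hypothetical, and the rule it encodes is only the order condition, which is necessary but not sufficient for identification.

```python
def order_condition(n_endogenous, n_excluded_instruments):
    """Classify one equation by the order condition (necessary, not sufficient)."""
    if n_excluded_instruments < n_endogenous:
        return "under-identified"
    if n_excluded_instruments == n_endogenous:
        return "just-identified"
    return "over-identified"

# The lnaf equation: one endogenous regressor (lnta), one exclusion restriction (ln_age).
print(order_condition(1, 1))   # just-identified
# Generic cases with the same single endogenous regressor:
print(order_condition(1, 2))   # over-identified
print(order_condition(1, 0))   # under-identified
```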


5 of 24

Code to Estimate Example (Both OLS and SEM regressions)

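  • * Load the data and construct the variables used in the examples
  • use "D:\Phd\Fees1.dta", clear
  • gen age = year - incorporationyear
  • gen ln_age = ln(age)
  • gen byte listed = inlist(companytype, 2, 3, 5)
  • * Count missing values across the model variables (miss==0 keeps complete cases)
  • egen miss = rowmiss(lnaf lnta lnsales ln_age listed)
  • * Equation-by-equation OLS, then the same system estimated with sem
  • regress lnaf lnta lnsales if miss==0
  • regress lnta lnsales ln_age if miss==0
  • sem (lnaf <- lnta lnsales) (lnta <- lnsales ln_age) if miss==0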


6 of 24

OLS


7 of 24

SEM


8 of 24

Example #1

  • Note that the OLS coefficients are exactly the same as the SEM coefficients


9 of 24

Example #2

  • The OLS and SEM coefficients remain identical when the system is over-identified, provided the errors are still assumed to be uncorrelated
    • lnaf = a0 + a1 lnta + a2 lnsales + u
    • lnta = b0 + b1 lnsales + b2 ln_age + b3 listed + v
  • Note the two exclusion restrictions on ln_age and listed
  • This system is “over-identified” because the lnaf equation has one endogenous regressor and two exclusion restrictions
    • regress lnaf lnta lnsales if miss==0
    • sem (lnaf <- lnta lnsales) (lnta <- lnsales ln_age listed) if miss==0


10 of 24


11 of 24

Example #3

    • lnaf = a0 + a1 lnta + a2 lnsales + u
    • lnta = b0 + b1 lnsales + v

  • This system is “under-identified” in IV estimation because the lnaf equation has one endogenous regressor (lnta) but no exclusion restrictions

  • However, it can still be estimated using path analysis (regress or sem) if one assumes uncorrelated errors (cov(u, v) = 0)


12 of 24

Example #3

    • lnaf = a0 + a1 lnta + a2 lnsales + u
    • lnta = b0 + b1 lnsales + v

  • Assuming uncorrelated errors is equivalent to assuming away the endogeneity problem

  • Thus, SEM gives the same output as OLS

    • regress lnaf lnta lnsales if miss==0
    • sem (lnaf <- lnta lnsales) (lnta <- lnsales) if miss==0
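Why uncorrelated errors remove the endogeneity problem can be seen from the large-sample limit of the OLS coefficient on lnta. The short derivation below is a sketch that additionally assumes lnsales is uncorrelated with both u and v. Partialling the constant and lnsales out of lnta leaves v, so the OLS coefficient converges to cov(lnaf, v) / var(v).

```latex
\[
\operatorname{plim}\,\hat a_1^{\text{OLS}}
  = \frac{\operatorname{cov}(\text{lnaf},\, v)}{\operatorname{var}(v)}
  = \frac{\operatorname{cov}(a_0 + a_1\,\text{lnta} + a_2\,\text{lnsales} + u,\; v)}{\operatorname{var}(v)}
  = a_1 + \frac{\operatorname{cov}(u, v)}{\operatorname{var}(v)}
\]
```

The second term vanishes exactly when cov(u, v) = 0, which is the assumption that makes OLS (and therefore sem with uncorrelated errors) consistent for a1 in this system.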


13 of 24


14 of 24

Summary

  • In each of the previous examples, the researcher assumed uncorrelated errors.
    • When assuming uncorrelated errors, the estimated coefficients from path analysis are identical to OLS (see previous examples).

  • Two alternative scenarios remain, both of which allow correlated errors.


15 of 24

Alternative Scenario #1

  • 1) The researcher allows correlated errors in a just-identified system.
    • The estimated coefficients from path analysis are identical to the coefficients from IV estimation

    • ivregress 2sls lnaf lnsales (lnta = lnsales ln_age) if miss==0
    • sem (lnaf <- lnta lnsales) (lnta <- lnsales ln_age) if miss==0, cov(e.lnaf*e.lnta)
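One way to see why estimators agree in a just-identified system is that the single exclusion restriction pins down a1 as a ratio of reduced-form coefficients (indirect least squares), leaving no room for different weightings. The Python sketch below illustrates this on simulated data: it does not run sem or ivregress, and all coefficients, the seed, and the variable names X and Z are hypothetical stand-ins for lnsales and ln_age. It computes 2SLS by hand (OLS on first-stage fitted values) and compares it with the indirect least squares ratio.

```python
import random

def ols(y, columns):
    """OLS coefficients of y on the given columns (pass a column of ones for the intercept)."""
    k = len(columns)
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    a = [[sum(ci * cj for ci, cj in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(ci * yi for ci, yi in zip(columns[i], y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

rng = random.Random(7)
n = 5000
ones = [1.0] * n
X = [rng.gauss(0, 1) for _ in range(n)]   # included exogenous regressor
Z = [rng.gauss(0, 1) for _ in range(n)]   # single excluded instrument
Y1, Y2 = [], []
for x, z in zip(X, Z):
    v = rng.gauss(0, 1)
    u = 0.6 * v + 0.8 * rng.gauss(0, 1)   # correlated errors: corr(u, v) = 0.6
    y2 = 0.2 + 0.8 * x + 1.0 * z + v
    Y2.append(y2)
    Y1.append(1.0 + 0.5 * y2 + 0.3 * x + u)   # true a1 = 0.5

first_stage = ols(Y2, [ones, X, Z])    # Y2 on all exogenous variables
reduced_form = ols(Y1, [ones, X, Z])   # Y1 on all exogenous variables

# Indirect least squares: ratio of the two coefficients on Z.
a1_ils = reduced_form[2] / first_stage[2]

# 2SLS by hand: replace Y2 with its first-stage fitted values.
Y2_hat = [first_stage[0] + first_stage[1] * x + first_stage[2] * z for x, z in zip(X, Z)]
a1_2sls = ols(Y1, [ones, Y2_hat, X])[1]

# OLS for comparison; it ignores the correlation between u and v.
a1_ols = ols(Y1, [ones, Y2, X])[1]
print(a1_ils, a1_2sls, a1_ols)
```

The two IV-type numbers coincide up to floating-point error, while OLS is pulled away from the true value by the error correlation.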


16 of 24


17 of 24


18 of 24

Alternative Scenario #2

  • 2) The researcher allows correlated errors in an over-identified system.
    • The estimated coefficients from path analysis differ from the IV coefficients, and the IV coefficients themselves differ depending on whether the researcher uses 2SLS, LIML, or GMM

    • sem (lnaf <- lnta lnsales) (lnta <- lnsales ln_age listed) if miss==0, cov(e.lnaf*e.lnta)
    • ivregress 2sls lnaf lnsales (lnta = lnsales ln_age listed) if miss==0
    • ivregress liml lnaf lnsales (lnta = lnsales ln_age listed) if miss==0
    • ivregress gmm lnaf lnsales (lnta = lnsales ln_age listed) if miss==0
    • reg lnta lnsales ln_age listed if miss==0
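A rough intuition for the disagreement is that, with two exclusion restrictions, each excluded instrument implies its own estimate of a1, and in a finite sample these estimates do not coincide. The Python sketch below illustrates that on simulated data; it does not reproduce the sem, LIML, or GMM output, and the instruments Z and W, the coefficients, and the seed are all hypothetical. It shows the two instrument-specific ratios next to the hand-computed 2SLS estimate, which uses both instruments at once.

```python
import random

def ols(y, columns):
    """OLS coefficients of y on the given columns (pass a column of ones for the intercept)."""
    k = len(columns)
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    a = [[sum(ci * cj for ci, cj in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(ci * yi for ci, yi in zip(columns[i], y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

rng = random.Random(11)
n = 5000
ones = [1.0] * n
X = [rng.gauss(0, 1) for _ in range(n)]   # included exogenous regressor
Z = [rng.gauss(0, 1) for _ in range(n)]   # first excluded instrument
W = [rng.gauss(0, 1) for _ in range(n)]   # second excluded instrument
Y1, Y2 = [], []
for x, z, w in zip(X, Z, W):
    v = rng.gauss(0, 1)
    u = 0.6 * v + 0.8 * rng.gauss(0, 1)   # correlated errors
    y2 = 0.2 + 0.8 * x + 1.0 * z + 0.7 * w + v
    Y2.append(y2)
    Y1.append(1.0 + 0.5 * y2 + 0.3 * x + u)   # true a1 = 0.5

first_stage = ols(Y2, [ones, X, Z, W])
reduced_form = ols(Y1, [ones, X, Z, W])

# Each excluded instrument yields its own indirect-least-squares ratio.
a1_from_z = reduced_form[2] / first_stage[2]
a1_from_w = reduced_form[3] / first_stage[3]

# 2SLS by hand uses both instruments through the first-stage fitted values.
Y2_hat = [first_stage[0] + first_stage[1] * x + first_stage[2] * z + first_stage[3] * w
          for x, z, w in zip(X, Z, W)]
a1_2sls = ols(Y1, [ones, Y2_hat, X])[1]
print(a1_from_z, a1_from_w, a1_2sls)
```

All three numbers estimate the same a1 but differ in the sample, which leaves room for 2SLS, LIML, FIML, and GMM to combine the information differently.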


19 of 24


20 of 24


21 of 24

Over-identified systems

  • The previous example shows that, in an over-identified system with correlated errors, the coefficients differ across 2SLS, LIML, FIML (here, the sem estimates), and GMM
  • Why is that?


22 of 24

LIML vs FIML

  • LIML and FIML are both maximum likelihood estimators.
  • Under LIML each equation in the system is estimated individually, whereas under FIML the equations are estimated jointly.
  • Thus, in LIML, the over-identifying restrictions in the other equation are not considered when estimating the coefficients of the equation with the endogenous mediator.
  • In FIML, the over-identifying restrictions are taken into account when estimating the system.
  • Imposing an additional exclusion restriction gives FIML more information with which to estimate the coefficients.
  • Consequently, FIML estimates are different from, and asymptotically more efficient than, the coefficient estimates from LIML.
  • Despite this advantage, some researchers prefer LIML to FIML because LIML is approximately median-unbiased even when the chosen instruments (W and Z) are weak.


23 of 24

2SLS vs LIML

  •  
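One standard way to relate the two estimators, offered here as general background and not as a reconstruction of this slide, is the k-class form. In the notation assumed below, y is the dependent variable of the equation, X collects all of its regressors (constant, endogenous, and included exogenous), and Z collects all exogenous variables in the system (constant, included exogenous, and excluded instruments).

```latex
\[
\hat\beta(k) = \bigl[X'(I - k\,M_Z)X\bigr]^{-1} X'(I - k\,M_Z)\,y,
\qquad M_Z = I - Z(Z'Z)^{-1}Z'
\]
\[
k = 0 \;\Rightarrow\; \text{OLS}, \qquad
k = 1 \;\Rightarrow\; \text{2SLS}, \qquad
k = \hat\kappa \;\Rightarrow\; \text{LIML}
\]
```

Here \hat\kappa is the smallest root of det(W'M_1 W - \kappa W'M_Z W) = 0, where W stacks y and the endogenous regressor and M_1 partials out the included exogenous variables. Because \hat\kappa is at least 1 and equals 1 when the equation is just-identified, LIML and 2SLS coincide in that case and generally differ when the equation is over-identified.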


24 of 24

GMM

  • GMM is a separate class of estimator based on moment functions.
  • In just-identified systems, the GMM moment conditions coincide exactly with those of 2SLS.
  • Therefore, in just-identified systems, the GMM coefficients are identical to 2SLS (as well as LIML and FIML).
  • In over-identified systems, GMM relies on a weighting matrix, which is itself estimated, to generate the coefficient estimates. Consequently, the GMM coefficients differ from those of the other estimation methods (2SLS, LIML, FIML) when the system of equations is over-identified.
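The role of the weighting matrix can be made explicit with the linear GMM estimator. The sketch below uses assumed notation: y is the dependent variable, X the regressors of the equation, Z the full set of exogenous variables used as instruments, and A a positive definite weighting matrix.

```latex
\[
\hat\beta_{\text{GMM}}
  = \arg\min_{\beta}\; \bigl[Z'(y - X\beta)\bigr]' A \,\bigl[Z'(y - X\beta)\bigr]
  = \bigl(X'Z\,A\,Z'X\bigr)^{-1} X'Z\,A\,Z'y
\]
\[
A = (Z'Z)^{-1} \;\Rightarrow\; \hat\beta_{\text{GMM}} = \hat\beta_{\text{2SLS}},
\qquad
Z'X \text{ square and invertible} \;\Rightarrow\; \hat\beta_{\text{GMM}} = (Z'X)^{-1}Z'y \ \text{for any } A
\]
```

In the just-identified case Z'X is square, so A cancels and every choice of weighting matrix gives the same estimate. In the over-identified case A does not cancel, so an estimated weighting matrix generally moves the coefficients away from 2SLS.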
