1 of 81

Jiyong Park

Bryan School of Business and Economics

University of North Carolina at Greensboro

jiyong.park@uncg.edu

Instrumental Variable and Regression Discontinuity


Korea Summer Workshop on Causal Inference 2022


Session Website: https://sites.google.com/view/causal-inference2022

Boot Camp for Beginners


2 of 81


Instrumental Variable

3 of 81


Causal Hierarchy from the Perspective of Potential Outcomes

Level of Causal Inference (from highest to lowest)

  • Meta-Analysis
  • Randomized Controlled Trial (exploiting random assignment)
  • Quasi-Experiment and Instrumental Variable (exploiting research design; selection on unobservables strategies)
  • “Designed” Regression/Matching, based on causal knowledge or theory (selection on observables strategies)
  • Regression/Matching (little causal inference)
  • Model-Free Descriptive Statistics (no causal inference)

4 of 81


What’s Your Research Design and Data Structure?

[Flowchart: a sequence of yes/no questions that leads to a research design.]

Questions:

  • The goal is causal inference, and random assignment is feasible.
  • Quasi-experimental designs are available.
  • Treatment and control groups are observed.
  • Longitudinal data is observed.
  • Parallel trend assumption is valid.
  • Treatment is assigned by arbitrary threshold.
  • There is an exogenous variable that can induce the treatment.
  • There is a single treatment group and no information about functional forms.

Designs:

  • Randomized Controlled Trial
  • Difference-in-Differences (DID)
  • DID + Matching or Synthetic Control
  • Interrupted Time-Series Analysis
  • Regression Discontinuity
  • Instrumental Variable or Control Function & Selection Model
  • Matching/Weighting
  • Regression

Groupings named in the figure: Quasi-Experiments, Local Average Treatment Effect, Selection on Observables.

Note that the flowchart may depend on the context.


6 of 81


Endogeneity in Regression

 

 

Selection bias: captured in the error term

Identification assumption for regression: Conditional independence (given controls C)
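One common way to write down what this slide describes is sketched here. The notation is an assumption, not taken from the slide: Y is the outcome, D the treatment, C the controls, and ε the error term; Y(1) and Y(0) are potential outcomes.

```latex
% Regression with an error term that absorbs all unobserved factors
Y_i = \beta_0 + \beta_1 D_i + \gamma^{\top} C_i + \varepsilon_i

% Conditional independence: given C, the treatment carries no information about the error
E[\varepsilon_i \mid D_i, C_i] = E[\varepsilon_i \mid C_i]

% Without it, a naive comparison mixes the effect on the treated with selection bias
E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]
  = \underbrace{E[Y_i(1) - Y_i(0) \mid D_i = 1]}_{\text{ATET}}
  + \underbrace{E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]}_{\text{selection bias}}
```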

 

7 of 81


Endogeneity in Regression

  • To interpret the regression in a causal manner, the treatment variable should not be correlated with the error term capturing all unobserved factors that could influence the outcome variable.

[Diagram: Treatment Variable (Cause) → Outcome Variable (Effect) ← Error Term (Unobserved Factors). The treatment variable is assumed to be exogenous, i.e., unrelated to the error term (assumption for causal inference).]


9 of 81


Taking Endogeneity Out: Instrumental Variable

  • Instrumental variable (IV) is an instrument to separate the exogenous portion of the treatment variable from the endogenous portion (selection bias).

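[Diagram: Instrumental Variable (exogenous) → Endogenous Treatment Variable → Outcome Variable ← Error Term. First-Stage: the IVs split the treatment variable into variation explained by IVs and variation not explained by IVs. Second-Stage: the outcome variable is related to the treatment variable.]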

10 of 81


Identification Assumptions for IV

  • (1) The IVs should be correlated with the endogenous treatment variable (relevance).
  • (2) The IVs should not be correlated with the error term in the explanatory equation.
    • Exclusion restriction: The IVs do not affect the outcome except through the treatment variable.
    • Exogeneity of IV: The IVs do not share any confounders with the outcome.
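A short derivation shows why both assumptions are needed. It is a sketch for a single instrument Z and a linear outcome equation; the symbols Y, D, Z, and ε are assumed notation.

```latex
% Outcome equation with an endogenous treatment D
Y = \beta_0 + \beta_1 D + \varepsilon

% Take the covariance of both sides with the instrument Z
\operatorname{Cov}(Z, Y) = \beta_1 \operatorname{Cov}(Z, D) + \operatorname{Cov}(Z, \varepsilon)

% Assumption (2) removes the last term; assumption (1) makes the division well defined
\operatorname{Cov}(Z, \varepsilon) = 0, \quad \operatorname{Cov}(Z, D) \neq 0
\;\Longrightarrow\;
\beta_1 = \frac{\operatorname{Cov}(Z, Y)}{\operatorname{Cov}(Z, D)}
```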

11 of 81


First Approach: Two-Stage Least Squares

  • Let’s take the endogenous portion out of the treatment variable and use only the exogenous portion.

[Diagram: First-Stage: Instrumental Variable (exogenous) → Treatment Variable (Cause); only the variation explained by IVs is kept as the Predicted Treatment Variable. Second-Stage: Predicted Treatment Variable → Outcome Variable ← Error Term.]
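The two stages can be sketched on simulated data. This is a minimal illustration rather than a production estimator: the data-generating process, the coefficient values, and the helper names `ols` and `two_stage_least_squares` are all hypothetical.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y = X b + e."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_stage_least_squares(y, d, z):
    """Manual 2SLS with one endogenous treatment d and one instrument z."""
    const = np.ones(len(y))
    # First stage: regress the treatment on the instrument and keep the prediction.
    Z = np.column_stack([const, z])
    d_hat = Z @ ols(Z, d)
    # Second stage: regress the outcome on the predicted treatment.
    X_hat = np.column_stack([const, d_hat])
    return ols(X_hat, y)[1]

# Hypothetical data: u is an unobserved factor that drives both d and y.
rng = np.random.default_rng(0)
n = 50_000
u = rng.normal(size=n)                       # part of the error term
z = rng.normal(size=n)                       # instrument, independent of u
d = 0.8 * z + u + rng.normal(size=n)         # endogenous treatment
y = 2.0 * d + 3.0 * u + rng.normal(size=n)   # assumed true effect of d is 2.0

beta_ols = ols(np.column_stack([np.ones(n), d]), y)[1]
beta_2sls = two_stage_least_squares(y, d, z)
print(f"OLS:  {beta_ols:.2f}")
print(f"2SLS: {beta_2sls:.2f}")
```

In this setup OLS is pulled away from 2.0 because d and the error share u, while 2SLS uses only the variation in d explained by z. Standard errors from a hand-run second stage are not valid, so a dedicated IV routine is preferable for inference.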


15 of 81


First Approach: Two-Stage Least Squares

 

 

 

(endogenous treatment variable)

(relevance of IVs)

(exclusion restriction / exogeneity of IVs)
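The three annotations above map onto the usual 2SLS system. The sketch below uses assumed notation (Y outcome, D treatment, Z instrument) for a single instrument without controls.

```latex
% Outcome equation (endogenous treatment variable)
Y_i = \beta_0 + \beta_1 D_i + \varepsilon_i, \qquad \operatorname{Cov}(D_i, \varepsilon_i) \neq 0

% First stage (relevance of IVs)
D_i = \pi_0 + \pi_1 Z_i + v_i, \qquad \pi_1 \neq 0

% Exclusion restriction / exogeneity of IVs
\operatorname{Cov}(Z_i, \varepsilon_i) = 0

% Second stage: replace D_i with its first-stage prediction
\hat{D}_i = \hat{\pi}_0 + \hat{\pi}_1 Z_i, \qquad
Y_i = \beta_0 + \beta_1 \hat{D}_i + e_i
```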

16 of 81


IV Example (1) Exogenous Event-based IVs (Ideal)

  • Settler Mortality during 1500-1900, Population Density in 1500 (IVs) → Property Rights Institutions in 1990s → GDP in 1990s
  • English Ex-Colonies (English Legal Origin) (IV) → Contracting Institutions in 1990s → GDP in 1990s
  • Unobserved Factors (the source of endogeneity between institutions and GDP)

Acemoglu, D., Johnson, S. and Robinson, J.A., 2001. The Colonial Origins of Comparative Development: An Empirical Investigation. American Economic Review, 91(5), pp.1369-1401.

Acemoglu, D. and Johnson, S., 2005. Unbundling Institutions. Journal of Political Economy, 113(5), pp.949-995.

17 of 81


IV Example (2) Regional IVs

  • Number of Cell Towers in the Shopper’s Area (IV) → Mobile App Adoption → Online and Offline Purchases (Narang and Shankar 2019)
  • Terrain Slope (IV) → Broadband Internet Penetration → Hate Crime (Chan et al. 2016)

Narang, U. and Shankar, V., 2019. Mobile app introduction and online and offline purchases and product returns. Marketing Science, 38(5), pp.756-772.

Chan, J., Ghose, A. and Seamans, R., 2016. The internet and racial hate crime: offline spillovers from online access. MIS Quarterly, 40(2), pp.381-403.

18 of 81


IV Example (3) Geographical Proximity-Based IVs

  • Distance from Walldorf, Germany, where the SAP headquarters are located (IV) → ERP Adoption → Span of Control (Bloom et al. 2014)
  • Distance from Luther City Wittenberg (IV) → Spread of Protestantism in Germany → Economic Outcomes (Becker and Woessmann 2009)

Becker, S.O. and Woessmann, L., 2009. Was Weber wrong? A human capital theory of Protestant economic history. Quarterly Journal of Economics, 124(2), pp.531-596.

Bloom, N., Garicano, L., Sadun, R. and Van Reenen, J., 2014. The distinct effects of information technology and communication technology on firm organization. Management Science, 60(12), pp.2859-2885.

19 of 81


IV Example (4) Macro/Cohort Trends as IVs

  • 5G Adoption in the Same Cohort/Region (IV) → Mobile Loyalty Program Adoption → Loyalty Point Redemption (Son et al. 2020)
  • Nationwide Industry-Level Employment Growth, weighted by county-level industry composition (IV) → County-Level Unemployment Rate → Labor Supply in Online Labor Markets (Huang et al. 2020)

Son, Y., Oh, W., Han, S.P. and Park, S., 2020. When loyalty goes mobile: Effects of mobile loyalty apps on purchase, redemption, and competition. Information Systems Research, 31(3), pp.835-847.

Huang, N., Burtch, G., Hong, Y. and Pavlou, P.A., 2020. Unemployment and worker participation in the gig economy: Evidence from an online labor market. Information Systems Research, 31(2), pp.431-448.

20 of 81


IV Example (5) Peers’ Environments as IVs

  • Check-Ins of Friends’ Friends (IV) → Friends’ Check-Ins → User’s Visit Frequency (Qiu et al. 2018)
  • Weather in Friends’ Areas (IV) → Friends’ Running Behavior → User’s Running Behavior (Aral and Nicolaides 2017)

Qiu, L., Shi, Z. and Whinston, A.B., 2018. Learning from your friends’ check-ins: An empirical study of location-based social networks. Information Systems Research, 29(4), pp.1044-1061.

Aral, S. and Nicolaides, C., 2017. Exercise contagion in a global social network. Nature Communications, 8(1), pp.1-8.

21 of 81


Local Average Treatment Effect (LATE)

22 of 81


IV from the Perspective of Potential Outcome

Two-Stage Least Squares (2SLS)

Potential Outcome Framework

How Does the IV Estimate (2SLS) Acquire a Causal Interpretation?


24 of 81


IV from the Perspective of Potential Outcome

  • Local average treatment effect (LATE) incorporates the IV framework into the potential outcome framework.
  • Local average treatment effect (LATE) allows us to interpret an IV analysis as a quasi-experiment.
  • Local average treatment effect (LATE) clarifies what we can learn from the IV analysis (2SLS).

Two-Stage Least Squares (2SLS)

Potential Outcome Framework

“Angrist and Imbens showed that even in this general setting it is possible to estimate a well-defined treatment effect — the local average treatment effect (LATE) — under a set of minimal (and in many cases empirically plausible) conditions. In deriving their key results, they merged the instrumental variables (IV) framework, common in economics, with the potential-outcomes framework for causal inference, common in statistics. Within this framework, they clarified the core identifying assumptions in a causal design and provided a transparent way of investigating the sensitivity to violations of these assumptions”

- Scientific Background on the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel 2021

25 of 81


IV as a Treatment Assignment Mechanism

  • Under the monotonicity assumption (no defiers), the IV estimand is the average treatment effect for the compliers, i.e., those who take the treatment only because the IV induces them to.

[Diagram: IV → Treatment under the monotonicity assumption, which identifies the Local Average Treatment Effect (LATE).]

Imbens, G.W. and Angrist, J.D., 1994. Identification and Estimation of Local Average Treatment Effects. Econometrica, 62(2), pp.467-475.

Angrist, J.D., Imbens, G.W. and Rubin, D.B., 1996. Identification of Causal Effects Using Instrumental Variables. Journal of the American Statistical Association, 91(434), pp.444-455.
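A small simulation can make the role of compliers concrete. It is only a sketch: the shares of compliers, always-takers, and never-takers, the effect sizes, and the function name `wald_estimate` are hypothetical, and monotonicity holds by construction because no defiers are generated.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical compliance types; there are no defiers, so monotonicity holds.
types = rng.choice(["complier", "always", "never"], size=n, p=[0.4, 0.3, 0.3])
z = rng.integers(0, 2, size=n)  # randomly assigned binary instrument

# Treatment take-up: always-takers take it, never-takers do not, compliers follow z.
d = np.where(types == "always", 1, np.where(types == "never", 0, z))

# Assumed heterogeneous effects: 1.0 for compliers, 3.0 for always-takers, 0.0 for never-takers.
effect = np.select([types == "complier", types == "always"], [1.0, 3.0], default=0.0)
y0 = rng.normal(size=n) + 2.0 * (types == "always")  # always-takers differ at baseline
y = y0 + effect * d

def wald_estimate(y, d, z):
    """IV (Wald) estimate: reduced-form difference divided by first-stage difference."""
    reduced_form = y[z == 1].mean() - y[z == 0].mean()
    first_stage = d[z == 1].mean() - d[z == 0].mean()
    return reduced_form / first_stage

naive = y[d == 1].mean() - y[d == 0].mean()
late = wald_estimate(y, d, z)
print(f"naive treated-vs-untreated difference: {naive:.2f}")
print(f"IV (Wald) estimate: {late:.2f}  (assumed complier effect: 1.00)")
```

The IV estimate recovers the assumed effect for compliers, not the effect for always-takers or the population average, which is the sense in which the effect is "local".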

26 of 81


Illustrative Example of LATE

  • Angrist (1990) estimated the effect of serving in the military on earnings.

[Diagram: Draft lottery tied to an individual’s day of birth (Instrumental Variable, Z) → Serving in the military (Veteran) (Treatment, W) → Earnings]

Angrist, J.D., 1990. Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records. American Economic Review, 80(3), pp.313-336.

27 of 81


Illustrative Example of LATE

  • A simple comparison between veterans and non-veterans suffers from selection bias.

Angrist, J.D., 1990. Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records. American Economic Review, 80(3), pp.313-336.

OLS estimate


33 of 81


Illustrative Example of LATE

  • What can we learn from the IV estimates?

Angrist, J.D., 1990. Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records. American Economic Review, 80(3), pp.313-336.

2SLS estimate

What we can learn from the IV estimates is the causal effect in the subpopulation of compliers.

CAVEAT: IV estimates are specific to the focal IVs because compliers are defined relative to those IVs. In general, IV estimates are not generalizable to other contexts.

34 of 81


[Appendix] Local Average Treatment Effect

 

 

Z: Instrument (0 or 1), D: Treatment (0 or 1), Y: Outcome

Potential Outcomes: Z → D(Z) → Y(Z, D)

Assumptions: Stable Unit Treatment Value Assumption & Exclusion Restriction; Monotonicity & Relevance

IV Estimand = Local Average Treatment Effect (for Compliers)
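The result behind these labels can be sketched in the standard form of Imbens and Angrist (1994). The notation follows the slide (Z, D, Y); the subscripted potential outcomes and the independence statement are written out here as assumptions of the sketch.

```latex
% Potential treatments D_i(z) and potential outcomes Y_i(z, d), with Z_i as good as randomly assigned
\{Y_i(z, d), D_i(z)\}_{z, d \in \{0, 1\}} \;\perp\; Z_i

% Exclusion restriction: Z affects Y only through D
Y_i(z, d) = Y_i(d)

% Monotonicity (no defiers) and relevance
D_i(1) \geq D_i(0) \ \text{for all } i, \qquad E[D_i \mid Z_i = 1] \neq E[D_i \mid Z_i = 0]

% IV (Wald) estimand equals the average effect for compliers
\frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0]}
  = E[\,Y_i(1) - Y_i(0) \mid D_i(1) > D_i(0)\,]
```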

35 of 81


When LATE Becomes ATET and ATE

    • [One-sided non-compliance 1] If there are no always-takers, then LATE is ATET (average treatment effect on the treated).
      • Example: A randomized controlled trial in which some people who were assigned to treatment do not take it (never-takers), but nobody in the control group has access to the treatment (no always-takers).
    • [One-sided non-compliance 2] If there are no never-takers, then LATE is ATEU (average treatment effect on the untreated).
    • [No non-compliance] If there are neither always-takers nor never-takers, then LATE is ATE.
      • For ATET = ATE, the treatment and control groups should be comparable except for the treatment.

ATET = LATE

ATE = LATE

36 of 81


Application of LATE (1) Policy-Relevant Effect

    • If the instrument is a policy of interest, then LATE is the effect on individuals who are shifted into treatment by the policy, and thus LATE equals the policy-relevant effect.
    • Example: Effect of the increase in the compulsory school-leaving age (from 14 to 15)
      • LATE estimates the effect for people who stay in school longer only because of the policy reform and would otherwise have left at 14.
      • With no never-takers due to full enforcement, LATE = ATEU.
      • With always-takers, LATE ≠ ATET ≠ ATE.

[Diagram: Increase in compulsory school-leaving age (Instrumental Variable) → Extra year of schooling → Outcome]

Oreopoulos, P., 2006. Estimating average and local average treatment effects of education when compulsory schooling laws really matter. American Economic Review, 96(1), pp.152-175.

37 of 81


Application of LATE (2) Imperfect Compliance in RCTs

    • Intention-to-Treat (ITT) effect
      • ITT is based on the initial treatment assignment and not the treatment eventually received.

“During the experiment, the percentage of treatment group working at home hovered between 80% and 90%. Since compliance was imperfect, our estimators take even birthdate status as the treatment status, yielding an intention-to-treat result on the eligible volunteers.” (p. 183)

Bloom, N., Liang, J., Roberts, J. and Ying, Z.J., 2015. Does Working from Home Work? Evidence from a Chinese Experiment. Quarterly Journal of Economics, 130(1), pp.165-218.

[Diagram: Treatment assignment (Instrumental Variable) → Actual treatment (Work-from-home) → Productivity]

    • With treatment assignment used as the IV, LATE estimates the causal effect of actually working from home for the compliers.
      • With no always-takers, LATE = ATET.
      • With never-takers, LATE ≠ ATEU ≠ ATE.
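The relation between ITT and LATE under one-sided non-compliance can be sketched numerically. The numbers below are hypothetical (an 85% take-up rate and an effect of 0.5) and are not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

assigned = rng.integers(0, 2, size=n)   # random treatment assignment (the IV)
would_comply = rng.random(n) < 0.85     # hypothetical 85% take-up among the assigned
treated = assigned * would_comply       # control group has no access: no always-takers
y = rng.normal(size=n) + 0.5 * treated  # assumed effect of actual treatment is 0.5

# Intention-to-treat: compare by assignment, ignoring what was actually received.
itt = y[assigned == 1].mean() - y[assigned == 0].mean()

# First stage: how much assignment raises actual take-up.
take_up = treated[assigned == 1].mean() - treated[assigned == 0].mean()

# LATE: rescale the ITT by the take-up difference.
late = itt / take_up
print(f"ITT: {itt:.3f}  take-up: {take_up:.3f}  LATE: {late:.3f}")
```

Because some assigned units never take the treatment, the ITT is diluted relative to the effect of the treatment itself; dividing by the take-up difference undoes that dilution for the compliers.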

38 of 81


Regression Discontinuity


40 of 81


Regression Discontinuity (RD)

  • Regression discontinuity identifies the treatment effect from a local discontinuous jump in the outcome at a threshold of a running (assignment) variable.
  • Estimating the treatment effect in RD relies on extrapolating the counterfactual along the running variable (dashed line).

[Figure: two plots of the outcome variable against the running variable, with labels for the discontinuous jump, the treated and untreated observations, and the counterfactual (dashed line).]
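The jump can be written as a difference of two one-sided limits. The notation is assumed for this sketch: X is the running variable, c the cutoff, and units with X at or above c are treated.

```latex
% Sharp RD estimand: difference of the one-sided limits of the outcome at the cutoff
\tau_{RD} = \lim_{x \downarrow c} E[Y_i \mid X_i = x] - \lim_{x \uparrow c} E[Y_i \mid X_i = x]

% If E[Y_i(0) | X_i = x] and E[Y_i(1) | X_i = x] are continuous at c, this is a causal effect at the cutoff
\tau_{RD} = E[\,Y_i(1) - Y_i(0) \mid X_i = c\,]
```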

41 of 81


Example of Discontinuity

  • Treatment and control groups should be separated by an arbitrary threshold (cutoff) that is independent of the outcome and other confounders.
  • Example: Drinking and public health/death (Carpenter and Dobkin 2011)
    • The legal drinking age (the treatment threshold) is arbitrarily determined by law.

Carpenter, C. and Dobkin, C., 2011. The Minimum Legal Drinking Age and Public Health. Journal of Economic Perspectives, 25(2), pp.133-56.

42 of 81


RD Estimation Strategies

  • How to extrapolate the counterfactual after the discontinuity

Strategies by Bandwidth (Local or Global) and Modeling of Running Variable (Parametric or Nonparametric):

  • Local Parametric (Local Regression)
  • Global Parametric (Global Regression)
  • Local Nonparametric (Local Experiment)
  • Global Nonparametric (Global Experiment)

The narrower the bandwidth, the smaller the sample.

The broader the bandwidth, the more vulnerable to selection bias.

43 of 81


RD Estimation Strategies

  • How to extrapolate the counterfactual after the discontinuity

[Figure panels: Global Parametric (Linear); Local Parametric (Linear); Global Nonparametric; Local Nonparametric]
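The strategies can be compared on simulated data. This is a rough sketch with a hypothetical data-generating process and helper names; "parametric" here means a linear fit on each side of the cutoff, and "nonparametric" means a plain difference in means.

```python
import numpy as np

def local_linear_rd(x, y, cutoff, bandwidth):
    """Parametric (linear) RD: separate linear fits on each side within the bandwidth."""
    xc = x - cutoff
    keep = np.abs(xc) <= bandwidth
    xc, yk = xc[keep], y[keep]
    treated = (xc >= 0).astype(float)
    # y = a + tau*T + b*xc + c*T*xc, where tau is the jump at the cutoff.
    X = np.column_stack([np.ones_like(xc), treated, xc, treated * xc])
    coef = np.linalg.lstsq(X, yk, rcond=None)[0]
    return coef[1]

def local_mean_difference(x, y, cutoff, bandwidth):
    """Nonparametric 'experiment': difference in mean outcomes within the bandwidth."""
    xc = x - cutoff
    above = (xc >= 0) & (xc <= bandwidth)
    below = (xc < 0) & (xc >= -bandwidth)
    return y[above].mean() - y[below].mean()

# Hypothetical data: the outcome trends with the running variable and jumps at 0.
rng = np.random.default_rng(3)
n = 20_000
x = rng.uniform(-1.0, 1.0, size=n)
true_jump = 1.5
y = 1.0 + 2.0 * x + 0.5 * x**2 + true_jump * (x >= 0) + rng.normal(scale=0.5, size=n)

for h in (1.0, 0.5, 0.1):  # h = 1.0 uses the full sample (global)
    ll = local_linear_rd(x, y, cutoff=0.0, bandwidth=h)
    md = local_mean_difference(x, y, cutoff=0.0, bandwidth=h)
    print(f"bandwidth={h:.1f}  linear fit={ll:.2f}  mean difference={md:.2f}")
```

In this setup the difference in means drifts away from the assumed jump as the bandwidth widens, because units far from the cutoff differ in the running variable itself, while the narrow bandwidth leaves far fewer observations.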

44 of 81


RD Estimation Strategies

Jacob, R., Zhu, P., Somers, M.A. and Bloom, H., 2012. A Practical Guide to Regression Discontinuity. MDRC. (https://eric.ed.gov/?id=ED565862)

[Figure panels: Global Parametric (Linear); Local Parametric (Linear); Local Nonparametric (Binary)]

45 of 81


RD Estimation Strategies

  • If the treatment and control groups split by discontinuity are observed before and after the treatment (e.g., policy enactment), it may be better suited for DID.
  • Example: Effect of usage restriction law on online game usage and spending (Jo et al. 2020)

Jo, W., Sunder, S., Choi, J. and Trivedi, M., 2020. Protecting consumers from themselves: Assessing consequences of usage restriction laws on online game usage and spending. Marketing Science, 39(1), pp.117-133.

“To this end, we find a three-year bandwidth both below and above 16 years of age to be appropriate in that it affords a large enough sample size to make statistical inferences, resulting in the RD sample including 299 and 387 gamers in the treatment and control groups, respectively. We then run the DID models using the RD sample.”

46 of 81


Identification Assumption for RD

  • [Nonparametric] Ceteris paribus holds in the local sample around the threshold, within the bandwidth.
  • [Parametric] The functional form of the running variable in the absence of the treatment is correctly specified within the bandwidth.

“The consequences of using an incorrect functional form are more serious in the case of RD designs however, since misspecification of the functional form typically generates a bias in the treatment effect.” (Lee and Lemieux 2010; p. 316)

[Figure labels: Untreated, Treated]

Lee, D.S. and Lemieux, T., 2010. Regression Discontinuity Designs in Economics. Journal of Economic Literature, 48(2), pp.281-355.

47 of 81


Identification Assumption for RD

  • Various polynomial regressions can be used to model the relationship between outcome and running variable.

Jacob, R., Zhu, P., Somers, M.A. and Bloom, H., 2012. A Practical Guide to Regression Discontinuity. MDRC. (https://eric.ed.gov/?id=ED565862)
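A generic polynomial specification of this kind can be written as follows. It is a sketch in assumed notation (X running variable, c cutoff, D treatment indicator, p polynomial order), not necessarily the exact form used in the cited guide.

```latex
% Polynomial RD regression of order p, with separate slopes on each side of the cutoff
Y_i = \alpha + \tau D_i
    + \sum_{k=1}^{p} \beta_k (X_i - c)^k
    + \sum_{k=1}^{p} \gamma_k D_i (X_i - c)^k
    + \varepsilon_i,
\qquad D_i = \mathbf{1}[X_i \geq c]

% p = 1 gives the linear model, p = 2 the quadratic model; \tau is the jump at X_i = c
```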


49 of 81


Example of Regression Discontinuity

  • Example: Effect of labor unions on product quality failures (Kini et al. 2021)
    • Running variable: Percentage of votes in favor of the union

“To circumvent the endogenous nature of union election results, we employ a regression discontinuity design (RDD) methodology, which compares firms with close union victories to firms with close union losses.”

[Strategy 1 – Global/Local Parametric / Binary]

“Specifically, in our main test, we estimate the following Poisson regression model using only close elections:”

“In Panel C, we estimate local regressions based on the optimal bandwidth as in Imbens and Kalyanaraman (2012).”

Kini, O., Shen, M., Shenoy, J. and Subramaniam, V., 2021. Labor unions and product quality failures. Management Science. forthcoming

[Figure labels: Discontinuity; Full Sample (Global); Subsample (Local)]

Imbens, G. and Kalyanaraman, K., 2012. Optimal bandwidth choice for the regression discontinuity estimator. Review of Economic Studies, 79(3), pp.933-959.

50 of 81


Example of Regression Discontinuity

  • Example: Effect of labor unions on product quality failures (Kini et al. 2021)
    • Running variable: Percentage of votes in favor of the union

“To circumvent the endogenous nature of union election results, we employ a regression discontinuity design (RDD) methodology, which compares firms with close union victories to firms with close union losses.”

[Strategy 2 – Global Parametric / Linear & Quadratic]

“To provide more efficient estimates, we also use all union elections for our sample firms and approximate the continuous relation between the Frequency of recalls and pv [percentage of votes in favor of the union] by including a polynomial in pv while, at the same time, allowing for a discontinuous jump at the union win threshold of 50% (c).”

Kini, O., Shen, M., Shenoy, J. and Subramaniam, V., 2021. Labor unions and product quality failures. Management Science. forthcoming

Function of running variable

Discontinuity

51 of 81


Imperfect Compliance: Fuzzy RD

  • Sharp RD vs. Fuzzy RD
    • The two differ in how the probability of being treated changes at the discontinuity: it jumps from 0 to 1 in sharp RD and by less than 1 in fuzzy RD.

52 of 81


Imperfect Compliance: Fuzzy RD

  • Sharp RD vs. Fuzzy RD
    • Using the cutoff on the running variable as an IV, the local average treatment effect (LATE) can be estimated for fuzzy RD.
    • Sharp RD can be viewed as LATE without always-takers and never-takers.

[Diagram: Above cutoff (Instrumental Variable) → Treatment → Outcome; estimated with Global/Local and Parametric/Nonparametric strategies]

LATE = Causal effect of the treatment for those who received the treatment induced by the discontinuity
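A fuzzy RD estimate can be sketched as the jump in the outcome divided by the jump in the treatment probability at the cutoff. The simulated data, the bandwidth of 0.5, and the function names below are hypothetical choices for illustration.

```python
import numpy as np

def jump_at_cutoff(x, v, cutoff, bandwidth):
    """Local linear estimate of the discontinuity in v at the cutoff."""
    xc = x - cutoff
    keep = np.abs(xc) <= bandwidth
    xc, vk = xc[keep], v[keep]
    above = (xc >= 0).astype(float)
    X = np.column_stack([np.ones_like(xc), above, xc, above * xc])
    return np.linalg.lstsq(X, vk, rcond=None)[0][1]

def fuzzy_rd(x, d, y, cutoff, bandwidth):
    """Fuzzy RD: outcome jump (reduced form) over treatment jump (first stage)."""
    return jump_at_cutoff(x, y, cutoff, bandwidth) / jump_at_cutoff(x, d, cutoff, bandwidth)

# Hypothetical data: crossing the cutoff raises take-up, but does not determine it.
rng = np.random.default_rng(4)
n = 200_000
x = rng.uniform(-1.0, 1.0, size=n)
above = (x >= 0).astype(float)
u = rng.normal(size=n)                       # unobserved factor behind take-up and outcome
d = (u + above > 0.5).astype(float)          # take-up is more likely above the cutoff
y = 1.0 + 0.5 * x + 2.0 * d + 1.0 * u + rng.normal(scale=0.5, size=n)  # assumed effect 2.0

naive = y[d == 1].mean() - y[d == 0].mean()
late = fuzzy_rd(x, d, y, cutoff=0.0, bandwidth=0.5)
print(f"naive difference: {naive:.2f}")
print(f"fuzzy RD estimate: {late:.2f}  (assumed effect: 2.00)")
```

Setting the treatment jump to exactly 1 reduces this ratio to the sharp RD estimate, which matches the view of sharp RD as the case without always-takers and never-takers.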

53 of 81


Imperfect Compliance: Fuzzy RD

  • Example: Drinking and public health/death (Carpenter and Dobkin 2011)
    • Running variable: Age

[Diagram: Above 21 (Instrumental Variable) → Drinking → Death rate; estimated with Global/Local and Parametric/Nonparametric strategies]

LATE = Causal effect of drinking for those who drink only because they are above 21 and would not have done so otherwise.

Carpenter, C. and Dobkin, C., 2011. The Minimum Legal Drinking Age and Public Health. Journal of Economic Perspectives, 25(2), pp.133-56.

54 of 81


Imperfect Compliance: Fuzzy RD

  • Example: Oversight and efficiency in public projects (Calvo et al. 2019)
    • Running variable: Budgets

Calvo, E., Cui, R. and Serpa, J.C., 2019. Oversight and efficiency in public projects: A regression discontinuity analysis. Management Science, 65(12), pp.5651-5675.

[Diagram: Above budget cutoff (discontinuity) (Instrumental Variable) → High oversight regime → Delay / Overrun; Global Parametric]

  • Discontinuity as IV
  • First-Stage: Probability of treatment (Oversight)
  • Second-Stage: LATE, using the predicted value from the first stage

55 of 81


Control Function: Selection Bias Correction Method

56 of 81


Causal Inference ≅ How to Address Endogeneity

  • Various approaches in causal inference depending on how to address endogeneity

Treatment Group with Grant

Control Group without Grant

1. Research Design for Causal Inference

    • Randomized Controlled Trial
    • (Natural) Quasi-Experiment
    • Local Average Treatment Effect (LATE)

2. Selection Model (Statistical Modeling)

3. Causal Graph (Graphical Modeling)

Causal effect of grant!

57 of 81


Second Approach: Control Function

  • Let’s control for the endogenous portion of the error term, predicted by the endogenous portion of the treatment variable (residual).

[Diagram: First-Stage: Instrumental Variable → Treatment Variable; the variation not explained by IVs is kept as the Predicted Residual. Second-Stage: Treatment Variable and Predicted Residual → Outcome Variable ← Error Term; the treatment variable is exogenous conditional on the residual.]

If the predicted residual represents the probability of being selected as the treatment group or as the sample, it is called a selection model.
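The two steps can be sketched on simulated data. The data-generating process, coefficient values, and the helper name `ols` are hypothetical; the sketch covers only the linear case, where the control function estimate coincides with 2SLS.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y = X b + e."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Hypothetical data: u is an unobserved factor that drives both d and y.
rng = np.random.default_rng(5)
n = 50_000
u = rng.normal(size=n)
z = rng.normal(size=n)                       # instrument, independent of u
d = 0.8 * z + u + rng.normal(size=n)         # endogenous treatment
y = 2.0 * d + 3.0 * u + rng.normal(size=n)   # assumed true effect of d is 2.0

const = np.ones(n)
Z = np.column_stack([const, z])

# First stage: regress the treatment on the IV and keep the residual.
d_hat = Z @ ols(Z, d)
v_hat = d - d_hat

# Second stage: keep the original treatment and add the residual as a control.
coef_cf = ols(np.column_stack([const, d, v_hat]), y)
beta_cf, rho = coef_cf[1], coef_cf[2]

# 2SLS for comparison: regress the outcome on the predicted treatment.
beta_2sls = ols(np.column_stack([const, d_hat]), y)[1]

print(f"control function: {beta_cf:.3f}  (residual coefficient: {rho:.2f})")
print(f"2SLS:             {beta_2sls:.3f}")
```

A clearly nonzero coefficient on the residual signals that the treatment is endogenous. Standard errors from the hand-run second stage ignore the first-stage estimation, so they would need a correction such as the bootstrap.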


60 of 81

Instrumental Variable and Regression Discontinuity

60

Korea Summer Workshop on Causal Inference 2022

Second Approach: Control Function

 

 

 

(endogenous treatment variable)

(relevance of IVs)

 

(exclusion restriction/ exogeneity of IVs)

61 of 81

Two-Stage Least Squares vs. Control Function

Common Requirement

  • Instrumental variables (IVs) are required for both approaches.

Causal Foundation

  • Two-Stage Least Squares: Potential Outcome Framework (Local Average Treatment Effect)
  • Control Function: Controlling for Endogeneity

Pros

  • Two-Stage Least Squares: Causal interpretations are ensured, at least for the compliers.
  • Control Function: The original variable can be maintained while controlling for endogeneity.
  • Control Function: The model can be extended more flexibly to account for various forms of selection.

Cons

  • Two-Stage Least Squares: Under the monotonicity assumption, a “local” causal effect can be estimated at best for the compliers.
  • Two-Stage Least Squares: Estimation is less efficient for non-linear models.
  • Control Function: Results are vulnerable to wrong functional forms.
  • Control Function: Assumptions are more difficult to validate.

62 of 81

Example (1) Effects of Previews/Reviews on E-Book Purchase

“naïve estimation may engender overstated effects from these endogenous variables.”

Choi, A.A., Cho, D., Yim, D., Moon, J.Y. and Oh, W., 2019. When seeing helps believing: The interactive effects of previews and reviews on e-book purchases. Information Systems Research, 30(4), pp.1164-1183.

63 of 81

Example (2) Effect of Advertising on Sales

  • Control function approaches are flexible enough to account for endogeneity of the effect of endogenous variables (“slope endogeneity”).

“For example, in a simple linear sales response function, S = α – βP, where P is price and S is sales, extant research assumes that econometrically unobserved factors affect the demand level linearly (i.e., intercept α) but not marketing-mix responsiveness (i.e., price coefficient β)…

A supermarket chain might charge a higher price in markets in response to econometrically unobserved higher preferences for the chain (captured by α) in such markets (i.e., “intercept endogeneity”), but the chain manager’s private information about the lower price sensitivity of a market (captured by β) might also lead to a higher-than-expected price (i.e., “slope endogeneity”).” (Luan and Sudhir 2010, p. 445)

 

Luan, Y.J. and Sudhir, K., 2010. Forecasting marketing-mix responsiveness for new products. Journal of Marketing Research, 47(3), pp.444-457.

64 of 81

Example (2) Effect of Advertising on Sales

Step 1. Regress the endogenous variable on the IVs and obtain the residual.

Step 2. Include the residual, as well as the interaction between the residual and the endogenous variable, in the outcome equation.
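The two steps can be sketched on simulated data. Everything below is an illustrative assumption rather than the specification in Luan and Sudhir (2010): a random-coefficient model in which the unobserved factor `v` shifts both the intercept (through `rho`) and the slope (through `psi`), with hypothetical helper functions `solve` and `ols`.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

random.seed(2)
n = 20000
beta, rho, psi = -1.5, 0.8, 0.5  # assumed true values (illustrative only)

data = []
for _ in range(n):
    z = random.gauss(0, 1)   # instrument
    v = random.gauss(0, 1)   # unobserved factor driving the endogenous variable
    x = 2.0 + z + v          # endogenous variable (e.g., price)
    # Slope endogeneity: the unit-specific slope depends on v.
    slope = beta + psi * v + 0.3 * random.gauss(0, 1)
    # Intercept endogeneity: the level of y also depends on v.
    y = 1.0 + slope * x + rho * v + 0.5 * random.gauss(0, 1)
    data.append((z, x, y))

ys = [y for z, x, y in data]
b_naive = ols([[1.0, x] for z, x, y in data], ys)[1]

# Step 1: first-stage residual of the endogenous variable.
g0, g1 = ols([[1.0, z] for z, x, y in data], [x for z, x, y in data])

# Step 2: include the residual and its interaction with the endogenous variable.
X2 = []
for z, x, y in data:
    v_hat = x - g0 - g1 * z
    X2.append([1.0, x, v_hat, v_hat * x])
b0, b_x, b_res, b_inter = ols(X2, ys)

print("naive:", round(b_naive, 3), "CF slope:", round(b_x, 3),
      "residual:", round(b_res, 3), "interaction:", round(b_inter, 3))
```

In this design the coefficient on the residual picks up intercept endogeneity and the coefficient on the interaction picks up slope endogeneity.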

Luan, Y.J. and Sudhir, K., 2010. Forecasting marketing-mix responsiveness for new products. Journal of Marketing Research, 47(3), pp.444-457.

65 of 81

Selection Model: A Special Case of Control Function

67 of 81

Short Statistics for Heckman Selection Model

[Equations not reproduced. The derivation uses:]

  • the symmetry of the standard normal distribution
  • the cumulative distribution function of the standard normal distribution
  • the Inverse Mills Ratio (can be derived from the formula): the ratio of the density function to the cumulative distribution function of the normal distribution
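As a reminder of the standard results these annotations refer to, for a standard normal variable z with density φ and cumulative distribution function Φ, and a constant c (symbols chosen here, not necessarily the slide's):

```latex
\begin{align*}
\phi(-c) &= \phi(c), \qquad 1 - \Phi(-c) = \Phi(c) && \text{(symmetry of the standard normal distribution)} \\
\mathrm{E}[z \mid z > -c] &= \frac{\phi(-c)}{1 - \Phi(-c)} = \frac{\phi(c)}{\Phi(c)} \equiv \lambda(c) && \text{(Inverse Mills Ratio)}
\end{align*}
```

The first equality in the second line is the mean of a truncated standard normal; the second follows from symmetry.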

68 of 81

Heckman Selection Model: Special Case of Control Function

Example: Education (X) and Wage (Y)

→ Controlling for the probability residual of being selected into the sample

Wages are observed only for those who are employed, who tend to have relatively higher wages, so the effect of education on wages may be underestimated.

69 of 81

Heckman Selection Model: Special Case of Control Function

→ Controlling for the probability residual of being selected into the treatment group

→ This case can also be estimated with a residual inclusion method (as described on page 23).
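A common textbook form of this treatment-selection case is sketched below, assuming a probit selection equation with error η ~ N(0, 1) and an outcome error u that is jointly normal with η (correlation ρ, standard deviation σ_u). The symbols are assumptions made here, not taken from the slide.

```latex
\begin{align*}
D &= \mathbf{1}[Z\gamma + \eta > 0], \qquad Y = X\beta + \delta D + u \\
\mathrm{E}[u \mid D = 1, Z] &= \rho\sigma_u \, \frac{\phi(Z\gamma)}{\Phi(Z\gamma)} \\
\mathrm{E}[u \mid D = 0, Z] &= -\rho\sigma_u \, \frac{\phi(Z\gamma)}{1 - \Phi(Z\gamma)} \\
Y &= X\beta + \delta D + \rho\sigma_u \left[ D \, \frac{\phi(Z\hat{\gamma})}{\Phi(Z\hat{\gamma})} - (1 - D) \, \frac{\phi(Z\hat{\gamma})}{1 - \Phi(Z\hat{\gamma})} \right] + e
\end{align*}
```

The bracketed term plays the role of the predicted residual in the control function: it is the expected value of the selection error given the observed treatment status.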

70 of 81

Heckman Selection Model: Special Case of Control Function

First-Stage: Probit model

→ The inverse Mills ratio can be computed from the probit model.

Second-Stage: The inverse Mills ratio is additionally inserted into the original equation.

(The Heckman selection model can also be explained as a switching regression model.)
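The two-step procedure can be sketched on simulated data with the standard library only. The data-generating values (true slope 2.0, error correlation 0.8, exclusion variable `z`) and the helpers `solve`, `ols`, and `probit` (a simple Fisher-scoring probit) are assumptions for illustration, not a replacement for a packaged estimator such as Stata's `heckman`.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def norm_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def norm_cdf(t):
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def probit(W, s, iters=15):
    """Probit maximum likelihood by Fisher scoring, starting from zero."""
    k = len(W[0])
    g = [0.0] * k
    for _ in range(iters):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for w, si in zip(W, s):
            t = sum(gj * wj for gj, wj in zip(g, w))
            p = min(max(norm_cdf(t), 1e-10), 1.0 - 1e-10)
            d = norm_pdf(t)
            a = d * (si - p) / (p * (1.0 - p))
            h = d * d / (p * (1.0 - p))
            for i in range(k):
                score[i] += a * w[i]
                for j in range(k):
                    info[i][j] += h * w[i] * w[j]
        g = [gi + di for gi, di in zip(g, solve(info, score))]
    return g

random.seed(1)
rows = []
for _ in range(5000):
    x, z, e = random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)
    u = 0.8 * e + 0.6 * random.gauss(0, 1)  # outcome error correlated with selection error e
    s = 1 if x + z + e > 0 else 0           # y is observed only when s == 1
    rows.append((x, z, s, 1.0 + 2.0 * x + u))

# First stage: probit of selection on x and the exclusion variable z.
gamma = probit([[1.0, x, z] for x, z, s, y in rows], [s for x, z, s, y in rows])

def imr(x, z):
    """Inverse Mills ratio evaluated at the fitted probit index."""
    t = gamma[0] + gamma[1] * x + gamma[2] * z
    return norm_pdf(t) / norm_cdf(t)

# Second stage: OLS on the selected sample with the IMR added.
sel = [(x, z, y) for x, z, s, y in rows if s == 1]
y_sel = [y for x, z, y in sel]
b_naive = ols([[1.0, x] for x, z, y in sel], y_sel)[1]
b0, b_heck, b_imr = ols([[1.0, x, imr(x, z)] for x, z, y in sel], y_sel)

print("naive:", round(b_naive, 3), "two-step:", round(b_heck, 3), "IMR coef:", round(b_imr, 3))
```

The second-stage standard errors from plain OLS are not valid here because the inverse Mills ratio is itself estimated; the sketch only illustrates the point estimates.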

71 of 81

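Example (1) Effect of Education on Wage

  • Heckman Selection Model (1)
    • Controlling for the probability residual of being selected into the sample

* Load the example data set
use http://www.stata-press.com/data/r13/gsem_womenwk

* Two-step Heckman model: wage equation with a selection equation
heckman wage educ age, select(married children educ age) twostep

[Stata output annotations: married and children serve as the instrumental variables (they appear only in the selection equation); the highlighted estimate is the coefficient of the IMR.]

Caution! Are the variables married and children valid instrumental variables?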

72 of 81

Example (2) Effect of Diversification on Firm Value

  • Heckman Selection Model (2)
    • Controlling for the probability residual of being selected into the treatment group

[Table from Campa and Kedia (2002, p. 1750): the highlighted estimate is the coefficient of the IMR (Inverse Mills Ratio); t-statistics are given in parentheses.]

Campa, J.M. and Kedia, S., 2002. Explaining the diversification discount. Journal of Finance, 57(4), pp.1731-1762.

73 of 81

Practical Tips for Using IVs

74 of 81

How to Report IV Analyses

  • Step 1. Show statistical tests for the IV validity.
  • Step 2. Justify the validity of the IVs theoretically.
  • Step 3. Conduct sensitivity tests for the IVs.
  • Step 4. Clarify whether what you estimate is the LATE or the ATE.

Swanson, S.A. and Hernán, M.A., 2013. Commentary: how to report instrumental variable analyses (suggestions welcome). Epidemiology, 24(3), pp.370-374.

75 of 81

“If you can’t see it in the reduced form, it ain’t there.”

  • The IV estimator is the reduced-form estimate divided by the first-stage estimate.

[Equations not reproduced. The step linking the reduced form to the causal effect relies on the exclusion restriction/exogeneity assumption.]
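With a single instrument Z and a single endogenous variable X, the ratio can be written as follows. The symbols (γ for the first stage, π for the reduced form) are assumed here for illustration.

```latex
\begin{align*}
X &= \gamma_0 + \gamma_1 Z + v && \text{(first stage)} \\
Y &= \pi_0 + \pi_1 Z + w && \text{(reduced form)} \\
\hat{\beta}_{IV} &= \frac{\hat{\pi}_1}{\hat{\gamma}_1} = \frac{\widehat{\mathrm{Cov}}(Y, Z)}{\widehat{\mathrm{Cov}}(X, Z)}
\end{align*}
```

Under the exclusion restriction, Z affects Y only through X, so π₁ = βγ₁. If the reduced-form coefficient is indistinguishable from zero, the ratio cannot reveal a nonzero effect, which is the point of the quotation.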

76 of 81

Statistical Tests are Necessary, but not Sufficient

  • The relevance assumption (i.e., the instrument is not weak) can be checked using some statistics (e.g., Stock-Yogo).
  • However, the exclusion restriction and exogeneity assumptions “are not empirically verifiable: they are assumptions” (Swanson and Hernán 2013, p. 371).
    • All statistical tests for the IVs assume the partial validity of IVs.
      • Hausman test: Testing for the existence of endogeneity
        Null Hypothesis: There is no significant difference between OLS and IV estimators.
      • Sargan-Hansen J (over-identifying) test: Testing for the exclusion restriction/exogeneity assumptions of IVs
        Null Hypothesis: There is no correlation between the error term (residual) using (K-1) instruments and the other instrument.

Swanson, S.A. and Hernán, M.A., 2013. Commentary: how to report instrumental variable analyses (suggestions welcome). Epidemiology, 24(3), pp.370-374.
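The over-identifying test can be illustrated with a small simulation. This sketch uses the homoskedastic Sargan form (sample size times the R² from regressing the 2SLS residuals on all instruments), two simulated instruments, and hypothetical helpers `solve`, `ols`, and `sargan_stat`; the `direct_effect` parameter is an assumption that lets the second instrument violate the exclusion restriction.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def sargan_stat(direct_effect, n=4000, seed=3):
    """Return n * R^2 from regressing the 2SLS residuals on all instruments."""
    rng = random.Random(seed)
    Z, x, y = [], [], []
    for _ in range(n):
        z1, z2, v = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        xi = z1 + z2 + v                    # endogenous regressor
        u = 0.8 * v + 0.6 * rng.gauss(0, 1)  # error correlated with x through v
        Z.append([1.0, z1, z2])
        x.append(xi)
        # direct_effect != 0 means z2 affects y directly (exclusion violated).
        y.append(1.0 + 2.0 * xi + direct_effect * z2 + u)
    # 2SLS: first stage on both instruments, then y on the fitted values.
    g = ols(Z, x)
    x_hat = [sum(gj * zj for gj, zj in zip(g, row)) for row in Z]
    b0, b1 = ols([[1.0, xh] for xh in x_hat], y)
    # The residuals use the actual x, not the fitted values.
    resid = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]
    # Auxiliary regression of the residuals on all instruments.
    d = ols(Z, resid)
    fitted = [sum(dj * zj for dj, zj in zip(d, row)) for row in Z]
    mean_r = sum(resid) / n
    ssr = sum((r - f) ** 2 for r, f in zip(resid, fitted))
    sst = sum((r - mean_r) ** 2 for r in resid)
    return n * (1.0 - ssr / sst)

stat_valid = sargan_stat(0.0)    # both instruments valid
stat_invalid = sargan_stat(1.0)  # second instrument violates exclusion
print("valid:", round(stat_valid, 2), "invalid:", round(stat_invalid, 2))
```

With two instruments and one endogenous variable, the statistic is compared with a chi-squared distribution with one degree of freedom (5% critical value about 3.84). The test can only detect disagreement among instruments; if all instruments are invalid in the same direction, it may not reject.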

77 of 81

Theoretical Justification for the IVs is Critical

  • Evidence on theoretical justification for instrument validity

[Figures: evidence reported in Sovey and Green (2011) and Bowen et al. (2016)]

Bowen III, D.E., Frésard, L. and Taillard, J.P., 2016. What’s Your Identification Strategy? Innovation in Corporate Finance Research. Management Science, 63(8), pp.2529-2548.

Sovey, A.J. and Green, D.P., 2011. Instrumental Variables Estimation in Political Science: A Readers’ Guide. American Journal of Political Science, 55(1), pp.188-200.

78 of 81

Theoretical Justification for the IVs is Critical

  • Evidence on theoretical justification for instrument validity

[Figure from Swanson and Hernán (2013), labeled “Necessary, but not Sufficient, Condition”]

Another use of causal graphs in IV analysis

“Although these conditions are statistically similar, it is important to consider them separately in order to incorporate subject-matter knowledge in discussions about their validity and in decisions on adjustments.” (Swanson and Hernán 2013, p. 371)

Swanson, S.A. and Hernán, M.A., 2013. Commentary: how to report instrumental variable analyses (suggestions welcome). Epidemiology, 24(3), pp.370-374.

79 of 81

Wrap-Up

80 of 81

What’s Your Research Design and Data Structure?

[Flowchart: each question is answered Yes or No and leads to a method.]

Questions:

  • The goal is causal inference, and random assignment is feasible.
  • Quasi-experimental designs are available.
  • Treatment and control groups are observed.
  • Longitudinal data is observed.
  • Parallel trend assumption is valid.
  • Treatment is assigned by arbitrary threshold.
  • There is an exogenous variable that can induce the treatment.
  • There is a single treatment group and no information about functional forms.

Methods:

  • Randomized Controlled Trial
  • Quasi-Experiments
  • Difference-in-Differences (DID)
  • DID + Matching
  • Synthetic Control
  • Interrupted Time-Series Analysis
  • Regression Discontinuity
  • Local Average Treatment Effect
  • Instrumental Variable or Control Function & Selection Model
  • Selection on Observables
  • Regression or Matching/Weighting
  • Not feasible

Note that the flowchart may depend on the context.
