1 of 52

Econometrics: A Brief Overview

2 of 52

Agenda

  • Background
  • Core Methods
  • Machine Learning and Causal Inference

3 of 52

Running Example

For today, we are all labor economists interested in education policy

Our goal is to estimate the causal relationship between class size and student achievement*

To illustrate different methods, we will play with some of the details of the hypothetical experiment, but the following will stay the same

  • We will be working with simulated data
  • Students are the main unit of observation
  • Test scores are the primary outcome measure

* This is a common framing based on the Tennessee STAR experiment

4 of 52

Part 1: Background

5 of 52

Distributions

Data Generating Process

6 of 52

Populations and Samples

We would like to know the average test score in the population...

But… it is too expensive to track all students, so we take a sample

How should we estimate the population mean from the sample?

  • The ith value from the sample
  • Mode
  • Mean

What makes one estimator better than another?

  • Consistent
  • Unbiased
  • Efficient

The sample mean is...

  • A random variable
  • The Best Linear Unbiased Estimator of the population mean

7 of 52

Law of Large Numbers

As the sample size increases, the sample mean converges (in probability) to the population mean

In other words, the sample mean is a consistent estimator
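
A minimal simulation sketch, assuming test scores drawn from a hypothetical Normal(75, 10) population:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: test scores ~ Normal(75, 10)
for n in [10, 100, 10_000, 1_000_000]:
    sample = rng.normal(loc=75, scale=10, size=n)
    print(f"n = {n:>9,}: sample mean = {sample.mean():.3f}")
# The sample mean settles ever closer to 75 as n grows (consistency)
```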

8 of 52

NHST

Null Hypothesis -- The hypothesis to be tested, e.g. the average test score in the population is 75

Test Statistic -- A statistic (any function of the sample) used to assess a hypothesis.

p-value -- The probability of observing a test statistic at least as extreme as the one computed, assuming the null hypothesis is true

Power -- The probability that the test correctly rejects the null hypothesis when the alternative is true
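
As a sketch of the mechanics, a one-sample t-test of the null above (population mean of 75) on simulated scores:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
scores = rng.normal(loc=77, scale=10, size=200)  # true mean is actually 77

# Test H0: population mean = 75 against the two-sided alternative
t_stat, p_value = stats.ttest_1samp(scores, popmean=75)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # a small p-value rejects H0
```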

9 of 52

NHST (continued)

10 of 52

References

11 of 52

Causal Inference

Correlation does not equal causation… so how can we ever make a causal claim?

What if we could observe the same student in both a small class and a large class?

Potential Outcomes model of causality (Rubin causal model, 1974)

  • The treatment effect is the difference in test scores between the two scenarios

Obviously we cannot observe an entity in both conditions...

  • Causal inference is fundamentally a missing data problem
  • Instead, we consider the average treatment effect across treated and untreated groups

But what if the two groups are different?

  • This is selection bias, and overcoming it is the fundamental goal of econometrics
  • Randomization solves the selection problem
  • Other methods can be used to make causal claims with quasi-experimental and observational data

Identification Strategy -- The combination of subject matter expertise, data generating process, and statistical methods used to justify a causal claim

12 of 52

What makes a study invalid?

Avoiding these pitfalls is key to having a valid identification strategy

Internal Validity -- Anything that can lead to biased estimates or invalid inferences is a violation

  • Omitted variables
  • Misspecified functional form
  • Measurement Error
  • Selection bias
  • Simultaneous causality
  • Incorrect standard errors

External Validity

  • Non-representative sample
  • Non-representative program or policy

13 of 52

What makes an experimental study invalid?

We said that randomization solves the selection problem, but....

  • Failure to randomize
  • Failure to follow treatment protocol
  • Attrition
  • Experimental Effects
  • Small sample sizes

14 of 52

References

15 of 52

Part 2: Core Methods

16 of 52

Regression

Assumptions

  • The error term has conditional mean zero (predictions are good, on average)
  • Observations (treatment-outcome pairs) are independent and identically distributed across units
  • Large outliers are unlikely

Given these strong assumptions, why OLS?

  • BLUE
  • MVUE

Under these assumptions, OLS provides consistent, unbiased, precise, and efficient estimates relative to other linear estimators

17 of 52

T-test to OLS
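
A minimal sketch of the equivalence on simulated data: a two-sample t-test and OLS of the score on a treatment dummy give the same t statistic.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)
small = rng.normal(78, 10, 200)  # scores in small classes
large = rng.normal(75, 10, 200)  # scores in large classes

# Two-sample t-test (equal variances)...
t_stat, p = stats.ttest_ind(small, large)

# ...matches OLS of the score on a small-class dummy
y = np.concatenate([small, large])
d = np.concatenate([np.ones(200), np.zeros(200)])
fit = sm.OLS(y, sm.add_constant(d)).fit()
print(t_stat, fit.tvalues[1])  # identical t statistics
```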

18 of 52

What about parental support?

19 of 52

Consistency and Bias

20 of 52

Omitted Variable Bias

Correlation Matrix

              support   income
  support       1.0       0.8
  income        0.8       1.0

Data Generating Process
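
A sketch of the bias under an assumed DGP in which support and income are correlated at 0.8 (as in the matrix above) and both raise scores:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 5_000
cov = [[1.0, 0.8], [0.8, 1.0]]  # correlation structure from the slide
support, income = rng.multivariate_normal([0, 0], cov, size=n).T
score = 70 + 2 * support + 3 * income + rng.normal(0, 5, size=n)

short = sm.OLS(score, sm.add_constant(support)).fit()
both = sm.OLS(score, sm.add_constant(np.column_stack([support, income]))).fit()
print(short.params[1])  # ~4.4: 2 plus the omitted income effect (3 * 0.8)
print(both.params[1])   # ~2.0: unbiased once income is included
```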

21 of 52

Omitted Variable Bias (cont)

22 of 52

Part I Recap

  • Sprinted through STAT-100
  • Defined causal inference through the Potential Outcomes framework
    • Goal of any method is to solve the selection bias problem
    • We desire unbiased, consistent, and low variance estimates for the causal effect under study
  • Talked about what makes a data analysis invalid
  • Explored single and multiple regression
    • Omitting important variables can lead to biased estimates

23 of 52

More detail on our approach...

Problem/Question

What is the problem you are trying to solve? What is the causal relationship that you want to understand?

Data/Method

What data would allow you to answer the question? How was the data collected? Observational? Experimental? What methods can be used? What is our model for the process?

Mother Nature

What process is responsible for producing observations in the data set? Mother nature as a data factory stamping out observations? What levers exist?

24 of 52

Why simulation?

Causal inference requires the analyst to have an explicit model of how the world works

The assumed model can (and will) differ from the true model

By controlling the data generating process, simulation allows us to:

  • Explore the effects of divergence between the assumed model and the true model
  • Better understand our methods
  • Develop an intuition for what could be driving odd results in real data

25 of 52

Instrumental Variables

IV is used when our assumed model is wrong in some systematic way:

  • Omitted variable
  • Simultaneous causality
  • Measurement Error
  • Selection bias (non-random treatment assignment)

We introduce an instrument into the model that satisfies two criteria:

  • Exogeneity -- The instrument is uncorrelated with the error term: it affects the outcome only through the endogenous variable
  • Relevance -- The instrument is correlated with the endogenous variable (the stronger, the better)

26 of 52

The “Classic” Example

Fulton Fish Market

  • How does changing price affect the demand for fish?
    • We observe quantity sold, which depends on both the supply of fish and the demand for fish
  • To understand the demand side, we look for determinants of supply that do not affect demand
    • Stormy days make it harder to catch fish, which reduces supply the next day
  • Therefore, use stormy weather as an instrument for price: it shifts supply without shifting demand, so the induced price variation traces out the demand curve

Can we make the case that stormy weather is a valid instrument?

27 of 52

Estimation: Two-Stage Least Squares

  1. Regress the endogenous (usually treatment) variable on the instrument
  2. Calculate the in-sample predicted values for treatment
  3. Regress the outcome variable on the predicted values from step (2)
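
A manual sketch of the three steps on simulated data (true effect of 2, with an unobserved confounder). In practice, use a packaged IV estimator: the second-stage standard errors below are not corrected for the first stage.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 5_000
z = rng.normal(size=n)                      # instrument
u = rng.normal(size=n)                      # unobserved confounder
d = 0.6 * z + 0.6 * u + rng.normal(size=n)  # endogenous treatment
y = 2.0 * d + 2.0 * u + rng.normal(size=n)  # outcome; true effect = 2

first = sm.OLS(d, sm.add_constant(z)).fit()       # (1) treatment on instrument
d_hat = first.fittedvalues                        # (2) predicted treatment
second = sm.OLS(y, sm.add_constant(d_hat)).fit()  # (3) outcome on predictions

print(sm.OLS(y, sm.add_constant(d)).fit().params[1])  # naive OLS: biased up
print(second.params[1])                               # 2SLS: ~2
```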

28 of 52

Why endogeneity matters

Correlation Matrix

                smallClass   olo
  smallClass       1.0       0.6
  olo              0.6       1.0

Data Generating Process

Model

29 of 52

Exogenous, but Irrelevant

Correlation Matrix

                smallClass   olo
  smallClass       1.0       0.6
  olo              0.6       1.0

Data Generating Process

First Stage

Second Stage

Instrument -- Parent drives a white car

30 of 52

Relevant, but Endogenous

Correlation Matrix

                smallClass   olo   enroll
  smallClass       1.0       0.6     0.4
  olo              0.6       1.0     0.6
  enroll           0.4       0.6     1.0

Data Generating Process

First Stage

Second Stage

31 of 52

Valid, but Weak Instrument

Correlation Matrix

                olo   treatment   instrument
  olo           1.0      0.6         0.0
  treatment     0.6      1.0         0.1
  instrument    0.0      0.1         1.0

Data Generating Process

First Stage

Second Stage

32 of 52

Valid Instrument

Correlation Matrix

                olo   treatment   instrument
  olo           1.0      0.6         0.0
  treatment     0.6      1.0         0.6
  instrument    0.0      0.6         1.0

Data Generating Process

First Stage

Second Stage

Instrument -- Percent change in enrollment

33 of 52

Regression Discontinuity

Used when treatment depends on crossing some threshold

Often used with observational data

Two Types

  • Sharp -- Crossing the threshold guarantees treatment
  • Fuzzy -- Crossing the threshold increases the probability of treatment

34 of 52

Sharp RDD

Add an indicator for the threshold that determines treatment and regress

Data Generating Process

Model
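
A minimal sketch, assuming a hypothetical rule that classes are small when enrollment falls below a cutoff of 30:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 2_000
enroll = rng.uniform(20, 40, size=n)  # running variable
small = (enroll < 30).astype(float)   # sharp rule: treated below the cutoff
score = 60 + 0.5 * enroll + 5 * small + rng.normal(0, 3, size=n)

# Regress the outcome on the treatment indicator and the running variable
X = sm.add_constant(np.column_stack([small, enroll - 30]))
fit = sm.OLS(score, X).fit()
print(fit.params[1])  # ~5: the jump in scores at the cutoff
```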

35 of 52

Fuzzy RDD

Use the threshold as an instrument for treatment and estimate with two stage least squares

Correlation Matrix

           big   split
  big      1.0    0.8
  split    0.8    1.0

Data Generating Process

First Stage

Second Stage
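
A minimal sketch, assuming crossing the cutoff only raises the probability of a small class, so the threshold indicator serves as the instrument in two-stage least squares:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 5_000
enroll = rng.uniform(20, 40, size=n)
above = (enroll > 30).astype(float)  # crossing the threshold...
small = (above + rng.normal(0, 0.5, size=n) > 0.5).astype(float)  # ...raises P(treatment)
score = 65 + 4 * small + rng.normal(0, 3, size=n)  # true effect = 4

# First stage: treatment on the threshold indicator
d_hat = sm.OLS(small, sm.add_constant(above)).fit().fittedvalues
# Second stage: outcome on predicted treatment
print(sm.OLS(score, sm.add_constant(d_hat)).fit().params[1])  # ~4
```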

36 of 52

Difference in Difference

Used when there are pre-existing differences between the treatment and control groups unrelated to the treatment

  • Baseline data required
  • The ATE estimate with only time-2 data is P2 - S2
  • The diff-in-diff estimate is (P2 - S2) - (P1 - S1)
  • Requires the “parallel trends” assumption

37 of 52

Diff-in-Diff Estimation

Data Generating Process

Method 1

Method 2
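
A sketch of two equivalent estimates on simulated two-period data (assumed DGP: group gap of 3, common trend of 2, treatment effect of 5):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 1_000
df = pd.DataFrame({"treated": np.repeat([0, 1], n),
                   "period": np.tile([0, 1], n)})
df["score"] = (70 + 3 * df.treated + 2 * df.period
               + 5 * df.treated * df.period + rng.normal(0, 4, 2 * n))

# Method 1: difference of the four group means
m = df.groupby(["treated", "period"])["score"].mean()
print((m.loc[(1, 1)] - m.loc[(1, 0)]) - (m.loc[(0, 1)] - m.loc[(0, 0)]))  # ~5

# Method 2: OLS with a group-by-period interaction
fit = smf.ols("score ~ treated * period", data=df).fit()
print(fit.params["treated:period"])  # ~5
```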

38 of 52

References for Core Methods

39 of 52

Matching

Another approach to solving the selection bias problem

Definition

Any method that aims to balance the distribution of covariates between two (or more) groups

Objective -- Approximate an RCT with observational data

Brief History

  • Initial work in the 1940s
  • Theoretical work began in the 1970s
  • Canonical work on propensity scores in 1983

Advantages

  • Can be used to complement other methods (OLS, IV, Diff-in-Diff)
  • Explicitly highlights insufficient overlap between groups
  • Straightforward diagnostics

Assumptions

  • SUTVA
  • Unconfoundedness

40 of 52

4 Key Steps

  1. Define a measure of closeness
     • Determine what covariates to include -- Goal is to satisfy the unconfoundedness assumption
     • Select a distance measure
       • Exact matching
       • Mahalanobis -- distance from the distribution
       • Propensity Score -- models the probability of treatment
       • Linear Propensity Score
       • Prognosis Score -- models the outcome of each individual under the control condition
  2. Implement a matching method that uses (1)
     • Nearest Neighbor
     • Subclassification
     • Full Matching
     • Weighting
  3. Assess the quality of the matched sample
     • Standardized differences
     • Variance ratios
     • QQ plots, histograms, box plots, and plots of standardized differences
  4. Analyze the outcome and estimate the treatment effect
     • Nearest Neighbor -- Proceed as if the matched sample is the result of simple random sampling; estimate the ATE via a model
     • Subclassification, Full Matching, and Weighting -- Estimate effects within each subclass

41 of 52

Example

Goal -- Use observational data to assess the impact of class sizes on test scores

  • Select age, sex, parental income, and teacher performance ratings as matching criteria
  • Calculate propensity scores, i.e. each student's probability of being in a small class given the covariates
  • Match each student in a small class to their closest match in a large class
  • Compute the standardized difference in means along each covariate
  • Estimate the treatment effect using the matched sample
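
A sketch of the example with hypothetical simulated covariates; the logit and nearest-neighbor choices follow the steps listed above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(8)
n = 2_000
X = rng.normal(size=(n, 4))  # age, sex, parental income, teacher rating
p = 1 / (1 + np.exp(-(X @ np.array([0.5, 0.2, 0.8, 0.3]))))
small = rng.binomial(1, p)   # non-random assignment to small classes
score = 70 + X @ np.array([1.0, 0.5, 2.0, 1.5]) + 4 * small + rng.normal(0, 3, size=n)

# Propensity scores from a logit model
ps = LogisticRegression().fit(X, small).predict_proba(X)[:, 1]

# Match each small-class student to the nearest large-class student
nn = NearestNeighbors(n_neighbors=1).fit(ps[small == 0].reshape(-1, 1))
_, idx = nn.kneighbors(ps[small == 1].reshape(-1, 1))

# Effect estimate on the matched sample (the naive difference is biased up)
print(score[small == 1].mean() - score[small == 0][idx.ravel()].mean())  # ~4
```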

42 of 52

References for Matching

43 of 52

Part 3: ML and Causal Inference

The late 2000s to the present have seen a smattering of methods aimed at applying machine learning to causal inference

Key Developments*

  • Bayesian Additive Regression Trees
  • Post-selection Inference
  • SuperLearner
  • Interpretable Modelling
  • Causal Trees and Forests
  • G-Estimation
  • Double ML

* This list comes from slides created by Skipper Seabold

44 of 52

ML and Matching

Traditionally, propensity scores have been estimated using a logit or probit model

Why not use some other SL/ML method that can output probabilities?

  • CART
  • Random Forest
  • Gradient Boosting Machines
  • SVM
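
A sketch of the swap: any classifier exposing predicted probabilities can stand in for the logit (here a gradient boosting machine from scikit-learn, on hypothetical data):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(9)
X = rng.normal(size=(1_000, 4))  # covariates
treat = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))

# Fit the classifier and read off P(treatment | X) as the propensity score
gbm = GradientBoostingClassifier().fit(X, treat)
ps = gbm.predict_proba(X)[:, 1]
print(ps[:5])
```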

45 of 52

Double ML -- Fishing Bans and Coral Health

Source Code

Goal -- Estimate the effect of a fishing ban on coral reef health

Variables -- treatment, fish biomass, coral health variables (size, height, % sand, % hard coral)

Fishing ban non-randomly assigned

Intervention Objective -- Increase fish population in the short run, improve coral health in the long run

46 of 52

Procedure

  1. Split the available data into two disjoint sets
  2. Use the first split to estimate the relationship between the fishing ban and the 5 predictors
  3. Compute the residualized propensity scores on the second split
  4. Use the first split to estimate the relationship between biomass and the 4 predictors
  5. Compute residuals on the second split
  6. Reverse the roles of the first and second splits and repeat steps 2-5
  7. Stack the residuals from step 3 and, separately, from step 5; estimate the causal effect
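
A sketch of the procedure under an assumed DGP (random forests as the ML learners, true effect of 3; variable names are illustrative, not from the study). The two folds of KFold implement the role reversal in step 6:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(10)
n = 2_000
X = rng.normal(size=(n, 4))  # coral covariates (assumed)
ban = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0] - X[:, 1])))
biomass = 3 * ban + X[:, 0] + 2 * X[:, 2] + rng.normal(0, 1, size=n)

d_res, y_res = np.zeros(n), np.zeros(n)
for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
    # Steps 2-3: model the ban on the predictors, residualize out of fold
    m_d = RandomForestClassifier().fit(X[train], ban[train])
    d_res[test] = ban[test] - m_d.predict_proba(X[test])[:, 1]
    # Steps 4-5: model biomass on the predictors, residualize out of fold
    m_y = RandomForestRegressor().fit(X[train], biomass[train])
    y_res[test] = biomass[test] - m_y.predict(X[test])

# Step 7: regress stacked outcome residuals on treatment residuals
print(sm.OLS(y_res, d_res).fit().params[0])  # ~3
```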

47 of 52

References for ML and Causal Inference

48 of 52

Future Topics

  • Fixed and Random effects
  • Multilevel/Hierarchical models
  • Synthetic Controls
  • Time series methods
  • Explore other ML methods for causal inference

49 of 52

Extra

50 of 52

Ics and Ings

Statistics, econometrics, statistical learning, and machine learning… what is the difference?

  • All are built on top of probability theory and linear algebra
  • Statistics
    • High purity (emphasis on validity)
    • Objective is usually valid inference
    • Models or algorithms
  • Econometrics
    • Medium purity
    • Heavily focused on causality. Willing to stomach some strong assumptions to get it
    • Model-based
  • Statistical Learning
    • Medium purity
    • Heavily focused on explanation and prediction
    • Algorithms
  • Machine learning
    • Pure practicality
    • Focused on prediction to the exclusion (sometimes) of explanation
    • Algorithms

51 of 52

Models and Algorithms

A model is a statement about the data generating process, i.e. how the world works

An algorithm is a way to compute something

When doing econometrics, the objective is to study causal relationships, so we are in the land of models

52 of 52

Types of Data

  • Experimental
  • Quasi-Experimental
  • Observational

We will be working with simulated data of each type over the next hour