2 of 18

The Problem We're Solving

Remember Endogeneity?

When X is correlated with the error term, your regression coefficients are BIASED. You get the wrong answer, and you can't trust your results.
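A small simulation can make the bias concrete. The sketch below uses a hypothetical data-generating process in which unobserved ability drives both X and Y. The coefficients 2.0 and 3.0, the sample size, and the variable names are illustrative assumptions, not taken from the slides. With these assumed numbers the OLS slope converges to 1 + 3·2/5 = 2.2 rather than the true effect of 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Hypothetical data-generating process (all numbers are illustrative assumptions).
ability = rng.normal(size=n)              # unobserved, so it ends up in the error term
x = 2.0 * ability + rng.normal(size=n)    # X depends on ability -> X is endogenous
true_effect = 1.0
y = true_effect * x + 3.0 * ability + rng.normal(size=n)

# OLS slope of Y on X (with an intercept) equals Cov(X, Y) / Var(X).
ols_slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print(f"true effect = {true_effect}, OLS estimate = {ols_slope:.2f}")
```

More data does not help here: the gap between the OLS estimate and the true effect stays roughly the same as n grows.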

The Challenge:

You can't always run a randomized experiment. Sometimes you're stuck with observational data where X is endogenous.

The Solution: Instrumental Variables

IV uses a 'helper' variable (Z) that affects X but doesn't directly affect Y. This lets you isolate the clean, unbiased relationship between X and Y.

3 of 18

What is an Instrumental Variable?

An Instrumental Variable (Z) is a 'helper' variable that:

• Affects your X variable (creates variation in X)

• But does NOT directly affect Y (only affects Y through X)

The Pathway:

Z (Instrument) → affects → X (Treatment) → affects → Y (Outcome)

Z → Y directly: ✗ NO direct effect!

💡 Key Insight: The instrument creates variation in X, but ONLY affects Y indirectly through X. This gives us clean, unbiased estimates.
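For a single instrument, the two conditions translate into a simple ratio of covariances. The derivation below is a sketch under an assumed linear model; the error symbol u and the coefficients a and b are notation introduced here for illustration.

```latex
% Assumed linear model with error term u
Y = a + bX + u, \qquad \operatorname{Cov}(Z, u) = 0, \qquad \operatorname{Cov}(Z, X) \neq 0

% Take the covariance of both sides with Z
\operatorname{Cov}(Z, Y) = b\,\operatorname{Cov}(Z, X) + \operatorname{Cov}(Z, u) = b\,\operatorname{Cov}(Z, X)

% Solve for b
b_{IV} = \frac{\operatorname{Cov}(Z, Y)}{\operatorname{Cov}(Z, X)}
```

The first condition (Z unrelated to the error) removes the contaminated term; the second (Z related to X) keeps the denominator away from zero.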

4 of 18

Simple Example: Training & Productivity

Research Question: Does IT training increase employee productivity?

The Problem:

X = IT training hours

Y = Productivity score

Endogeneity:

• Managers select smart employees for training

• Smart employees are ALSO naturally more productive

• X is correlated with ability, which sits in the error term

The IV Solution:

Z = Distance from home to training center

Why it works:

✓ Relevance: People living closer attend more training

✓ Exclusion: Distance doesn't affect productivity except through training

How It Works:

Distance to Center → Training Hours → Productivity

5 of 18

Two Requirements for Valid Instruments

For Z to be a good instrument, it MUST satisfy BOTH conditions:

Requirement #1: RELEVANCE

Z must actually affect X

Test: Run the first-stage regression: X = a + bZ + error

Z passes if:

• The coefficient on Z is statistically significant (p < 0.05)

• The first-stage F-statistic exceeds 10 (a common rule of thumb)

→ You have a RELEVANT instrument ✓

Requirement #2: EXCLUSION RESTRICTION

Z must affect Y ONLY through X (no direct effect on Y)

The Hard Part: You CANNOT statistically test this!

You must rely on:

• Theoretical arguments

• Common sense and logic

• Domain knowledge

⚠ This is why finding good instruments is HARD!

6 of 18

Testing Relevance: Real Example

First Stage Regression:

Training Hours = 40 - 1.5(Distance to Center) + error

Results:

Variable | Coefficient | F-Statistic

Distance | -1.5 | 45.2

(Intercept) | 40 | —

✓ Interpretation:

• Coefficient: Each extra mile from the training center is associated with 1.5 fewer hours of training

• F-statistic = 45.2 (>> 10): STRONG instrument! Distance strongly predicts training.

• Conclusion: Relevance requirement is SATISFIED ✓
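The relevance check can be sketched in a few lines of NumPy. The data below are simulated around the first-stage equation above; the sample size, noise level, and range of distances are assumptions, so the F-statistic will not match the 45.2 in the table. With a single instrument, the first-stage F-statistic is the squared t-statistic on the instrument (assuming homoskedastic errors).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Hypothetical data built around: Training Hours = 40 - 1.5 * Distance + error
# (the noise level and distance range are assumptions).
distance = rng.uniform(0, 20, size=n)
training = 40 - 1.5 * distance + rng.normal(scale=8.0, size=n)

# First-stage OLS: training = a + b * distance + error
Z = np.column_stack([np.ones(n), distance])
coef, *_ = np.linalg.lstsq(Z, training, rcond=None)
resid = training - Z @ coef

# Classical (homoskedastic) standard error of the slope, then F = t^2.
sigma2 = resid @ resid / (n - 2)
cov_coef = sigma2 * np.linalg.inv(Z.T @ Z)
t_stat = coef[1] / np.sqrt(cov_coef[1, 1])
f_stat = t_stat ** 2

print(f"slope = {coef[1]:.2f}, first-stage F = {f_stat:.1f}")
print("relevant (F > 10)" if f_stat > 10 else "weak instrument (F <= 10)")
```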

7 of 18

Understanding Exclusion Restriction

The Critical Question: Does Z affect Y directly?

✓ What's ALLOWED (Indirect Effect):

Z → X → Y is OKAY!

Indirect effect through X

✗ What's NOT ALLOWED (Direct Effect):

✗

Z → Y is NOT OKAY!

Direct effect violates exclusion

🤔 Example: Does Distance Violate Exclusion?

Question: Could distance to training center affect productivity DIRECTLY?

Potential violations:

• Maybe people far from center also live in rural areas (different labor market)

• Maybe long commutes cause stress that reduces productivity

→ You must think carefully about whether these are plausible!
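A simulation can show what a violation costs. In the sketch below, every number is an assumption chosen for illustration: a true training effect of 2.0, and, in the violating case, a direct productivity loss of 0.3 points per mile of commute. With a first-stage slope of -1.5, the direct effect shifts the IV estimate by (-0.3)/(-1.5) = 0.2, so IV converges to 2.2 instead of 2.0.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
true_effect = 2.0  # assumed effect of one training hour on productivity

def iv_slope(z, x, y):
    """Single-instrument IV estimate: Cov(Z, Y) / Cov(Z, X)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

ability = rng.normal(size=n)                      # unobserved confounder
distance = rng.uniform(0, 20, size=n)             # instrument Z
training = 40 - 1.5 * distance + 3.0 * ability + rng.normal(size=n)

# Case 1: exclusion holds, distance reaches productivity only through training.
y_valid = 50 + true_effect * training + 5.0 * ability + rng.normal(size=n)

# Case 2: exclusion fails, each mile of commute directly costs 0.3 points.
y_violated = y_valid - 0.3 * distance

est_valid = iv_slope(distance, training, y_valid)
est_violated = iv_slope(distance, training, y_violated)
print(f"exclusion holds:    IV = {est_valid:.2f}")
print(f"exclusion violated: IV = {est_violated:.2f}")
```

Nothing in the data flags the second case as wrong, which is why the argument for exclusion has to come from theory and domain knowledge.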

8 of 18

How IV Works: The Magic Explained

Remember: X contains TWO types of variation

✓ Good Variation (Clean)

Changes in X that are NOT correlated with error term

Example:

• Training varies because of distance

• Distance is random w.r.t. ability

• This variation is CLEAN

→ Using this gives TRUE causal effect

✗ Bad Variation (Contaminated)

Changes in X that ARE correlated with error term

Example:

• Training varies because of ability

• Smart people get more training

• This variation is CONTAMINATED

→ Using this gives BIASED results

What Each Method Uses:

OLS (Normal Regression):

Uses ALL variation = Clean + Contaminated

→ Result: Biased coefficients ✗

IV (Instrumental Variables):

Uses ONLY clean variation (from instrument Z)

→ Result: Unbiased coefficients ✓

9 of 18

Two-Stage Least Squares (2SLS)

This is how IV actually works in practice

STAGE 1: Predict X using Z

Run regression: X = α + βZ + ε

This extracts the 'clean' part of X that comes from Z

You get: X̂ (X-hat) = Predicted values of X based only on Z

Example: Traininĝ = 40 - 1.5(Distance)

STAGE 2: Use predicted X̂ to explain Y

Run regression: Y = a + b(X̂) + u

This gives you the IV estimate of how X affects Y

The coefficient b is now free of endogeneity bias (consistent in large samples) because it uses only clean variation

Example: Productivity = 50 + 2(Traininĝ)

💡 The magic: You're using only the variation in X that came from Z, which is NOT correlated with the error term!
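The two stages can be sketched by hand with NumPy. The data are simulated so that the true first stage is 40 - 1.5(Distance) and the true effect of training is 2; the ability coefficients, noise terms, and sample size are assumptions. One caveat: running Stage 2 by hand gives the right coefficient, but the standard errors it reports are not valid, so in practice a dedicated 2SLS routine is the safer choice.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

# Hypothetical data consistent with the slide's equations (noise terms are assumptions).
ability = rng.normal(size=n)                                   # unobserved confounder
distance = rng.uniform(0, 20, size=n)                          # instrument Z
training = 40 - 1.5 * distance + 4.0 * ability + rng.normal(size=n)      # X
productivity = 50 + 2.0 * training + 6.0 * ability + rng.normal(size=n)  # Y

def ols(x, y):
    """Return (intercept, slope) from a one-regressor least-squares fit."""
    slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    return y.mean() - slope * x.mean(), slope

# Stage 1: predict X from Z and keep the fitted values.
a1, b1 = ols(distance, training)
training_hat = a1 + b1 * distance

# Stage 2: regress Y on the fitted values from Stage 1.
a2, b_2sls = ols(training_hat, productivity)

# Plain OLS of Y on X for comparison.
_, b_ols = ols(training, productivity)

print(f"Stage 1: training_hat = {a1:.1f} + ({b1:.2f}) * distance")
print(f"OLS slope  = {b_ols:.2f}  (contaminated by ability)")
print(f"2SLS slope = {b_2sls:.2f}  (true effect is 2.0)")
```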

10 of 18

Numerical Example: Step-by-Step

Three Employees:

Alice

Distance: 2 miles

Training: 40 hrs

Productivity: 85

Ability: High

Bob

Distance: 20 miles

Training: 15 hrs

Productivity: 70

Ability: High

Carol

Distance: 10 miles

Training: 25 hrs

Productivity: 75

Ability: Average

❌ OLS Problem:

OLS sees: Alice (40 hrs) → 85, Bob (15 hrs) → 70, Carol (25 hrs) → 75

OLS thinks: More training → More productivity

BUT it can't tell: Is this because training works OR because high-ability people (Alice & Bob) got different amounts?

✓ IV Solution:

Stage 1: Predict training from distance only:

Alice (2 mi) → Traininĝ = 37 hrs | Bob (20 mi) → Traininĝ = 10 hrs | Carol (10 mi) → Traininĝ = 25 hrs

Stage 2: Now use ONLY these predicted values (ignoring ability!)
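To finish the arithmetic, the sketch below regresses productivity on the three predicted values. It takes the first-stage equation 40 - 1.5(Distance) from the earlier slide as given rather than re-estimating it. Three observations are far too few for a real estimate, so this only illustrates the mechanics; the resulting slope (about 0.55) is not meant to match the illustrative Stage 2 equation shown earlier.

```python
# Three employees from the slide: (distance in miles, productivity score)
employees = {"Alice": (2, 85), "Bob": (20, 70), "Carol": (10, 75)}

# Stage 1 fitted values, using the slide's first stage: 40 - 1.5 * distance
training_hat = {name: 40 - 1.5 * dist for name, (dist, _) in employees.items()}

# Stage 2: least-squares slope of productivity on predicted training
xs = [training_hat[name] for name in employees]
ys = [prod for _, prod in employees.values()]
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

print(training_hat)
print(f"Stage 2: Productivity = {intercept:.1f} + {slope:.2f} * predicted training")
```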

11 of 18

Real IS Example: CRM & Sales

Research Question: Does CRM software increase sales revenue?

The Setup:

X = Has CRM software (yes/no)

Y = Annual sales revenue

Problem: High-performing sales teams get CRM first (reverse causality)

The Instrument:

Z = Promotional pricing period (random timing of CRM discounts)

✓ Relevance: Companies buy CRM when it's on sale

✓ Exclusion: Timing of promotional pricing doesn't affect sales except through CRM adoption

Results Comparison:

OLS (Biased):

CRM increases sales by $250,000

IV (Unbiased):

CRM increases sales by $120,000

→ OLS overestimated the effect by about 2x ($250,000 vs. $120,000)! Why? Because high-performing sales teams were adopting CRM first.

12 of 18

How to Test Your Instrument

Test #1: First-Stage F-Test (Relevance)

After Stage 1, check the F-statistic

Rule of Thumb:

• F > 10 → Strong instrument ✓ (safe to proceed)

• F < 10 → Weak instrument ✗ (don't trust IV results!)

⚠ Weak instruments give biased estimates - often worse than OLS!

Test #2: Overidentification Test (Multiple Instruments)

If you have more instruments than endogenous variables, test whether they give consistent results

Sargan/Hansen J Test:

• Null hypothesis: All instruments are valid

• p > 0.05 → No evidence against the instruments ✓ (this is not proof of validity)

• p < 0.05 → At least one instrument is invalid ✗

Test #3: Exclusion Restriction (The Hard One)

⚠ YOU CANNOT STATISTICALLY TEST THIS!

You must rely on:

• Theoretical arguments and logic

• Domain expertise

• Robustness checks with different instruments
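The Sargan statistic from Test #2 can be sketched as n times the R² from regressing the 2SLS residuals on all instruments. The function name, the simulated data, and every coefficient below are assumptions for illustration. The p-value line uses a closed form that is only correct for one degree of freedom (two instruments, one endogenous X); other cases need a chi-square routine from a statistics library.

```python
import math
import numpy as np

def sargan_test(y, x, instruments):
    """Sargan J for one endogenous regressor x and an (n, m) instrument matrix."""
    n, m = instruments.shape
    Z = np.column_stack([np.ones(n), instruments])
    # 2SLS: project x on the instruments, then regress y on the fitted values.
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    X_hat = np.column_stack([np.ones(n), x_hat])
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    # Residuals use the ORIGINAL x, not the fitted values.
    resid = y - beta[0] - beta[1] * x
    # J = n * R^2 from regressing the residuals on all instruments.
    fitted = Z @ np.linalg.lstsq(Z, resid, rcond=None)[0]
    r2 = 1 - np.sum((resid - fitted) ** 2) / np.sum((resid - resid.mean()) ** 2)
    return max(n * r2, 0.0), m - 1  # statistic, degrees of freedom

rng = np.random.default_rng(4)
n = 5_000
u = rng.normal(size=n)                          # structural error
z1, z2 = rng.normal(size=n), rng.normal(size=n)
x = z1 + z2 + 0.8 * u + rng.normal(size=n)      # endogenous regressor
y_valid = 1.0 + 2.0 * x + u                     # both instruments valid
y_invalid = y_valid + 0.5 * z2                  # z2 also affects y directly

instruments = np.column_stack([z1, z2])
j_valid, df = sargan_test(y_valid, x, instruments)
j_invalid, _ = sargan_test(y_invalid, x, instruments)
for label, j in [("both valid", j_valid), ("z2 invalid", j_invalid)]:
    p = math.erfc(math.sqrt(j / 2))  # chi-square tail, correct only for df == 1
    print(f"{label}: J = {j:.2f}, df = {df}, p = {p:.4f}")
```

The test compares instruments against each other, so it cannot detect the case where all of them violate exclusion in the same way.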

13 of 18

Common IV Mistakes to Avoid

Using a Weak Instrument

Problem: F-statistic < 10 in first stage

Result: Biased estimates (often worse than OLS!)

Solution: Find a stronger instrument or stick with OLS
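To see why a weak first stage is dangerous, the sketch below repeats the same hypothetical study many times with a strong and a weak instrument. The first-stage strengths of 1.0 and 0.05, the sample size of 200, and the true effect of 1.0 are all assumptions. It reports medians and interquartile ranges because weak-instrument estimates have very heavy tails.

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate_iv(first_stage_strength, n=200, reps=2_000, true_effect=1.0):
    """Return single-instrument IV estimates across repeated hypothetical samples."""
    estimates = np.empty(reps)
    for r in range(reps):
        u = rng.normal(size=n)                    # confounder in the error term
        z = rng.normal(size=n)                    # instrument
        x = first_stage_strength * z + u + rng.normal(size=n)
        y = true_effect * x + u + rng.normal(size=n)
        estimates[r] = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
    return estimates

strong = simulate_iv(first_stage_strength=1.0)
weak = simulate_iv(first_stage_strength=0.05)

summary = {}
for name, est in [("strong", strong), ("weak", weak)]:
    q25, q50, q75 = np.percentile(est, [25, 50, 75])
    summary[name] = (q50, q75 - q25)
    print(f"{name:>6}: median = {q50:.2f}, interquartile range = {q75 - q25:.2f}")
```

In this setup the strong-instrument estimates cluster near the true effect, while the weak-instrument estimates are both far more spread out and centered away from it, pulled toward the biased OLS answer.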

Violating Exclusion Restriction

Problem: Instrument directly affects Y

Result: IV estimates are still biased

Solution: Think hard about alternative pathways from Z to Y

Small Sample Size

Problem: IV needs more observations than OLS

Result: Very large standard errors, unreliable results

Solution: Get more data or use a different method

Misinterpreting Results

Problem: IV estimates the Local Average Treatment Effect (LATE), not the average effect for everyone

Result: The estimate applies only to people whose X is changed by the instrument (compliers)

Solution: Be clear about which population your estimate describes
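A simulation with a binary instrument shows what LATE means in practice. The population shares and effect sizes below are assumptions: 40% of people respond to the instrument and gain 2 from treatment, while people who always take the treatment would gain 6. The average effect across everyone is 3.8, but the IV (Wald) estimate recovers only the effect for those who respond, which is 2.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200_000

# Hypothetical population: 40% compliers, 30% always-takers, 30% never-takers.
group = rng.choice(["complier", "always", "never"], size=n, p=[0.4, 0.3, 0.3])
z = rng.integers(0, 2, size=n)  # binary instrument, e.g. a randomly timed offer

# Treatment take-up: only compliers change their behavior in response to Z.
x = np.where(group == "always", 1, np.where(group == "complier", z, 0))

# Assumed heterogeneous treatment effects by group.
effect = np.where(group == "complier", 2.0, np.where(group == "always", 6.0, 4.0))
y = 10 + effect * x + rng.normal(size=n)

# Wald estimator: difference in mean Y by Z over difference in mean X by Z.
wald = (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())
average_effect = effect.mean()

print(f"IV (Wald) estimate       = {wald:.2f}")
print(f"average effect, everyone = {average_effect:.2f}")
```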

14 of 18

When to Use IV (Decision Guide)

✓ USE IV When:

✓ You have endogeneity

(reverse causality, omitted variables)

✓ You can't run an experiment

(observational data only)

✓ You have a valid instrument

(strong relevance, plausible exclusion)

✓ Sample size is large enough

(IV needs more data than OLS)

✗ DON'T Use IV When:

✗ You can run a randomized experiment

(just do that instead!)

✗ Your instrument is weak

(F < 10)

✗ You can't justify exclusion

(instrument might directly affect Y)

✗ Simple controls would work

(no need to overcomplicate)

Quick Decision Tree:

1. Do you have endogeneity? NO → Use OLS | YES → Continue

2. Can you run an experiment? YES → Do that! | NO → Continue

3. Do you have a plausible instrument? YES → Continue | NO → Try other methods

4. Is the first-stage F > 10? YES → Use IV | NO → Don't use IV
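The decision tree can be written as a small function. This is only a sketch of the four questions in order; the function name, argument names, and returned strings are made up for illustration.

```python
from typing import Optional

def choose_method(has_endogeneity: bool, can_experiment: bool,
                  has_instrument: bool, first_stage_f: Optional[float] = None) -> str:
    """Walk the four questions of the decision tree in order."""
    if not has_endogeneity:
        return "Use OLS"
    if can_experiment:
        return "Run an experiment"
    if not has_instrument:
        return "Try other methods"
    # Rule of thumb from the slides: the first-stage F must exceed 10.
    if first_stage_f is not None and first_stage_f > 10:
        return "Use IV"
    return "Don't use IV"

print(choose_method(True, False, True, first_stage_f=45.2))  # prints "Use IV"
```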

15 of 18

IV vs. Other Solutions

Choosing the right method for your endogeneity problem:

Method | What It Fixes | Requirements | When to Use

IV | All types | Valid instrument | Can't experiment, have good Z

Fixed Effects | Omitted variables (time-invariant) | Panel data | Same entities over time

Experiments | All types | Ability to randomize | Gold standard (when possible)

Diff-in-Diff | Omitted variables | Treatment/control + time | Natural experiment

Matching | Omitted variables | Rich observables | Selection on observables

💡 Key Insight: IV is powerful but demanding. Only use it when you have a genuinely good instrument. Otherwise, consider alternatives.

16 of 18

Strategies for Finding Instruments

Natural Experiments

• Policy changes (affect some groups, not others)

• Geographic variation (distance, climate, time zones)

• Random events (lotteries, natural disasters)

Timing Variation

• When different people/firms adopt technology

• Staggered rollouts of policies

• Birth quarter (compulsory schooling laws)

Decision-Maker Characteristics

• Manager background/experience

• Organizational factors (size, age, industry)

• Leadership changes

Historical Accidents

• Distance to historical infrastructure

• Legacy of past policies

• Random assignment in past programs

17 of 18

Key Takeaways

1. IV solves endogeneity by using clean variation from instrument Z

2. Two requirements: Relevance (testable) + Exclusion (not testable)

3. 2SLS: Predict X from Z, then use X̂ to predict Y

4. Test relevance with the first-stage F-statistic (rule of thumb: F > 10)

5. Good instruments are RARE - don't force it

6. IV estimates Local Average Treatment Effect (LATE)

18 of 18

Remember

IV is like using a water filter

It filters out the contaminated variation

and gives you the pure, unbiased effect

Master IV, and you'll unlock causal inference even when experiments are impossible