INSTRUMENTAL VARIABLES
The Solution to Endogeneity
A Complete Guide with Real Examples
The Problem We're Solving
Remember Endogeneity?
When X is correlated with the error term, your regression coefficients are BIASED. You get the wrong answer, and you can't trust your results.
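To make the bias concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the variable names, the coefficients, and the true effect of 2 are made up, and "ability" plays the role of the unobserved factor hiding in the error term.

```python
import random

random.seed(0)
TRUE_EFFECT = 2.0  # assumed true gain in productivity per training hour


def ols_slope(x, y):
    """Slope from regressing y on x: cov(x, y) / var(x)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


n = 5000
# Unobserved ability: it drives both X and Y, so it ends up in the error term.
ability = [random.gauss(0, 1) for _ in range(n)]
# X (training hours) depends on ability, which makes X endogenous.
training = [20 + 5 * a + random.gauss(0, 5) for a in ability]
# Y (productivity) depends on training AND directly on ability.
productivity = [
    50 + TRUE_EFFECT * x + 10 * a + random.gauss(0, 5)
    for x, a in zip(training, ability)
]

ols_estimate = ols_slope(training, productivity)
print(f"true effect = {TRUE_EFFECT}, OLS estimate = {ols_estimate:.2f}")
```

Under these assumed coefficients the OLS slope converges to 2 + 50/50 = 3 rather than 2, because the regression credits training with the part of productivity that really comes from ability.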
The Challenge:
You can't always run a randomized experiment. Sometimes you're stuck with observational data where X is endogenous.
The Solution: Instrumental Variables
IV uses a 'helper' variable (Z) that affects X but doesn't directly affect Y. This lets you isolate the clean, unbiased relationship between X and Y.
What is an Instrumental Variable?
An Instrumental Variable (Z) is a 'helper' variable that:
• Affects your X variable (creates variation in X)
• But does NOT directly affect Y (only affects Y through X)
The Pathway:
Z (Instrument) → affects → X (Treatment) → affects → Y (Outcome)
✗ NO direct effect of Z on Y!
💡 Key Insight: The instrument creates variation in X, but ONLY affects Y indirectly through X. This gives us clean, unbiased estimates.
Simple Example: Training & Productivity
Research Question: Does IT training increase employee productivity?
The Problem:
X = IT training hours
Y = Productivity score
Endogeneity:
• Managers select smart employees for training
• Smart employees are ALSO naturally more productive
• X is correlated with ability (in error term)
The IV Solution:
Z = Distance from home to training center
Why it works:
✓ Relevance: People living closer attend more training
✓ Exclusion: Distance doesn't affect productivity except through training
How It Works:
Distance to Center → Training Hours → Productivity
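The training example can be sketched on simulated data. This is an illustration under stated assumptions, not real data: the first-stage numbers (40 and -1.5) echo the slides, while the true effect of 2, the ability coefficients, and the noise levels are invented. With a single instrument, the IV estimate is the ratio cov(Z, Y) / cov(Z, X).

```python
import random

random.seed(1)


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (len(a) - 1)


n = 5000
ability = [random.gauss(0, 1) for _ in range(n)]      # unobserved
distance = [random.uniform(0, 20) for _ in range(n)]  # Z, drawn independently of ability
# X: closer employees train more; able employees are also selected for more training.
training = [
    40 - 1.5 * d + 3 * a + random.gauss(0, 2)
    for d, a in zip(distance, ability)
]
# Y: assumed true training effect is 2; ability also raises productivity directly.
productivity = [
    50 + 2 * x + 10 * a + random.gauss(0, 5)
    for x, a in zip(training, ability)
]

ols_estimate = cov(training, productivity) / cov(training, training)
iv_estimate = cov(distance, productivity) / cov(distance, training)
print(f"OLS: {ols_estimate:.2f}   IV: {iv_estimate:.2f}   (assumed truth: 2.00)")
```

In this setup OLS drifts above 2 because it mixes in ability, while the ratio based on distance stays close to 2 because distance was generated independently of ability.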
Two Requirements for Valid Instruments
For Z to be a good instrument, it MUST satisfy BOTH conditions:
Requirement #1: RELEVANCE
Z must actually affect X
Test: Run regression: X = a + bZ + error
If:
• The coefficient on Z is statistically significant (p < 0.05)
• The first-stage F-statistic is greater than 10
→ You have a RELEVANT instrument ✓
Requirement #2: EXCLUSION RESTRICTION
Z must affect Y ONLY through X (no direct effect on Y)
The Hard Part: You CANNOT statistically test this!
You must rely on:
• Theoretical arguments
• Common sense and logic
• Domain knowledge
⚠ This is why finding good instruments is HARD!
Testing Relevance: Real Example
First Stage Regression:
Training Hours = 40 - 1.5(Distance to Center) + error
Results:
Variable | Coefficient | F-Statistic
Distance | -1.5 | 45.2
(Intercept) | 40 | n/a
✓ Interpretation:
• Coefficient: Each extra mile from training center reduces training by 1.5 hours
• F-statistic = 45.2 (>> 10): STRONG instrument! Distance strongly predicts training.
• Conclusion: Relevance requirement is SATISFIED ✓
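A first-stage check like the one above can be sketched by hand. The data below are simulated (sample size and noise level are assumptions), so the F-statistic will not reproduce the 45.2 on the slide. With one instrument, the first-stage F-statistic equals the squared t-statistic on Z.

```python
import random

random.seed(2)
n = 500
distance = [random.uniform(0, 20) for _ in range(n)]              # Z
training = [40 - 1.5 * d + random.gauss(0, 8) for d in distance]  # X

mean_z = sum(distance) / n
mean_x = sum(training) / n
s_zz = sum((z - mean_z) ** 2 for z in distance)
s_zx = sum((z - mean_z) * (x - mean_x) for z, x in zip(distance, training))

# First-stage OLS: X = a + bZ + error
b = s_zx / s_zz
a = mean_x - b * mean_z

# Standard error of b from the residual variance (n - 2 degrees of freedom).
residuals = [x - (a + b * z) for z, x in zip(distance, training)]
sigma2 = sum(r ** 2 for r in residuals) / (n - 2)
se_b = (sigma2 / s_zz) ** 0.5

t_stat = b / se_b
f_stat = t_stat ** 2  # single instrument: F = t squared
print(f"intercept = {a:.1f}, slope = {b:.2f}, first-stage F = {f_stat:.1f}")
print("relevant instrument" if f_stat > 10 else "weak instrument")
```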
Understanding Exclusion Restriction
The Critical Question: Does Z affect Y directly?
✓ What's ALLOWED (Indirect Effect):
Z → X → Y is OKAY!
Indirect effect through X
✗ What's NOT ALLOWED (Direct Effect):
Z → Y is NOT OKAY!
Direct effect violates exclusion
🤔 Example: Does Distance Violate Exclusion?
Question: Could distance to training center affect productivity DIRECTLY?
Potential violations:
• Maybe people far from center also live in rural areas (different labor market)
• Maybe long commutes cause stress that reduces productivity
→ You must think carefully about whether these are plausible!
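To see why this matters, here is a hedged simulation sketch of what a violation does to the IV estimate. The direct effect of -0.6 productivity points per mile (standing in for commute stress) is a hypothetical number, as are all other coefficients; the assumed true training effect is 2.

```python
import random

random.seed(3)


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (len(a) - 1)


def iv_estimate(direct_effect, n=5000):
    """IV estimate when distance also shifts productivity by direct_effect per mile."""
    ability = [random.gauss(0, 1) for _ in range(n)]
    distance = [random.uniform(0, 20) for _ in range(n)]
    training = [
        40 - 1.5 * d + 3 * a + random.gauss(0, 2)
        for d, a in zip(distance, ability)
    ]
    productivity = [
        50 + 2 * x + 10 * a + direct_effect * d + random.gauss(0, 5)
        for x, a, d in zip(training, ability, distance)
    ]
    return cov(distance, productivity) / cov(distance, training)


iv_valid = iv_estimate(direct_effect=0.0)      # exclusion holds
iv_violated = iv_estimate(direct_effect=-0.6)  # hypothetical commute-stress channel
print(f"exclusion holds: {iv_valid:.2f}   exclusion violated: {iv_violated:.2f}")
```

With these assumed numbers the violated case converges to 2 + (-0.6)/(-1.5) = 2.4 instead of 2: IV attributes the commute-stress effect to training, and nothing in the output warns you about it.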
How IV Works: The Magic Explained
Remember: X contains TWO types of variation
✓ Good Variation (Clean)
Changes in X that are NOT correlated with error term
Example:
• Training varies because of distance
• Distance is random w.r.t. ability
• This variation is CLEAN
→ Using this gives TRUE causal effect
✗ Bad Variation (Contaminated)
Changes in X that ARE correlated with error term
Example:
• Training varies because of ability
• Smart people get more training
• This variation is CONTAMINATED
→ Using this gives BIASED results
What Each Method Uses:
OLS (Normal Regression):
Uses ALL variation = Clean + Contaminated
→ Result: Biased coefficients ✗
IV (Instrumental Variables):
Uses ONLY clean variation (from instrument Z)
→ Result: Unbiased coefficients ✓
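The clean versus contaminated split can be written as a short derivation. This is a sketch for the simplest case (one X, one Z, a linear model with error term u); the symbols a, b, and u are notation introduced here, and the block assumes the amsmath package.

```latex
\begin{align*}
Y &= a + bX + u \\[4pt]
\hat{b}_{\mathrm{OLS}} &\longrightarrow
  \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}
  = b + \frac{\operatorname{Cov}(X, u)}{\operatorname{Var}(X)} \\[4pt]
\hat{b}_{\mathrm{IV}} &\longrightarrow
  \frac{\operatorname{Cov}(Z, Y)}{\operatorname{Cov}(Z, X)}
  = b + \frac{\operatorname{Cov}(Z, u)}{\operatorname{Cov}(Z, X)}
  = b \quad \text{when } \operatorname{Cov}(Z, u) = 0
\end{align*}
```

The OLS limit carries the extra term Cov(X, u)/Var(X), which is the contaminated variation. The IV limit drops its extra term only if Z is uncorrelated with u (exclusion), and the ratio is only defined if Cov(Z, X) is not zero (relevance).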
Two-Stage Least Squares (2SLS)
This is how IV actually works in practice
STAGE 1: Predict X using Z
Run regression: X = α + βZ + ε
This extracts the 'clean' part of X that comes from Z
You get: X̂ (X-hat) = Predicted values of X based only on Z
Example: Predicted Training (Training-hat) = 40 - 1.5(Distance)
STAGE 2: Use predicted X̂ to explain Y
Run regression: Y = a + b(X̂) + ε
This gives you the IV estimate of how X affects Y
The coefficient b is now free of endogeneity bias (in large samples) because it uses only clean variation
Example: Productivity = 50 + 2(Predicted Training)
💡 The magic: You're using only the variation in X that came from Z, which is NOT correlated with the error term!
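The two stages can be sketched by hand on simulated data. This is a teaching sketch under assumptions: the data-generating numbers are invented (true effect assumed to be 2), and `fit_line` is a small helper defined here, not a library function.

```python
import random

random.seed(4)


def fit_line(x, y):
    """Return (intercept, slope) of the OLS line y = intercept + slope * x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    slope = (
        sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
        / sum((p - mean_x) ** 2 for p in x)
    )
    return mean_y - slope * mean_x, slope


n = 5000
ability = [random.gauss(0, 1) for _ in range(n)]      # unobserved
distance = [random.uniform(0, 20) for _ in range(n)]  # Z
training = [
    40 - 1.5 * d + 3 * a + random.gauss(0, 2)
    for d, a in zip(distance, ability)
]
productivity = [
    50 + 2 * x + 10 * a + random.gauss(0, 5)
    for x, a in zip(training, ability)
]

# STAGE 1: regress X on Z and keep the fitted values X-hat.
a1, b1 = fit_line(distance, training)
training_hat = [a1 + b1 * d for d in distance]

# STAGE 2: regress Y on X-hat; the slope is the 2SLS estimate.
a2, b2 = fit_line(training_hat, productivity)

print(f"Stage 1: Training-hat = {a1:.1f} + ({b1:.2f}) * Distance")
print(f"Stage 2: Productivity = {a2:.1f} + ({b2:.2f}) * Training-hat")
```

One caution: running Stage 2 by hand like this gives the right coefficient but the wrong standard errors, because the regression treats X-hat as if it were observed data. In practice, use a dedicated IV/2SLS routine in your statistics package for inference.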
Numerical Example: Step-by-Step
Three Employees:
Employee | Distance | Training | Productivity | Ability
Alice | 2 miles | 40 hrs | 85 | High
Bob | 20 miles | 15 hrs | 70 | High
Carol | 10 miles | 25 hrs | 75 | Average
❌ OLS Problem:
OLS sees: Alice (40hrs) = 85, Bob (15hrs) = 70, Carol (25hrs) = 75
OLS thinks: More training → More productivity
BUT it can't tell: Is this because training works OR because high-ability people (Alice & Bob) got different amounts?
✓ IV Solution:
Stage 1: Predict training from distance only:
Alice (2 mi) → Predicted Training = 37 hrs | Bob (20 mi) → Predicted Training = 10 hrs | Carol (10 mi) → Predicted Training = 25 hrs
Stage 2: Now use ONLY these predicted values (ignoring ability!)
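The arithmetic for these three employees can be carried through as follows. This is purely a mechanical illustration: three observations are far too few for a real estimate, and Stage 1 simply applies the first-stage equation from the slides (40 - 1.5 × Distance) rather than re-estimating it.

```python
# Data copied from the three-employee example.
employees = {
    "Alice": {"distance": 2, "training": 40, "productivity": 85},
    "Bob": {"distance": 20, "training": 15, "productivity": 70},
    "Carol": {"distance": 10, "training": 25, "productivity": 75},
}


def slope(x, y):
    """OLS slope of y on x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return (
        sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
        / sum((p - mean_x) ** 2 for p in x)
    )


actual = [e["training"] for e in employees.values()]
productivity = [e["productivity"] for e in employees.values()]

# Stage 1: predicted training from distance only (slide equation, not re-fitted).
predicted = [40 - 1.5 * e["distance"] for e in employees.values()]

# Stage 2: regress productivity on the predicted hours instead of the actual hours.
ols_slope = slope(actual, productivity)
stage2_slope = slope(predicted, productivity)

for name, hours in zip(employees, predicted):
    print(f"{name}: predicted training = {hours} hrs")
print(f"OLS slope (actual hours):        {ols_slope:.3f}")
print(f"Stage 2 slope (predicted hours): {stage2_slope:.3f}")
```

On these three rows the Stage 2 slope is 200/366 ≈ 0.55 productivity points per hour, slightly below the OLS slope of about 0.61. The illustrative coefficient of 2 used elsewhere in the slides is not derived from these rows.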
Real IS Example: CRM & Sales
Research Question: Does CRM software increase sales revenue?
The Setup:
X = Has CRM software (yes/no)
Y = Annual sales revenue
Problem: High-performing sales teams get CRM first (reverse causality)
The Instrument:
Z = Promotional pricing period (random timing of CRM discounts)
✓ Relevance: Companies buy CRM when it's on sale
✓ Exclusion: Timing of promotional pricing doesn't affect sales except through CRM adoption
Results Comparison:
OLS (Biased):
CRM increases sales by $250,000
IV (Unbiased):
CRM increases sales by $120,000
→ OLS overestimated the effect by roughly 2x ($250,000 vs. $120,000)! Why? Because good sales teams were getting CRM first.
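With a yes/no instrument and a yes/no treatment, the IV estimate reduces to a ratio of two differences in means (often called the Wald estimator). The sketch below uses simulated data only: the true effect is set to $120,000 to mirror the slide, and the adoption rule, team-quality effect, and noise are all assumptions, so the naive number will not match the $250,000 above.

```python
import random

random.seed(5)
TRUE_EFFECT = 120_000  # assumed, chosen to mirror the IV number on the slide

rows = []
for _ in range(50_000):
    quality = random.gauss(0, 1)               # unobserved sales-team quality
    promo = 1 if random.random() < 0.5 else 0  # Z: randomly timed discount
    # X: better teams and discounted firms are more likely to adopt CRM.
    adopts = 1 if quality + random.gauss(0, 1) + 1.5 * promo > 0.75 else 0
    sales = (
        500_000 + TRUE_EFFECT * adopts
        + 100_000 * quality + random.gauss(0, 50_000)
    )
    rows.append((promo, adopts, sales))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Naive comparison: adopters vs. non-adopters (contaminated by team quality).
naive = mean(s for _, x, s in rows if x == 1) - mean(s for _, x, s in rows if x == 0)

# Reduced form: effect of the promo on sales. First stage: effect of the promo on adoption.
reduced_form = mean(s for z, _, s in rows if z == 1) - mean(s for z, _, s in rows if z == 0)
first_stage = mean(x for z, x, _ in rows if z == 1) - mean(x for z, x, _ in rows if z == 0)

wald = reduced_form / first_stage
print(f"naive difference: ${naive:,.0f}")
print(f"IV (Wald) estimate: ${wald:,.0f}   (assumed truth: ${TRUE_EFFECT:,})")
```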
How to Test Your Instrument
Test #1: First-Stage F-Test (Relevance)
After Stage 1, check the F-statistic
Rule of Thumb:
• F > 10 → Strong instrument ✓ (safe to proceed)
• F < 10 → Weak instrument ✗ (don't trust IV results!)
⚠ Weak instruments give biased estimates, often worse than OLS!
Test #2: Overidentification Test (Multiple Instruments)
If you have 2+ instruments, test if they give consistent results
Sargan/Hansen J Test:
• Null hypothesis: Instruments are valid
• p > 0.05 → Instruments pass ✓
• p < 0.05 → At least one instrument is invalid ✗
Test #3: Exclusion Restriction (The Hard One)
⚠ YOU CANNOT STATISTICALLY TEST THIS!
You must rely on:
• Theoretical arguments and logic
• Domain expertise
• Robustness checks with different instruments
Common IV Mistakes to Avoid
1. Using a Weak Instrument
Problem: F-statistic < 10 in first stage
Result: Biased estimates (often worse than OLS!)
Solution: Find a stronger instrument or stick with OLS
2. Violating Exclusion Restriction
Problem: Instrument directly affects Y
Result: IV estimates are still biased
Solution: Think hard about alternative pathways from Z to Y
3. Small Sample Size
Problem: IV needs more observations than OLS
Result: Very large standard errors, unreliable results
Solution: Get more data or use a different method
4. Misinterpreting Results
Problem: IV estimates Local Average Treatment Effect (LATE)
Result: Effect only for people influenced by instrument
Solution: Be clear about what population you're estimating
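Mistake 1 (a weak instrument) can be illustrated with a small repeated-sampling sketch. All numbers are assumptions: the true effect is 2, each sample has 200 observations, and the "weak" case uses a first-stage coefficient of 0.05 versus 1.0 for the "strong" case.

```python
import random
import statistics

random.seed(6)
TRUE_EFFECT = 2.0


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (len(a) - 1)


def one_iv_estimate(first_stage_coef, n=200):
    """Draw one sample and return its IV estimate cov(Z, Y) / cov(Z, X)."""
    z = [random.gauss(0, 1) for _ in range(n)]
    u = [random.gauss(0, 1) for _ in range(n)]  # unobserved confounder
    x = [first_stage_coef * zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
    y = [TRUE_EFFECT * xi + 2 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]
    return cov(z, y) / cov(z, x)


def median_abs_error(first_stage_coef, reps=300):
    """Median distance between the IV estimate and the truth over many samples."""
    errors = [
        abs(one_iv_estimate(first_stage_coef) - TRUE_EFFECT)
        for _ in range(reps)
    ]
    return statistics.median(errors)


strong_error = median_abs_error(first_stage_coef=1.0)
weak_error = median_abs_error(first_stage_coef=0.05)
print(f"typical error, strong instrument: {strong_error:.2f}")
print(f"typical error, weak instrument:   {weak_error:.2f}")
```

When Z barely moves X, the denominator cov(Z, X) is close to zero and dominated by sampling noise, so the ratio swings wildly from sample to sample. That is the mechanism behind the F > 10 rule of thumb.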
When to Use IV (Decision Guide)
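✓ USE IV When:
• You have endogeneity (reverse causality, omitted variables)
• You can't run an experiment (observational data only)
• You have a valid instrument (strong relevance, plausible exclusion)
• You have a large enough sample (IV needs more data than OLS)
✗ DON'T Use IV When:
• You can run an experiment (just do that instead!)
• Your instrument is weak (F < 10)
• The exclusion restriction is doubtful (instrument might directly affect Y)
• There is no endogeneity (no need to overcomplicate)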
Quick Decision Tree:
1. Do you have endogeneity? NO → Use OLS | YES → Continue
2. Can you run an experiment? YES → Do that! | NO → Continue
3. Do you have a valid instrument? YES → Use IV | NO → Try other methods
4. Is F > 10? YES → Proceed | NO → Don't use IV
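The decision tree can be written down as a small function. This is only a sketch of the four questions above: the function name, argument names, and return strings are made up for illustration, and "has a valid instrument" still rests on your own judgment about exclusion.

```python
def choose_method(has_endogeneity, can_experiment, has_valid_instrument, first_stage_f):
    """Walk the four questions of the decision tree in order."""
    if not has_endogeneity:
        return "Use OLS"
    if can_experiment:
        return "Run an experiment"
    if not has_valid_instrument:
        return "Try other methods"
    if first_stage_f > 10:
        return "Use IV"
    return "Don't use IV (weak instrument)"


# Example: endogenous X, no experiment possible, plausible instrument, F = 45.2
print(choose_method(True, False, True, 45.2))
```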
IV vs. Other Solutions
Choosing the right method for your endogeneity problem:
Method | What It Fixes | Requirements | When to Use
IV | All types | Valid instrument | Can't experiment, have good Z
Fixed Effects | Omitted variables (time-invariant) | Panel data | Same entities over time
Experiments | All types | Ability to randomize | Gold standard (when possible)
Diff-in-Diff | Omitted variables | Treatment/control + time | Natural experiment
Matching | Omitted variables | Rich observables | Selection on observables
💡 Key Insight: IV is powerful but demanding. Only use it when you have a genuinely good instrument. Otherwise, consider alternatives.
Strategies for Finding Instruments
Natural Experiments
• Policy changes (affect some groups, not others)
• Geographic variation (distance, climate, time zones)
• Random events (lotteries, natural disasters)
Timing Variation
• When different people/firms adopt technology
• Staggered rollouts of policies
• Birth quarter (compulsory schooling laws)
Decision-Maker Characteristics
• Manager background/experience
• Organizational factors (size, age, industry)
• Leadership changes
Historical Accidents
• Distance to historical infrastructure
• Legacy of past policies
• Random assignment in past programs
Key Takeaways
1. IV solves endogeneity by using clean variation from instrument Z
2. Two requirements: Relevance (testable) + Exclusion (not testable)
3. 2SLS: Predict X from Z, then use X̂ to predict Y
4. Test relevance with F-statistic (must be > 10)
5. Good instruments are RARE - don't force it
6. IV estimates Local Average Treatment Effect (LATE)
Remember
IV is like using a clean water source
It filters out the contaminated variation
and gives you the pure, unbiased effect
Master IV, and you'll unlock causal inference even when experiments are impossible