1 of 16

Lecture 15: Independence Assumptions

Stat 165

Jacob Steinhardt

2 of 16

Warm-up Exercise

  • What is the probability that I (Jacob) eat cereal for breakfast tomorrow?

25%, 55%, 5%, 40%, 20%

  • What is the probability that I eat cereal at least once in the next week?

70%, 80%, 10%, 55%, 80%
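
To see why the two warm-up answers are linked, here is a minimal sketch comparing the weekly probability under independent days with the weekly probability when a hidden factor (say, whether there is cereal in the house that week) drives every day at once. All numbers and function names are hypothetical and chosen only so that both scenarios have the same 25% daily probability.

```python
def weekly_prob_independent(p_daily, days=7):
    """P(at least one success in `days` days) if days are independent."""
    return 1.0 - (1.0 - p_daily) ** days


def weekly_prob_latent(p_factor, p_daily_if_on, p_daily_if_off, days=7):
    """Same quantity when a shared weekly factor switches the daily probability."""
    on = weekly_prob_independent(p_daily_if_on, days)
    off = weekly_prob_independent(p_daily_if_off, days)
    return p_factor * on + (1.0 - p_factor) * off


# Hypothetical numbers: both scenarios have a 25% chance on any single day.
independent = weekly_prob_independent(0.25)
# Cereal is in the house half of all weeks; if so, 50% per day, otherwise 0%.
correlated = weekly_prob_latent(p_factor=0.5, p_daily_if_on=0.5, p_daily_if_off=0.0)

print(f"independent days: {independent:.3f}")
print(f"shared factor:    {correlated:.3f}")
```

The daily forecast is identical in both cases, but the "at least once this week" forecast is much lower when days are positively correlated.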

3 of 16

Other Non-Independence Examples

  • Covid lockdowns per state
  • Rain in next week vs. on Wednesday
  • Celtics losing next game to Nets vs. next 3 games
  • Brainstorming: other examples?
    • Flights leaving airport in a week -> bad weather could make all days have few flights
    • -> border closure
    • Performance of overall stock market (and correlation across different stock markets)

4 of 16

Two Consequential Wrong Predictions

  • 2008 financial crisis
  • 2016 US presidential election
  • Both were (partly) failures to account for non-independence!

5 of 16

2016 Election – A Simple Model

  • Each state s has N_s polls conducted in that state
  • Poll i in state s has sample size n_{i,s} and k_{i,s} respondents who will vote for Clinton
  • Aggregate polls together – total margin of error is approximately 1/sqrt(total sample size)
  • Assume each state’s vote share has Gaussian error around the polling results
  • Simulate draws of all 50 states, look at how often Clinton wins across many different draws
  • This gives >99% probability to Clinton winning
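
The simple model above can be sketched as follows. This is an illustration under stated assumptions, not the model any forecaster actually ran: the states, electoral votes, poll counts, `SAFE_VOTES`, and the choice to treat 1/sqrt(n) as a 95% margin of error (so the standard deviation is 0.5/sqrt(n)) are all hypothetical.

```python
import numpy as np

# Hypothetical toy data (not real 2016 polls).
# state -> (electoral votes, [(n_is, k_is), ...]) where k_is = respondents for Clinton
POLLS = {
    "A": (29, [(2000, 1030), (4000, 2030)]),
    "B": (20, [(3000, 1570), (3000, 1550)]),
    "C": (18, [(2000, 1010), (3000, 1540)]),
    "D": (16, [(2500, 1310), (2500, 1290)]),
    "E": (15, [(5000, 2575)]),
    "F": (10, [(1000, 530), (3000, 1550)]),
}
SAFE_VOTES = 230    # assumed electoral votes Clinton wins outside the toy states
VOTES_TO_WIN = 270


def pooled_state_estimates(polls):
    """Pool each state's polls into one vote share and one standard deviation."""
    ev, mean, sd = [], [], []
    for votes, state_polls in polls.values():
        n_total = sum(n for n, _ in state_polls)
        k_total = sum(k for _, k in state_polls)
        ev.append(votes)
        mean.append(k_total / n_total)
        # Assumption: margin of error 1/sqrt(n) is about two standard deviations.
        sd.append(0.5 / np.sqrt(n_total))
    return np.array(ev), np.array(mean), np.array(sd)


def simulate_independent(polls, n_sims=100_000, seed=0):
    """Fraction of simulated elections Clinton wins with independent state errors."""
    ev, mean, sd = pooled_state_estimates(polls)
    rng = np.random.default_rng(seed)
    shares = rng.normal(mean, sd, size=(n_sims, len(mean)))
    clinton_ev = SAFE_VOTES + np.where(shares > 0.5, ev, 0).sum(axis=1)
    return float(np.mean(clinton_ev >= VOTES_TO_WIN))


print(f"P(Clinton wins), independent states: {simulate_independent(POLLS):.4f}")
```

Because each state's error is drawn separately, losing requires several unlikely misses at the same time, so the win probability comes out very close to 1 even though some toy states are polling at only 51%.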

6 of 16

2016 Election – A More Complicated Model

  • Each pollster has some bias – tends to consistently favor either Republicans or Democrats
  • Bias not fully known, but can be estimated from past elections
  • Modify the Gaussian for each state:
    • Change the mean based on estimate of polling bias
    • Increase the variance based on unknown component of polling bias
    • To be safe, replace Gaussian with Student-t
  • This still (often) gives >99% probability of Clinton winning

7 of 16

2016 Election – What’s the problem?

  • Polls have systematic error (as already discussed)
  • Some error affects all the polls (nonresponse bias)
  • So, at the very least, should have additional global bias term that affects all polls (and hence all states) at once
  • There was a systematic polling error of ~2% (plus the states Clinton lost were particularly bad in the electoral college).
  • 2% was within the range of historical errors
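
A minimal sketch of the global bias term: every simulated election draws one national error that shifts all states together, plus a separate error per state. The inputs are hypothetical toy values (not real 2016 data), and the 2-point standard deviation for the national error is an assumption chosen to match the size of error mentioned above.

```python
import numpy as np

# Hypothetical toy inputs (not real 2016 data).
ELECTORAL_VOTES = np.array([29, 20, 18, 16, 15, 10])
POLL_SHARE = np.array([0.51, 0.52, 0.51, 0.52, 0.515, 0.52])  # Clinton share in polls
STATE_SD = np.full(6, 0.007)   # assumed state-specific polling noise
SAFE_VOTES = 230               # assumed electoral votes won outside these states
VOTES_TO_WIN = 270


def win_probability(national_sd, n_sims=100_000, seed=0):
    """Fraction of simulations Clinton wins, given the sd of the shared error."""
    rng = np.random.default_rng(seed)
    # One draw per simulation, added to every state (shape broadcasts across states).
    national_error = rng.normal(0.0, national_sd, size=(n_sims, 1))
    # Independent draw per state per simulation.
    state_error = rng.normal(0.0, STATE_SD, size=(n_sims, len(POLL_SHARE)))
    shares = POLL_SHARE + national_error + state_error
    clinton_ev = SAFE_VOTES + np.where(shares > 0.5, ELECTORAL_VOTES, 0).sum(axis=1)
    return float(np.mean(clinton_ev >= VOTES_TO_WIN))


p_independent = win_probability(national_sd=0.0)
p_correlated = win_probability(national_sd=0.02)
print(f"no shared error:        {p_independent:.4f}")
print(f"shared error, sd = 2%:  {p_correlated:.4f}")
```

Setting `national_sd=0.0` recovers the independent model; with a shared error the polls and state-level noise are unchanged, yet the win probability drops noticeably because one bad draw can flip many states at once.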

8 of 16

2016 Election – Models That Work

  • Correlated national error accounts for election uncertainty
  • Fancier version: state-level correlations (example)
  • Some models expressed uncertainty with heavy tails, rather than correlations, but this leads to weird effects: see this article by Andrew Gelman
  • What other factors (beyond polls) would you put into your predictive model?

9 of 16

2008 Financial Crisis

  • Basic background: many people were offered “subprime” (i.e. risky) mortgages on their houses. Or in other words, they were offered loans with an (initially) low interest rate, despite not having strong finances.
    • Rates increased over time, but housing values were also increasing.
  • The loans were partly financed through retirement accounts.
  • But retirement accounts are supposed to be low risk, so how was this possible?

10 of 16

Collateralized Debt Obligations (CDOs)

  • CDOs are a way to turn risky financial instruments (bets) into a less risky bet
  • Simplest way to reduce risk: take N bets (mortgages) and average
  • But can do better, with tranches, ranked from senior to junior
  • If a mortgage defaults, the most junior tranches take losses first
  • Senior tranches should be very unlikely to ever take heavy losses
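
The tranche rule can be sketched as a simple "waterfall": losses fill the most junior tranche first and only spill into the next tranche once the one below it is wiped out. The function name and the three tranche sizes are hypothetical, picked only for illustration.

```python
def tranche_losses(pool_loss, tranche_sizes):
    """Allocate a total pool loss to tranches listed from most junior to most senior.

    Returns the loss absorbed by each tranche, in the same order.
    """
    losses = []
    remaining = pool_loss
    for size in tranche_sizes:
        absorbed = min(remaining, size)  # a tranche cannot lose more than its size
        losses.append(absorbed)
        remaining -= absorbed
    return losses


# Hypothetical pool worth 100, split junior -> senior as 10 / 20 / 70.
sizes = [10, 20, 70]
for pool_loss in (5, 15, 40):
    print(pool_loss, tranche_losses(pool_loss, sizes))
```

In this toy split the senior tranche is untouched until the pool has lost more than 30% of its value, which is why it looks safe as long as large pool losses are rare.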

11 of 16

CDOs illustrated (from Wikipedia)

12 of 16

Risk and Leverage

  • Senior tranches were “AAA” rated, meaning considered very low risk (low enough for retirement accounts)
  • They also had pretty good returns.
  • Low risk + high returns => want to be a large share of your portfolio
  • So companies “leveraged”: took out loans in order to buy more CDOs
  • This blew up in everyone’s faces. Why?

13 of 16

Non-Independence of Housing Market

  • AAA rating + risk assessments assumed mortgage defaults were independent, or at least not too correlated
  • But if national housing prices dropped, many people would default at once. Even senior tranches might not pay out.
  • This happened, and highly leveraged investment banks collapsed (+many other bad things).
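
A rough simulation of this point, under made-up numbers: each mortgage has the same 5% overall default probability in both scenarios, but in the correlated scenario a rare national housing downturn raises every mortgage's default probability together. The pool size, probabilities, and the 20-default threshold at which the senior tranche starts losing money are all hypothetical.

```python
import numpy as np

N_MORTGAGES = 100
SENIOR_ATTACHMENT = 20       # assumed: senior tranche is hit once defaults exceed 20
P_DEFAULT_MARGINAL = 0.05    # overall default probability per mortgage

# Correlated scenario: 0.1 * 0.32 + 0.9 * 0.02 = 0.05, the same marginal probability.
P_DOWNTURN = 0.10
P_DEFAULT_DOWNTURN = 0.32
P_DEFAULT_NORMAL = 0.02


def senior_hit_probability(correlated, n_sims=100_000, seed=0):
    """Fraction of simulated pools in which the senior tranche takes a loss."""
    rng = np.random.default_rng(seed)
    if correlated:
        # One shared draw per pool decides whether housing prices dropped.
        downturn = rng.random(n_sims) < P_DOWNTURN
        p_default = np.where(downturn, P_DEFAULT_DOWNTURN, P_DEFAULT_NORMAL)
    else:
        p_default = np.full(n_sims, P_DEFAULT_MARGINAL)
    defaults = rng.binomial(N_MORTGAGES, p_default)
    return float(np.mean(defaults > SENIOR_ATTACHMENT))


p_indep = senior_hit_probability(correlated=False)
p_corr = senior_hit_probability(correlated=True)
print(f"independent defaults: {p_indep:.5f}")
print(f"shared downturn:      {p_corr:.5f}")
```

With independent defaults the senior tranche is essentially never touched; with a shared downturn it is hit in a substantial fraction of simulations, even though any single mortgage looks equally risky in both cases.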

14 of 16

Non-Independence of Housing Market (contd.)

  • Actual story is a bit more complicated: correlations were modeled, using a Gaussian copula
  • Basic idea: sample debt payouts from a multivariate Gaussian (with specified correlation structure)
  • Issue: payouts don’t follow normal distribution. So apply 1-dimensional transformation to each payout to give it desired distribution shape.
  • Big problem: despite correlation in bulk, no tail dependence (extremes in one variable don’t imply extremes in other)
    • Colloquially: conditional on X1 being in the 99.9th percentile, the probability that X2 is also in 99.9th percentile is close to zero in the model, but not in reality.
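
The lack of tail dependence can be checked empirically. The sketch below estimates P(X2 above its q-th quantile | X1 above its q-th quantile) for correlated Gaussian samples and, for contrast, for Student-t samples with the same correlation parameter. The Student-t comparison is not from the lecture; it is swapped in here only as a familiar example of a dependence structure that does have tail dependence. Since the per-variable transformations in a copula model are monotone, they do not change which samples land above a given quantile, so the raw samples are enough. The correlation 0.5, 3 degrees of freedom, and sample size are arbitrary choices.

```python
import numpy as np


def joint_tail_probability(x, q):
    """Estimate P(X2 > its q-quantile | X1 > its q-quantile) from samples x[:, 0:2]."""
    t1 = np.quantile(x[:, 0], q)
    t2 = np.quantile(x[:, 1], q)
    in_tail_1 = x[:, 0] > t1
    return float(np.mean(x[in_tail_1, 1] > t2))


def sample_gaussian(n, rho, rng):
    """Bivariate standard normal samples with correlation rho."""
    cov = [[1.0, rho], [rho, 1.0]]
    return rng.multivariate_normal([0.0, 0.0], cov, size=n)


def sample_student_t(n, rho, df, rng):
    """Bivariate Student-t samples: Gaussian draws divided by a shared random scale."""
    z = sample_gaussian(n, rho, rng)
    scale = np.sqrt(rng.chisquare(df, size=(n, 1)) / df)
    return z / scale


rng = np.random.default_rng(0)
gauss = sample_gaussian(1_000_000, rho=0.5, rng=rng)
student = sample_student_t(1_000_000, rho=0.5, df=3, rng=rng)

for q in (0.9, 0.99, 0.999):
    print(
        f"q={q}: Gaussian {joint_tail_probability(gauss, q):.3f}, "
        f"Student-t {joint_tail_probability(student, q):.3f}"
    )
```

In the Gaussian case the conditional probability keeps shrinking as q moves further into the tail, while in the Student-t case it levels off at a clearly positive value: the same bulk correlation can hide very different behavior in the extremes.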

15 of 16

Lessons for Forecasting

  • Ignoring non-independence leads to underestimating tail events
  • This is because tail events require many things to “go wrong” at once
  • Correlated errors mean this happens more often than expected
  • Models with bulk correlations might not have much correlation in tails!
  • How can we handle this in our own forecasts?

16 of 16

Strategies for Handling Correlation

  • Add latent variables to simulations
  • Ask “what could change all of these at once?”
  • For “fancy” models, check/think about tail correlations
  • Look at historical reference classes