Lecture 20: SSF2: Avoiding Statistical Traps

Stat 165, Spring 2024

Jacob Steinhardt (notes by Yan Zhang)

Exercise

Memorize the following words (in order across rows):

credit restless philosophy corruption

atmosphere harmony budge mutual

pattern palm brand agreement

center absent establish deep

Answers

  • credit
  • restless
  • philosophy
  • corruption
  • atmosphere
  • harmony
  • budge
  • mutual
  • pattern
  • palm
  • brand
  • agreement
  • center
  • absent
  • establish
  • deep

Exercise: Second Try

Memorize the following words (in order across rows):

decoration crosswalk far attraction

practical ring defend economic

folklore theme lease dip

house burn cultural bed

Answers

  • decoration
  • crosswalk
  • far
  • attraction
  • practical
  • ring
  • defend
  • economic
  • folklore
  • theme
  • lease
  • dip
  • house
  • burn
  • cultural
  • bed

Regression to the Mean

  • The “best performers” in one round generally do worse in the next round
  • Simple model:
    • Performance = X_skill + X_luck (sum of two random variables)
    • Top performers tend to have had both high skill and good luck; the skill persists, but the luck is unlikely to repeat next time
  • Other instances:
    • Tall women/men tend to have daughters/sons who are taller than average but shorter than themselves
    • CEOs who receive a high-profile award often “underperform expectations” in subsequent quarters
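
The simple model above can be checked with a short simulation. The sketch below is only illustrative: it assumes skill and luck are independent standard normal draws, and the function name, player count, and top fraction are hypothetical choices, not values from the lecture.

```python
import random


def simulate_rounds(n_players=10_000, top_frac=0.01, seed=0):
    """Simulate two rounds where performance = skill + luck.

    Assumption (not from the lecture): skill and luck are independent
    standard normal variables, and luck is redrawn every round.
    """
    rng = random.Random(seed)
    skill = [rng.gauss(0, 1) for _ in range(n_players)]
    round1 = [s + rng.gauss(0, 1) for s in skill]
    round2 = [s + rng.gauss(0, 1) for s in skill]

    # Pick the best performers of round 1.
    k = max(1, int(n_players * top_frac))
    top = sorted(range(n_players), key=lambda i: round1[i], reverse=True)[:k]

    def mean(xs):
        return sum(xs) / len(xs)

    return {
        "top_round1": mean([round1[i] for i in top]),
        "top_round2": mean([round2[i] for i in top]),
        "top_skill": mean([skill[i] for i in top]),
    }


result = simulate_rounds()
print(result)
```

Under these assumptions, the round-1 leaders should still be above average in round 2 (their skill carries over), but noticeably below their own round-1 average, since their round-1 luck does not carry over.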

Regression to the Mean - Other Examples

  • What are some example forecasts where regression to the mean applies?
  • My answers:
    • NBA PPG leader
    • Indian Wells winner
  • Other possible ones (depending on how Jean selected questions):
    • Acres burnt by wildfires
    • Price of ETH / XRP / Euro
    • “You” more than 10 times

Brainstorming Exercise

Where does regression to the mean show up in forecasting or in life?

How would you account for it?

How to Account for Regression to the Mean

  • Better data:
    • Use long-run averages
    • Consider all candidates, not just previous top scorers (“Other” option)
  • Better statistics:
    • Adjust using Bayesian prior (on board)
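
The board derivation is not reproduced in these notes. One standard way to make the Bayesian adjustment concrete, assuming a normal prior on skill and independent normal luck, is to shrink the observed performance toward the prior mean by the fraction of variance attributable to skill. The sketch below uses that assumption; the function name and all numbers are hypothetical.

```python
def shrink_toward_prior(observed, prior_mean, skill_var, luck_var):
    """Posterior mean of skill under a normal-normal model.

    Assumptions (not from the lecture): skill ~ Normal(prior_mean, skill_var)
    and observed = skill + luck, with luck ~ Normal(0, luck_var) independent
    of skill.
    """
    # Fraction of the observed deviation that is expected to persist.
    weight = skill_var / (skill_var + luck_var)
    return prior_mean + weight * (observed - prior_mean)


# Hypothetical numbers: a player scored 30 when the typical player scores 20,
# and skill and luck contribute equally to the spread in scores.
forecast = shrink_toward_prior(observed=30.0, prior_mean=20.0,
                               skill_var=4.0, luck_var=4.0)
print(forecast)  # 25.0
```

The noisier the outcome (larger `luck_var`), the closer the forecast moves to the prior mean; with no luck at all, the forecast equals the observed value.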

Selection of the Range

  • Suppose there is a certain city with five soccer leagues of different skill levels
  • Assume there was a Soccer Assessment Test that measured the speed, coordination, strength, and soccer experience of players in that city
  • Questions:
    • Within a league, how much will the Soccer Assessment correlate with a player’s success (measured by e.g. goals scored)?
    • Suppose the players were shuffled randomly among the leagues. How much would the Soccer Assessment correlate with player success?
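
The two questions above can be explored with a toy simulation. This is a sketch under simplifying assumptions that are not in the lecture: each player has a latent skill, the assessment and the success measure are both skill plus independent noise, leagues are formed by sorting on skill, and success does not depend on the strength of the opposition. All names and parameters are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def league_correlations(n_players=5000, n_leagues=5, noise=0.5, seed=0):
    rng = random.Random(seed)
    players = []
    for _ in range(n_players):
        skill = rng.gauss(0, 1)
        test = skill + rng.gauss(0, noise)      # assessment score
        success = skill + rng.gauss(0, noise)   # e.g. goals scored
        players.append((skill, test, success))

    size = n_players // n_leagues

    def avg_within(ordered):
        # Average test/success correlation over consecutive blocks (leagues).
        leagues = [ordered[i * size:(i + 1) * size] for i in range(n_leagues)]
        corrs = [pearson([p[1] for p in lg], [p[2] for p in lg])
                 for lg in leagues]
        return sum(corrs) / len(corrs)

    by_skill = sorted(players)          # tuples sort by skill first
    shuffled = players[:]
    rng.shuffle(shuffled)
    return avg_within(by_skill), avg_within(shuffled)


sorted_corr, shuffled_corr = league_correlations()
print(sorted_corr, shuffled_corr)
```

Under these assumptions, the correlation within skill-sorted leagues should come out much lower than within randomly shuffled leagues, because sorting removes most of the skill variation that the assessment is able to detect.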

Selection of the Range - Actual SAT Data

Thought Experiment

Which of the following pairs do you expect to be positively correlated, negatively correlated, or uncorrelated?

  • Music achievement and math achievement (link to answer)
  • IQ and sports performance (link to answer)
  • EQ and sports performance
  • IQ and EQ

Disclaimer: I have not read these studies in detail, and in general we should not draw conclusions from any single study.

Thought Experiment Answers

  • It seems that music+math positively correlate, as do sports+IQ and sports+EQ
    • I couldn’t figure out the answer for IQ+EQ (too much agenda on all sides)
  • This probably doesn’t match your experience (from e.g. looking at other UC Berkeley students)
  • Why is this?

L-shaped Selection

  • Derivation on board
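
The board derivation is not included in these notes, so the following is only a guess at the setup: assume a person is selected (e.g. admitted) if either of two traits exceeds a threshold, so that the selected region in the trait plane is L-shaped. The simulation below shows how this kind of selection can make two traits that are positively correlated in the population look much less correlated among those selected. The population correlation, threshold, and all names are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def l_shaped_selection(n=50_000, rho=0.3, threshold=1.0, seed=0):
    """Compare trait correlation before and after selection.

    Assumption (not from the lecture): selection keeps anyone whose
    trait A OR trait B exceeds `threshold`.
    """
    rng = random.Random(seed)
    a_all, b_all, a_sel, b_sel = [], [], [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        a = z1
        b = rho * z1 + (1 - rho ** 2) ** 0.5 * z2  # corr(a, b) = rho
        a_all.append(a)
        b_all.append(b)
        if a > threshold or b > threshold:
            a_sel.append(a)
            b_sel.append(b)
    return pearson(a_all, b_all), pearson(a_sel, b_sel)


population_corr, selected_corr = l_shaped_selection()
print(population_corr, selected_corr)
```

Under this assumed rule, someone who was selected despite a low value on one trait must have a high value on the other, which pushes the correlation among selected people down relative to the population.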
