1 of 30

Rethinking the way we think about percentages

SRCCON 2019

2 of 30

Amelia McNamara

University of St Thomas

Department of Computer and Information Sciences

PhD, statistics, UCLA

@AmeliaMN

Ryan Menezes

Los Angeles Times

Data journalist

B.S., statistics, UCLA

@ryanvmenezes

3 of 30

What kind of math do you apply to your work?

Let’s discuss

4 of 30

5 of 30

A home is crowded if it houses more than 1 person per room
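A minimal Python sketch of this definition follows. It assumes each home is described by a hypothetical (persons, rooms) pair. The field layout and function names are illustrative and are not taken from the talk's data source.

```python
def is_crowded(persons: int, rooms: int) -> bool:
    """A home is crowded if it houses more than 1 person per room."""
    if rooms <= 0:
        raise ValueError("rooms must be positive")
    # persons / rooms > 1 is the same test as persons > rooms when rooms > 0
    return persons > rooms


def crowding_rate(homes: list[tuple[int, int]]) -> float:
    """Share of (persons, rooms) homes that are crowded: crowded homes / all homes."""
    if not homes:
        raise ValueError("need at least one home")
    crowded = sum(is_crowded(persons, rooms) for persons, rooms in homes)
    return crowded / len(homes)


print(is_crowded(5, 4))  # True: 1.25 persons per room
print(is_crowded(4, 4))  # False: exactly 1 person per room is not "more than 1"
print(crowding_rate([(5, 4), (4, 4), (2, 3)]))  # one of three hypothetical homes is crowded
```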

6 of 30

7 of 30

Eight homes, all crowded

8 of 30

22 homes, all crowded

9 of 30

16,000 homes, 32% of them crowded

10 of 30

What could we do to ensure we find meaningful data points?

Let’s discuss

11 of 30

Finding more meaningful data points

  • Apply a “cutoff” and keep only ZIPs with at least a certain number of homes
    • But what should the cutoff be? 5,000 homes? 10,000 homes?
    • This is an arbitrary decision, and could cut off meaningful data
  • Keep a subset of ZIPs from each state
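The cutoff option can be sketched in a few lines of Python. The sizes and rates come from the slides. The crowded counts are back-calculated from the slides' rounded percentages, and the labels "placeholder-A/B/C" are hypothetical stand-ins for ZIPs the slides don't name.

```python
from dataclasses import dataclass


@dataclass
class ZipArea:
    label: str    # ZIP code, or a placeholder where the slides don't name one
    homes: int    # total homes: the denominator
    crowded: int  # crowded homes: the numerator

    @property
    def rate(self) -> float:
        return self.crowded / self.homes


def apply_cutoff(zips: list[ZipArea], min_homes: int) -> list[ZipArea]:
    """Keep only ZIPs with at least `min_homes` homes."""
    return [z for z in zips if z.homes >= min_homes]


zips = [
    ZipArea("placeholder-A", 8, 8),           # eight homes, all crowded
    ZipArea("placeholder-B", 22, 22),         # 22 homes, all crowded
    ZipArea("placeholder-C", 16_000, 5_120),  # 16,000 homes, 32% crowded
    ZipArea("90011", 22_000, 9_240),          # 42% crowded
    ZipArea("90006", 18_000, 7_740),          # 43% crowded
]

# Without a cutoff, the highest rate belongs to the tiniest ZIP.
print(max(zips, key=lambda z: z.rate).label)

for cutoff in (5_000, 10_000, 20_000):
    print(cutoff, [z.label for z in apply_cutoff(zips, cutoff)])
```

At a 20,000-home cutoff, 90006 disappears even though it has the highest rate among the large ZIPs. That is the kind of arbitrary loss the slide warns about.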

12 of 30

Consider the denominator!

  • The places with the highest percentages are not the newsworthy data points
    • They’re in rural areas with few homes
  • ZIPs with few total homes are more likely to show extreme crowding rates, high or low, simply due to chance
  • To find more meaningful data points, we need to consider not only the percentage, but the number of observations on which it is based
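The following sketch shows why the denominator matters. It makes a simplifying assumption, not a claim about the data: every home is crowded independently at the 3% national rate. Under that assumption, the standard error of a ZIP's observed rate shrinks with the square root of its number of homes.

```python
import math

NATIONAL_RATE = 0.03  # national crowding rate from the slides


def standard_error(p: float, n: int) -> float:
    """Standard error of a proportion with true rate p, estimated from n homes."""
    return math.sqrt(p * (1 - p) / n)


# Same true rate, very different amounts of chance variation.
for n in (8, 22, 16_000, 22_000):
    print(f"{n:>6,} homes: standard error ≈ {standard_error(NATIONAL_RATE, n):.2%}")
```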

13 of 30

14 of 30

15 of 30

Crowding rate in a ZIP

16 of 30

Crowding rate in a ZIP

National crowding rate (3%)

17 of 30

Crowding rate in a ZIP

National crowding rate (3%)

Number of homes in a ZIP (denominator of p-hat)
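These slides build a chart of each ZIP's crowding rate against its number of homes, with the national rate as a reference line. One common way to read such a chart is as a funnel plot. You draw the band of rates that chance alone would produce at each denominator, then measure how far outside that band each ZIP falls. The sketch below assumes a normal approximation and a threshold of z = 3. Neither choice comes from the talk.

```python
import math

NATIONAL_RATE = 0.03  # national crowding rate from the slides


def funnel_limits(homes: int, p0: float = NATIONAL_RATE, z: float = 3.0) -> tuple[float, float]:
    """Range of crowding rates expected from chance alone for a ZIP with `homes` homes.

    Normal approximation: p0 ± z * sqrt(p0 * (1 - p0) / homes), clipped to [0, 1].
    """
    se = math.sqrt(p0 * (1 - p0) / homes)
    return max(0.0, p0 - z * se), min(1.0, p0 + z * se)


def z_score(crowded: int, homes: int, p0: float = NATIONAL_RATE) -> float:
    """Standard errors between a ZIP's crowding rate (p-hat) and the national rate."""
    p_hat = crowded / homes
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / homes)


print(funnel_limits(8))       # wide band: small ZIPs swing a lot by chance
print(funnel_limits(22_000))  # narrow band around 3%

# Counts back-calculated from the slides' rounded percentages
examples = [
    ("8-home ZIP", 8, 8),
    ("90006", 7_740, 18_000),
    ("90011", 9_240, 22_000),
]
for label, crowded, homes in examples:
    print(label, round(z_score(crowded, homes), 1))
```

Under these assumptions, 90011 scores slightly above 90006 despite its lower rate. Its larger denominator shrinks the standard error by more than its one-point-lower rate shrinks the gap. The slides don't say how their ranking was computed.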

18 of 30

19 of 30

Most acute crowding: 90011 (22,000 homes, 42% crowded)

20 of 30

90006: 18,000 homes, 43% crowded

21 of 30

22 of 30

How do you feel about this approach? Would it work for your audience?

Let’s discuss

23 of 30

Going further

24 of 30

The Most Dangerous Equation (Howard Wainer, Picturing the Uncertain World, 2009)
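Wainer's "most dangerous equation" is de Moivre's formula for the standard error of a mean. For a crowding rate, each home is a 0/1 outcome with standard deviation sqrt(p(1 − p)). The formula therefore becomes the standard error of p-hat for a ZIP with n homes:

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
\qquad\text{so}\qquad
\mathrm{SE}(\hat{p}) = \frac{\sqrt{p(1-p)}}{\sqrt{n}} = \sqrt{\frac{p(1-p)}{n}}
```

With p = 0.03, this is about 0.060 for n = 8 and about 0.0012 for n = 22,000.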

25 of 30

All Maps of Parameter Estimates are Misleading (Gelman and Price, 1999)

"Unfortunately, multiply imputed maps are not suitable for presenting final results (estimated cancer rates, mean radon concentrations, etc.) to most audiences, who would likely just be confused by them. Furthermore, maps really do make convenient look-up tables (what is the cancer rate, or mean radon level, in my county?)."
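One widely used way to temper small-denominator rates is to shrink each ZIP's rate toward the national rate with a beta-binomial (empirical-Bayes-style) estimate. This is a swapped-in illustration, not Gelman and Price's method, and the paper's title is a reminder that shrunken estimates can mislead too. The prior strength of 100 pseudo-homes is an arbitrary, hypothetical choice.

```python
NATIONAL_RATE = 0.03  # prior mean: the national crowding rate from the slides


def shrunken_rate(crowded: int, homes: int, prior_mean: float = NATIONAL_RATE,
                  prior_strength: float = 100.0) -> float:
    """Posterior mean of a beta-binomial model.

    The Beta prior has mean `prior_mean` and counts as `prior_strength`
    pseudo-homes; `prior_strength` is an illustrative assumption.
    """
    alpha = prior_mean * prior_strength
    beta = (1 - prior_mean) * prior_strength
    return (crowded + alpha) / (homes + alpha + beta)


print(f"{shrunken_rate(8, 8):.3f}")           # 8 of 8 homes crowded → 0.102
print(f"{shrunken_rate(9_240, 22_000):.3f}")  # 90011, about 42% crowded → 0.418
```

The eight-home ZIP drops from 100% to about 10%, while 90011 barely moves. A prior worth 100 pseudo-homes is negligible next to 22,000 real ones.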

26 of 30

Bayesian Surprise Maps

27 of 30

How spatial polygons shape our world - Amelia McNamara https://www.youtube.com/watch?v=wn5larsRHro

28 of 30

How do we make this simpler?

29 of 30

How to Improve Bayesian Reasoning Without Instruction: Frequency Formats (Gigerenzer and Hoffrage, 1995)
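Gigerenzer and Hoffrage found that people reason about probabilities more accurately when information is stated as natural frequencies (counts out of a total) rather than as percentages. The hypothetical helper below works in that spirit. It restates a rate as a count and keeps the denominator in view.

```python
def natural_frequency(rate: float, total: int = 100) -> str:
    """Restate a rate as a count out of `total` homes, e.g. 0.03 -> 'about 3 in 100 homes'."""
    count = round(rate * total)
    return f"about {count:,} in {total:,} homes"


print(natural_frequency(0.03))          # national rate → about 3 in 100 homes
print(natural_frequency(0.42, 22_000))  # 90011 → about 9,240 in 22,000 homes
```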

30 of 30

How do we explain/implement this?

Let’s discuss