1 of 32

Going deeper with Census microdata: IPUMS is your friend

March 6, 2025

Sandhya Kambhampati

Paul Overberg

David Van Riper

MaryJo Webster

2 of 32

ipums.org

3 of 32

Observation units of census & survey data

Geographic summary data: Areas

Microdata: Respondents

Households

Persons

+ Flexible classification, modeling & cross-tabulation

- Restricted geographic detail

- Limited sample sizes

4 of 32

Summary Data

Age | Female | Male | Total
Under 5 years | 9,598,316 | 10,051,876 | 19,650,192
5 to 9 years | 9,787,932 | 10,191,107 | 19,979,039
10 to 14 years | 10,310,162 | 10,797,748 | 21,107,910
15 to 17 years | 6,143,574 | 6,416,023 | 12,559,597
18 and 19 years | 4,214,584 | 4,400,774 | 8,615,358
20 years | 2,177,131 | 2,310,954 | 4,488,085
21 years | 2,155,180 | 2,270,025 | 4,425,205
22 to 24 years | 6,307,618 | 6,599,470 | 12,907,088
25 to 29 years | 11,411,800 | 11,850,355 | 23,262,155
30 to 34 years | 10,998,357 | 11,224,653 | 22,223,010
35 to 39 years | 10,687,259 | 10,658,796 | 21,346,055
40 to 44 years | 10,034,534 | 9,966,088 | 20,000,622
45 to 49 years | 10,395,144 | 10,174,825 | 20,569,969
50 to 54 years | 10,650,734 | 10,320,033 | 20,970,767
55 to 59 years | 11,202,165 | 10,583,556 | 21,785,721
60 and 61 years | 4,427,289 | 4,115,170 | 8,542,459
62 to 64 years | 6,150,445 | 5,622,814 | 11,773,259
65 and 66 years | 3,850,294 | 3,467,860 | 7,318,154
67 to 69 years | 5,303,633 | 4,679,985 | 9,983,618
70 to 74 years | 7,130,592 | 6,115,586 | 13,246,178
75 to 79 years | 5,030,479 | 4,083,235 | 9,113,714
80 to 84 years | 3,520,631 | 2,558,706 | 6,079,337
85 years and over | 4,262,925 | 2,358,891 | 6,621,816
Total | 165,750,778 | 160,818,530 | 326,569,308

Age by Sex, 2016-2020 5-Year ACS Summary File (via IPUMS NHGIS)

5 of 32

https://www.sfchronicle.com/projects/2022/san-francisco-asian-population/

Geographic summary data: Areas

6 of 32

Microdata: Respondents

Households

Persons

+ Flexible classification, modeling & cross-tabulation

- Restricted geographic detail

- Limited sample sizes
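
As a rough illustration of the "flexible classification & cross-tabulation" point, the sketch below builds a small age-by-sex table from person-level records using age bands the analyst picks. The records, the weights, and the `age_group` bands are all made up for illustration; they are not IPUMS data or IPUMS variable names.

```python
from collections import defaultdict

# Hypothetical person records: (age, sex, person weight). Not real data.
persons = [
    (3, "Female", 100),
    (4, "Male", 120),
    (7, "Male", 90),
    (8, "Female", 110),
    (33, "Female", 95),
]

def age_group(age):
    """Age bands chosen by the analyst rather than fixed by a published table."""
    if age < 5:
        return "Under 5"
    if age < 18:
        return "5 to 17"
    return "18 and over"

# Weighted cross-tabulation: sum person weights within each (age band, sex) cell.
table = defaultdict(float)
for age, sex, weight in persons:
    table[(age_group(age), sex)] += weight

for cell, estimate in sorted(table.items()):
    print(cell, estimate)
```

With summary data the age bands are whatever the publisher chose; with microdata the same records can be regrouped by editing one function.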

7 of 32

Benefits of Using IPUMS

  • Harmonization
  • Customized data extracts
  • Integrated and supplemental documentation
  • Online data analysis tools
  • Additional data enhancements
  • User support

8 of 32

What is Harmonization?

  • Creating a single, consistent data series from datasets collected at different times
  • Applying a coding scheme to group broadly comparable categories but retain additional detail when available
  • Documenting these decisions and flagging potential comparability issues

9 of 32

Changes to detail in legal marital status in NHIS

2004-forward Categories

B = Not in Universe

1 = Separated

2 = Divorced

3 = Married

4 = Single/Never Married

5 = Widowed

Pre-2004 Categories

B = Not in Universe

1 = Married – spouse present

2 = Married – spouse not in household

3 = Married – spouse in household unknown

4 = Widowed

5 = Divorced

6 = Separated

7 = Never Married

10 of 32

Pre-2004 codes | 2004-forward codes | Harmonized codes
B = Not in Universe | B = Not in Universe | 00 = Not in Universe
(none) | 3 = Married | 10 = Married
1 = Married – spouse present | (none) | 11 = Married – spouse present
2 = Married – spouse not in household | (none) | 12 = Married – spouse not in household
3 = Married – spouse in household unknown | (none) | 13 = Married – spouse in household unknown
4 = Widowed | 5 = Widowed | 20 = Widowed
5 = Divorced | 2 = Divorced | 30 = Divorced
6 = Separated | 1 = Separated | 40 = Separated
7 = Never Married | 4 = Never Married | 50 = Never Married

    • The first digit of the code is consistent across all years
    • The second digit of the code contains detail specific to pre-2004 data
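
The two-digit scheme on this slide can be sketched as a pair of lookup tables. The mappings are transcribed from the slide; the function names `harmonize` and `general_status` are hypothetical helpers for illustration, not part of any IPUMS tool.

```python
# Original code -> harmonized code, transcribed from the slide above.
PRE_2004 = {
    "B": "00", "1": "11", "2": "12", "3": "13",
    "4": "20", "5": "30", "6": "40", "7": "50",
}
FROM_2004 = {
    "B": "00", "3": "10", "5": "20",
    "2": "30", "1": "40", "4": "50",
}

def harmonize(original_code, year):
    """Translate a survey-year-specific code into the harmonized code."""
    mapping = PRE_2004 if year < 2004 else FROM_2004
    return mapping[original_code]

def general_status(harmonized_code):
    """The first digit is comparable across all years."""
    return harmonized_code[0]

# "Married, spouse present" in 2000 and plain "Married" in 2010 get different
# detailed codes but share the same first digit.
print(harmonize("1", 2000), harmonize("3", 2010))
print(general_status(harmonize("1", 2000)) == general_status(harmonize("3", 2010)))
```

Comparing on the first digit alone gives a series that is consistent over time; the second digit keeps the extra pre-2004 detail for analyses limited to those years.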

11 of 32

Geography in PUMS

  • Regions, divisions, states & …
  • Public Use Microdata Areas (PUMAs):
    • At least 100,000 residents
    • “County groups” in 1970 & 1980
    • IPUMS has created 1960 PUMAs & “mini-PUMAs” (> 50,000 residents)

12 of 32

IPUMS USA geographic resources

  • Supplementary variables, based on PUMAs
    • Counties, cities, metro areas, metro status, % metro

13 of 32

CITY “mismatch tolerance”

Minneapolis: 1.3% commission error

Saint Paul: 0% mismatch
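
A minimal sketch of how a mismatch rate like the ones above could be computed. It assumes that commission error means the share of the matched PUMAs' population living outside the city, and omission error means the share of the city's population falling outside the matched PUMAs. Those definitions, the `mismatch_rates` helper, and every population count are assumptions for illustration; the IPUMS documentation for CITY is the authority on the actual method.

```python
def mismatch_rates(puma_pop, city_pop, overlap_pop):
    """Return (commission, omission) error rates as fractions.

    puma_pop:    population of the PUMAs matched to the city
    city_pop:    population of the city itself
    overlap_pop: population living in both the city and the matched PUMAs
    """
    commission = (puma_pop - overlap_pop) / puma_pop  # non-city residents included
    omission = (city_pop - overlap_pop) / city_pop    # city residents left out
    return commission, omission

# Hypothetical counts, chosen only to reproduce a 1.3% commission error.
commission, omission = mismatch_rates(
    puma_pop=400_000, city_pop=394_800, overlap_pop=394_800
)
print(f"commission: {commission:.1%}, omission: {omission:.1%}")
```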

14 of 32

CITY Comparability Documentation

15 of 32

PUMA Match Summary by Large Place (75k+)

16 of 32

Create a customized dataset via website

17 of 32

Use the online tabulator

18 of 32

Use our API or R/Python packages

19 of 32

Read the Documentation

  • Harmonizing data…
    • makes it easier to look at broadly comparable concepts/variables across time/space
    • makes it easier to make a mistake
  • IPUMS documents changes to…
    • response categories, universe definitions, question phrasing

20 of 32

Weights in Microdata

  • Not everyone included in the data file has the same probability of being selected into the sample
  • Weights account for how many people in the total population each observation in the sample population represents
  • https://blog.popdata.org/ipums-faqs-sample-weights/
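
To show why weights matter, the sketch below compares an unweighted and a weighted share of renter households. The four records and their weights are invented for illustration; real extracts carry their own weight variables, described in the documentation.

```python
# Hypothetical household records: (is_renter, household weight). Not real data.
households = [
    (True, 50),
    (False, 150),
    (True, 100),
    (False, 100),
]

# Unweighted: every sampled household counts once.
unweighted_share = sum(1 for renter, _ in households if renter) / len(households)

# Weighted: every sampled household counts for the households it represents.
total_weight = sum(weight for _, weight in households)
renter_weight = sum(weight for renter, weight in households if renter)
weighted_share = renter_weight / total_weight

print(f"unweighted: {unweighted_share:.1%}, weighted: {weighted_share:.1%}")
```

Half of the sampled households rent, but they represent only 150 of the 400 households in the population, so the weighted share is the one to report.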

21 of 32

Small Sample Sizes

  • Microdata allow for identification of (highly) specific subgroups; these groups can be “too small”
  • There is no singular threshold for “too small”
  • Sampling error around estimated statistics may be too large to allow for meaningful interpretation
  • Options: pooling multiple samples, aggregating categories
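
A rough sketch of why pooling helps. It uses the textbook standard error for a proportion under simple random sampling, which is only an approximation here: it ignores the complex sample design, and a published analysis would normally rely on the survey's own variance methods. The sample sizes and the 25% share are hypothetical.

```python
import math

def se_proportion(p, n):
    """Approximate standard error of a proportion, assuming simple random sampling."""
    return math.sqrt(p * (1 - p) / n)

p = 0.25  # hypothetical estimated share for a small subgroup

se_single = se_proportion(p, n=40)   # one sample with 40 subgroup records
se_pooled = se_proportion(p, n=200)  # five similar samples pooled together

# A rough 95% margin of error is about 1.96 standard errors.
print(f"single sample: +/- {1.96 * se_single:.1%}")
print(f"pooled:        +/- {1.96 * se_pooled:.1%}")
```

Pooling five times as many records shrinks the margin of error by a factor of about the square root of five, at the cost of describing a longer time span.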

22 of 32

23 of 32

24 of 32

25 of 32

26 of 32

27 of 32

28 of 32

29 of 32

“Between 2018 and 2022, the share of households with annual incomes of more than $750,000 that rented rose to 10.5%, according to census data from IPUMS at the University of Minnesota analyzed by The Wall Street Journal, the highest level since the survey began in the mid-2000s. It was 8.4% in the previous five-year period.”

30 of 32

31 of 32

32 of 32

Questions?

Please go up to the microphone to ask a question. Thank you.

Sandhya Kambhampati, sandhya@latimes.com

Paul Overberg, poverberg2@gmail.com

David Van Riper, vanriper@umn.edu

MaryJo Webster, maryjo.webster@startribune.com