1 of 24

Making your own summary tables

Welcome to the microdata, where everyone is data!

2 of 24

A quick guided tour of the history of the Census data

A census taker counts household members in 1850

In this picture, a clerk uses one of Herman Hollerith's electric tabulating machines to process census data in 1890.

3 of 24

It’s better

Source: Census/Pew Research Center

4 of 24

It’s better

Source: Census/Pew Research Center

5 of 24

It’s complicated

6 of 24

It’s complicated

Data is not the same over time: each questionnaire is informed and adjusted with previous and current issues in mind

One big job for a data journalist: understanding how researchers thought about data and surveys over time, and finding responsible ways to make sense of it, even when it’s not easy.

All the terms are here: https://docs.google.com/spreadsheets/d/1KMXnlx6VLAkfctCO5Y_Zm-Q7oqV6KDXeqOatKuiT9xE/edit?usp=sharing

7 of 24

Microdata

👆 raw microdata sample

Yuck!

What is that?!

8 of 24

Microdata

Microdata allows you to make your own tables!

https://dl.dropboxusercontent.com/u/100051277/italian_barbers.pdf

Census didn’t have what you wanted? No problem: make it yourself!

9 of 24

Microdata

IPUMS

PUMS = “The American Community Survey (ACS) Public Use Microdata Sample (PUMS) files are a set of untabulated records about individual people or housing units.” — Census

IPUMS = Integrated Public Use Microdata Series (census microdata for social and economic research) — IPUMS

10 of 24

Microdata

11 of 24

Statistical concepts

  • Correlation between two variables: for example, how having good teachers might be “associated with” (a phrase often used by academics) better outcomes later in life, or how the weight of a car is associated with fatal collisions.
    • If two variables move in the same direction, it’s a positive correlation: when x goes up, so does y.
    • If two variables move in opposite directions, it’s a negative correlation: when x goes up, y goes down.
  • Causation: one of the two variables drives the other. Establishing it is the ultimate goal and often follows finding correlations (“Correlation is not necessarily causation”). Causation flows in only one direction (a.k.a. cause and effect): when x goes up, it causes y to go down.
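
To make positive and negative correlation concrete, here is a minimal sketch that computes the Pearson correlation coefficient by hand. The numbers are made up for illustration, and the function name `pearson_r` is our own, not something from the slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel out)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up numbers for illustration only
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 8, 10]    # rises as x rises
y_down = [10, 8, 5, 4, 2]  # falls as x rises

r_pos = pearson_r(x, y_up)
r_neg = pearson_r(x, y_down)
print(round(r_pos, 2), round(r_neg, 2))
```

A value close to +1 means the two variables move together, a value close to -1 means they move in opposite directions. Neither number says anything about which variable, if any, causes the other.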

12 of 24

Statistical concepts

  • Descriptive statistics: Observed data, i.e. stuff you gathered via surveys or other methods. Example: “From 2000 to 2005, 70% of the land cleared in the Amazon and recorded in Brazilian government data was transformed into pasture.”
  • Inferential statistics: Use observed data to predict what is true of areas beyond the data, i.e. stuff you calculated from observed data. Example: “Receiving your college degree increases your lifetime earnings by 50%”

13 of 24

Statistical concepts

  • Weighting: Respondents to sample surveys can be given a specific weight “that tells you how many people they are representing. The weights are designed to ‘true up’ the sample so that even if it includes slightly fewer elderly Hispanics, say, those that are included can get a higher weight so the elderly Hispanic population is more fully represented.” (From Source)
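
A minimal sketch of what weighting does in practice: instead of counting respondents, you add up their weights. The records, group names, and the `weight` field below are hypothetical and only serve to show the arithmetic.

```python
# Hypothetical sample: each record is one respondent with a person weight,
# i.e. the number of people in the population that respondent stands for.
respondents = [
    {"group": "A", "weight": 120},
    {"group": "A", "weight": 95},
    {"group": "B", "weight": 310},
    {"group": "B", "weight": 105},
    {"group": "A", "weight": 70},
]

def unweighted_counts(records):
    """Count respondents per group (ignores the weights)."""
    counts = {}
    for r in records:
        counts[r["group"]] = counts.get(r["group"], 0) + 1
    return counts

def weighted_counts(records):
    """Estimate population per group by summing the weights."""
    totals = {}
    for r in records:
        totals[r["group"]] = totals.get(r["group"], 0) + r["weight"]
    return totals

raw = unweighted_counts(respondents)
estimated = weighted_counts(respondents)
print(raw)
print(estimated)
```

In this made-up sample, group A has more respondents, but group B represents more people once the weights are applied. That is why weighted and unweighted tables can tell different stories.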

14 of 24

Statistical concepts

15 of 24

IPUMS online search

Microdata exercise!

Our task: According to the BBC, 51% of all nail salon technicians are of Vietnamese descent. For your nut graf, you want to find out how many Vietnamese people have been nail technicians over time. Let’s see how that trend developed: http://www.bbc.com/news/magazine-32544343

16 of 24

IPUMS online search

0

Conceptually prep your spreadsheet: Start with what you want your result table to look like and work backward.

Columns: years

Rows: birthplace

Filters: All manicurists since 1950

My ideal table

17 of 24

IPUMS online search

1

Sample selection page:

https://usa.ipums.org/usa/sda/

Select “United States, 1850-2014” (see below)

18 of 24

IPUMS online search

2

In another tab go to the data dictionary page:

https://usa.ipums.org/usa-action/variables/group

If you go to Person > Work, IPUMS will display a range of categories, along with the years and surveys available for each. Basically, each category represents a question related to the subject you picked.

Find the code for occupations:

OCC1950 = Occupation, 1950 basis (IPUMS harmonized professions over time and has this going back all the way to 1950. If you only want a snapshot in time, OCC1990 or OCC2010 might work just fine for ya).

Click on it, then go to codes. You should find a full list of professions you can search for. The closest to nail salon workers we can find is Barbers, beauticians, and manicurists (code = 740).

19 of 24

IPUMS online search

3

Open this page in a third tab:

https://sda.usa.ipums.org/all_usa_samples/Doc/nes.htm

Go to Alphabetical Variable List.

It shows EVERY CATEGORY (dictionary item) and you can choose to display those categories as an alphabetical list.

Find the birthplace code (bpl) and the census year code (year)

20 of 24

IPUMS online search

4

Let’s bring it all together and make a table!

21 of 24

IPUMS online search

4

bpl for birthplace

year for census year

Then for filter, type in this:

occ1950(740) year(1950,1960,1970,1980,1990,2000,2010)

That filters your table to occupation code 740 (barbers, beauticians, and manicurists) and for those specific decades.
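
If it helps to see the logic behind the form, here is a rough sketch of what the row, column, and filter settings amount to. It is not how the IPUMS tool is implemented. The records are made up, `bpl` uses readable labels instead of the numeric codes IPUMS actually uses, and the `weight` field and the function names are our own assumptions.

```python
from collections import defaultdict

# Made-up person records for illustration only
records = [
    {"year": 1990, "bpl": "Vietnam", "occ1950": 740, "weight": 100},
    {"year": 1990, "bpl": "California", "occ1950": 740, "weight": 200},
    {"year": 2000, "bpl": "Vietnam", "occ1950": 740, "weight": 300},
    {"year": 2000, "bpl": "California", "occ1950": 740, "weight": 250},
    {"year": 2000, "bpl": "Vietnam", "occ1950": 100, "weight": 400},  # other occupation
    {"year": 1995, "bpl": "Vietnam", "occ1950": 740, "weight": 50},   # year not selected
]

YEARS = {1950, 1960, 1970, 1980, 1990, 2000, 2010}

def crosstab(rows, occ_code, years):
    """Rows = birthplace, columns = year, cells = summed weights.

    Mirrors the filter occ1950(740) year(1950,...,2010): records with
    another occupation code or an unselected year are skipped.
    """
    table = defaultdict(lambda: defaultdict(int))
    for r in rows:
        if r["occ1950"] == occ_code and r["year"] in years:
            table[r["bpl"]][r["year"]] += r["weight"]
    return table

def column_share(table, birthplace, year):
    """Share of one year's column total that belongs to one birthplace."""
    column_total = sum(cells.get(year, 0) for cells in table.values())
    return table[birthplace].get(year, 0) / column_total

table = crosstab(records, 740, YEARS)
share_2000 = column_share(table, "Vietnam", 2000)
print(dict(table["Vietnam"]), round(share_2000, 3))
```

The last two records never make it into the table: one has a different occupation code, the other falls in a year that is not in the filter.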

Make sure these are checked

22 of 24

IPUMS online search

5

IPUMS spits out this ugly HTML table

23 of 24

IPUMS online search

5

Some explanations:

The percentage of the whole column that this value represents.

Our best guess for this percentage is 2.4% but we’re 95% sure that the value lies between 1.9% and 2.9%.

The number of people who, in that year, said they were barbers, beauticians, and/or manicurists.
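
To see how an estimate like 2.4% relates to its range of 1.9% to 2.9%, here is a small sketch. It assumes the common normal approximation, where a 95% interval is the estimate plus or minus 1.96 standard errors. The tool's own calculation may differ, and the function names are ours.

```python
Z_95 = 1.96  # multiplier for a 95% interval under the normal approximation

def confidence_interval(estimate, standard_error, z=Z_95):
    """Return (lower, upper) bounds: estimate +/- z * standard error."""
    margin = z * standard_error
    return estimate - margin, estimate + margin

def implied_standard_error(lower, upper, z=Z_95):
    """Work backward from an interval to the standard error behind it."""
    return (upper - lower) / (2 * z)

# Numbers from the slide: best guess 2.4%, interval 1.9% to 2.9%
se = implied_standard_error(1.9, 2.9)
low, high = confidence_interval(2.4, se)
print(round(se, 3), round(low, 1), round(high, 1))
```

The wider the interval, the less certain the estimate. Small cells in a table (few respondents) tend to have wide intervals, so treat them with care.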

24 of 24

IPUMS online search

6

You can copy and paste that ugly HTML table into spreadsheets.

See example file:

https://docs.google.com/spreadsheets/d/1kFClouCSPVN7gSNWH8do1W26yK7V2B3uIhw1Li83qT4/edit?usp=sharing
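
If copy and paste gets messy, a saved HTML table can also be converted to CSV with a few lines of Python. This is a minimal sketch using only the standard library. The sample HTML and its values are made up, and a real IPUMS output table will likely need extra cleanup because each cell can hold several statistics.

```python
import csv
import io
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

def html_table_to_csv(html):
    """Turn the table rows found in an HTML string into CSV text."""
    parser = TableParser()
    parser.feed(html)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()

# Made-up table for illustration only
sample_html = """
<table>
  <tr><th>bpl</th><th>1990</th><th>2000</th></tr>
  <tr><td>Vietnam</td><td>100</td><td>300</td></tr>
</table>
"""
csv_text = html_table_to_csv(sample_html)
print(csv_text)
```

The resulting CSV text can be saved to a file and opened in any spreadsheet program.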