Crunching the Numbers: Data Journalism 101
Marina Villeneuve & Kae Petrin
Slides are here: https://shorturl.at/e82pk
What is data journalism?
Data Science
Data Art
Data Storytelling!
Data journalism is another set of tools for figuring out the best way to tell a story
Why use data?
1. Data can lend credence and complexity to (or debunk) anecdote
2. It's a powerful investigative tool
3. It can communicate information efficiently and clearly
Lost in translation: Migrant kids struggle in segregated Chicago schools from Chalkbeat Chicago
Building Science Graphics: An illustrated guide to communicating science through diagrams and visualizations, by Jen Christiansen
What is data?
Some common sources
Thinking broader — what's on your beat?
Finding data for reporting
Where to look
Or… obtain it yourself through FOIA.
If someone fills out a form, that's data.
State Department of Education Websites
Good place to check first for data – there is almost always a section of the website entitled “data center”, “library”, “report card” or something similar.
Government databases
Terrible government data viz
Government dashboards
You have to interview data just like any other source.
Beware of data definitions
How to interview your data
Best practices for obtaining data
Additional tips
Checking your work
It’s all about shifting perspective
Thinking more broadly about data
Data can come in surprising forms
Learning more
Low-tech tools for data journalism
Learning more and checking your work - web resources
Books on thinking about data
Learning to code
Journalism-specific resources
Other resources
Finding the story:
Using data to report on communities, states
Data as adding context, finding stories
For example:
If you want to do a story on people living with diabetes …
find a dataset on diabetes rates by county
https://gis.cdc.gov/grasp/diabetes/DiabetesAtlas.html
And see which counties have the highest rates per capita
And speak to people who live there
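If a dataset gives raw counts rather than rates, you need to divide by population before ranking. A minimal sketch, assuming made-up county names and numbers (none of these figures come from the CDC atlas):

```python
# Hypothetical counties with invented case counts and populations.
counties = [
    {"county": "County A", "cases": 1200, "population": 10000},
    {"county": "County B", "cases": 5000, "population": 100000},
    {"county": "County C", "cases": 900, "population": 6000},
]

# Rate per 100 residents, so a big county does not win on raw count alone.
for c in counties:
    c["rate_per_100"] = c["cases"] * 100 / c["population"]

# Rank from highest rate to lowest.
ranked = sorted(counties, key=lambda c: c["rate_per_100"], reverse=True)

for c in ranked:
    print(c["county"], c["rate_per_100"])
```

Note that County B has the most cases but the lowest rate, which is why per capita matters.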
Statehouse data…
Who’s influencing whom?
Lobbying data
Campaign finance contribution data
Financial disclosure statements
How are they spending our money and their campaign funds?
Budgets
Legislative reimbursements
Campaign finance expenditure data
Campaign finance data
Lobbying data: Fights to defeat, pass, tweak bills
Let’s get hands on:
With Excel!
First up…
Sorting and filtering
https://publicreporting.elections.ny.gov/CandidateCommitteeDisclosure/CandidateCommitteeDisclosure
Search by Committee
Search by Committee for:
Democratic Senate Campaign Committee - Housekeeping
Click search
Click on CSV Full Period
That will download the data for you
Don’t convert when Excel asks (removing leading zeroes will mess things up)
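Why leading zeroes matter, as a rough sketch: once a code is treated as a number, the zero is gone. The column names and values here are hypothetical stand-ins, not the real export's headers.

```python
import csv
import io

# Hypothetical two-column sample; the real export has many more columns.
sample = "zip_code,amount\n02134,250.00\n10001,1000.00\n"

rows = list(csv.DictReader(io.StringIO(sample)))

# The csv module reads every field as text, so the leading zero survives.
zip_as_text = rows[0]["zip_code"]

# Converting the field to a number is what drops the zero.
zip_as_number = int(zip_as_text)

print(zip_as_text)    # 02134
print(zip_as_number)  # 2134
```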
Steps:
Copy and paste data into a new worksheet (always keep an original)
Make sure the dataset is clean - columns are labeled, column headings are in first row, no empty rows, etc
Read columns to understand what data is here
On the main page: https://publicreporting.elections.ny.gov/
Look around and you’ll find a guide that explains each column
https://publicreporting.elections.ny.gov/Content/Help/FileFormatReferenceFiler.pdf
Sorting and filtering can be powerful!
Let’s sort the whole sheet by contribution amount
Go to Column Z (Amount) and click on the cell right below “Amount”
Press Control-A to select the whole sheet
Go to Sort & Filter and click Sort Largest to Smallest
Now you can see the organizations donating the most to the NY Senate Democratic housekeeping committee
(Which is a committee that’s supposed to just be about funding the costs of a party headquarters and not for funding campaigns… but can powerful donors curry favor by donating?)
Now let’s try filtering…
Go to Sort & Filter, then click Filter
Now you see little drop-down boxes next to each column heading
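The same sort and filter, sketched outside Excel. This assumes a tiny invented sample with hypothetical column names ("contributor", "amount") and a made-up $50,000 cutoff; the real file's headers are in the state's format guide.

```python
import csv
import io

# Invented sample rows standing in for the downloaded CSV.
sample = """contributor,amount
Example Union,50000
Example Realty LLC,100000
Example PAC,25000
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# Amounts arrive as text, so convert them before comparing.
for row in rows:
    row["amount"] = float(row["amount"])

# Sort: largest contribution first (like Sort Largest to Smallest).
largest_first = sorted(rows, key=lambda r: r["amount"], reverse=True)

# Filter: keep only contributions of $50,000 or more.
big_donors = [r for r in rows if r["amount"] >= 50000]

print(largest_first[0]["contributor"])
print(len(big_donors))
```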
Let’s make a Pivot Table!
Press Control-A to select the entire range you want
Depending on which version of Excel you have, you then-
Hit OK
Now you can start summarizing and looking for trends in data
Try playing around with placing different fields in the Columns and Rows areas
How might you show entities by how much they donated?
Click on the cell right below “Sum of Amount”
And then hit “Sort Largest to Smallest”
To switch to dollar format:
Go to “Home”, then highlight the column you want to change
And click on the $ sign
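What the pivot table is doing under the hood is roughly this: group rows by entity, add up the amounts, sort, and format as dollars. A small sketch with invented donors and amounts:

```python
from collections import defaultdict

# Invented (contributor, amount) rows; one donor appears twice.
rows = [
    ("Example Union", 50000.0),
    ("Example Realty LLC", 100000.0),
    ("Example Union", 25000.0),
    ("Example PAC", 10000.0),
]

# "Sum of Amount" per contributor.
totals = defaultdict(float)
for name, amount in rows:
    totals[name] += amount

# Sort Largest to Smallest.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Format as dollars, like clicking the $ sign.
for name, total in ranked:
    print(f"{name}: ${total:,.2f}")
```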
A lot of learning data is just playing around and getting used to it.
From here on out, you’ll want to think about things like:
Resources:
https://gijn.org/resource/analyzing-data-spreadsheets/
An exercise on calculating percent change:
https://mainecampaignfinance.com/#/transactionSearch/151
Let’s look at data about how much outside groups spend on campaigns in Maine…
Check off election year 2016 and 2020
Then click export results
Let’s make a pivot table.
Put Election Year in Columns
Filer Type in Rows
And Sum of Amount in Values
You get a summary of spending by year
Formula for percentage change is
(new-old)/old
So, if we’re using cell names, it’s the cell for 2020 spending minus the cell for 2016 spending, divided by the cell for 2016 spending.
Type in that formula and then hit Enter
And then hit “%” in the Home tab to change the decimal to a %
So now we know that outside spending increased 38% from 2016 to 2020
(And check an inflation calculator to see how much prices rose over the same period)
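The percent change formula, (new-old)/old, as a quick sketch. The two spending figures below are placeholders chosen to produce a 38% increase; they are not the actual Maine totals.

```python
def percent_change(old, new):
    """Return the change from old to new as a fraction of old."""
    return (new - old) / old

# Placeholder totals, not real figures from the Maine export.
spending_2016 = 100.0
spending_2020 = 138.0

change = percent_change(spending_2016, spending_2020)

# Format the decimal as a percentage, like the % button in Excel.
print(f"{change:.0%}")
```

Order matters: the older value always goes in the denominator.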
And to calculate plain old percent…
Use Cell 1/Cell 2
Or, C6/D6 to find 2020 spending as a percentage of all spending in 2016 and 2020
To find the median…
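Plain percent is just part divided by whole. A sketch with placeholder numbers (again, not the real Maine totals), where the whole is the two years combined:

```python
# Placeholder totals for illustration only.
spending_2016 = 100.0
spending_2020 = 150.0

# The "whole" is the grand total across both years.
total = spending_2016 + spending_2020

# Share of all spending that happened in 2020.
share_2020 = spending_2020 / total

print(f"{share_2020:.0%}")
```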
Go to your spreadsheet (not the pivot table)
Go to Column P and scroll to the bottom
In the cell below the last entry in Column P, type in:
=MEDIAN(P1:P1914)
That tells Excel to calculate the median (which is kind of useless in this particular example, but oh well)
You can switch out MEDIAN for AVERAGE to get the mean.
But use median for monetary amounts!
Use mean for things like average cat life span.
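Why median for money: one huge amount drags the mean way up, while the median stays near the typical value. A sketch with invented contribution amounts:

```python
from statistics import mean, median

# Invented contributions: four small ones and one very large one.
amounts = [50, 100, 100, 250, 100000]

typical = median(amounts)   # middle value once sorted
average = mean(amounts)     # sum divided by count

print(typical)
print(average)
```

Here the median is 100 while the mean is 20,100, a figure no actual donor in the list came close to giving.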
Other resources:
Show your work on GitHub
https://github.com/reportermarina
For example, I did a project on school discipline data in MA
Here’s my readme, as well as links to data I used, and my methodology:
I HIGHLY recommend Columbia’s Lede Program - a data journalism certificate you can do virtually
They surprisingly have financial aid available that made it really affordable for me (someone who applied literally last minute)
I still have access to all the lectures, walkthroughs, tutorials… a godsend
I also highly recommend IRE’s data Bootcamp!
https://www.ire.org/training/bootcamps/data-journalism-bootcamps/
And search online for tutorials on Excel, SQL, and R
SQL: http://www.padjo.org/tutorials/
https://ksj.mit.edu/resource/data-journalism-tools/introduction/
Sources for spreadsheets:
https://datasetsearch.research.google.com/
https://nces.ed.gov/ipeds/use-the-data
Contact info
@ReporterMarina
marina.villeneuve@gmail.com