Data Visualization Under Deadline
A workshop for The Florida Times-Union newsroom
Nov. 28-29, 2012
Contact: Carl V. Lewis (@carlvlewis)
Tuesday, Nov. 28
On Tuesday, we'll look at the principles, technicalities, best practices and tools of data-driven storytelling.
1. Introduction to Data Journalism - 1:00-1:50 p.m.
This session will introduce the basic concepts and abstract principles of the nascent field of data journalism. We'll answer questions such as:
- What is data journalism?
- How is data journalism different from computer-assisted reporting?
- Why is it important for journalists to be data-savvy?
- What makes data visualization different from infographics and illustrations? How is data journalism the “new punk”?
- How can journalists with minimal programming experience integrate data visualization into their workflow?
We'll also critique a few classic examples of data visualization produced by news organizations, paying particular attention to their design choices, underlying technologies and data sources.
2. Mining for Public Data - 2:00-2:50 p.m.
This session will outline the first two steps of any data journalism project: research and retrieval. From accounting for the various factors that may influence your topic to locating relevant data sources in today's Big Data ecosystem, we'll talk about how to kick-start your data project and assemble all the necessary ingredients to begin the process of visualization. Some basic topics we'll touch on include:
- Sources for federal, state and local public data (Georgia, Florida; Duval County).
- Search tips for locating hidden datasets on the Web.
- Dealing with data formats (.csv, .txt, .xls, XML, KML, JSON, .shp, RSS, PDF).
- Scraping data from hard-to-reach places (HTML tables, PDFs; Haystax, ScraperWiki, CometDocs).
- A brief overview of using APIs (application programming interfaces) to access data.
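To make the data-format item above concrete, here is a minimal Python sketch of reading .csv text and re-expressing it as JSON, using only the standard library. The sample rows, column names and function names are placeholders invented for illustration, not real public data or part of any tool named in this outline.

```python
import csv
import io
import json

# Placeholder rows for illustration only; a real project would read a
# downloaded file instead of this made-up text.
SAMPLE_CSV = """county,state,permits
Alpha,FL,120
Beta,GA,95
"""


def csv_to_records(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def records_to_json(records):
    """Serialize the rows as a JSON array of objects."""
    return json.dumps(records, indent=2)


records = csv_to_records(SAMPLE_CSV)
print(records_to_json(records))
```

Note that the csv module reads every cell as a string, so numeric columns such as the hypothetical "permits" still need converting (for example with int()) before any arithmetic.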
3. Analyzing and Refining Your Datasets - 3:00-3:50 p.m.
In this session, we'll cover how to analyze raw datasets, clean messy data and distill complex data structures into a format in which they can be more easily visualized. We'll go over:
- Basic spreadsheet software for formatting and analysis (Excel, Google Spreadsheets).
- Data numeracy – mean, median, outliers, skewed and normal distributions, z-scores – and why these terms matter to data journalism.
- Handy Excel formulas for data journalists to keep in mind (concatenate, sum, percent change, calculating a z-score).
- Data cleaning using macros with Google Refine.
- Converting data to more useful formats (.shp to .kml; .xls to HTML; XML to JSON; Mr. Data Converter).
- Merging multiple datasets.
- Primary data classifications (integer, real, string, boolean).
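As a taste of the numeracy items above, the following Python sketch works through percent change and the standard score (z-score), the same arithmetic the Excel formulas perform. The function names and the sample numbers are assumptions made up for this example. The sketch uses the population standard deviation; a spreadsheet's sample standard deviation would give slightly different scores.

```python
import statistics


def percent_change(old, new):
    """Percent change from old to new: (new - old) / old * 100."""
    if old == 0:
        raise ValueError("percent change is undefined when the old value is 0")
    return (new - old) / old * 100


def z_scores(values):
    """How many population standard deviations each value sits from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        raise ValueError("all values are identical, so z-scores are undefined")
    return [(v - mean) / spread for v in values]


# Made-up numbers, chosen so the arithmetic is easy to check by hand.
values = [2, 4, 4, 4, 5, 5, 7, 9]

print(statistics.mean(values))    # 5
print(statistics.median(values))  # 4.5
print(percent_change(50, 75))     # 50.0
print(z_scores(values))           # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

A z-score beyond roughly 2 or -2 is a common rule of thumb for flagging a possible outlier, though the right threshold depends on the dataset.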
4. From Data to Visualization - 4:00-4:50 p.m.
During our final session Tuesday, we'll preview the multitude of free, open-source methods for visualizing public data. We'll touch on:
- Picking the right type of visualization for the task at hand (line charts vs. bar charts, choropleth maps, bubbletrees, stepper graphics, porcupine feverlines, timelines).
- Design principles and best practices for visualization (color, grid, typography, visual overload, do's and don'ts).
- Wireframing and prototyping.
- The possibilities and practicalities of open-source tools and libraries (Google Visualization API, Google Fusion Tables, CartoDB, Highcharts, d3.js, Leaflet, Tableau Public).
Wednesday, Nov. 29
On Wednesday, we'll put the previous day's data journalism concepts and technical lessons into action by creating a number of different real-world visualizations based upon Florida and Georgia public data.
I. Building a Web App - 9:00-9:50 a.m.
Before we can bring our data journalism project to life, we'll need to have somewhere to store our project on the Web for testing and/or hosting. During this session, we'll FTP into server space where we'll have a place to test our web apps and visualizations for the time being (I'll also show you how to get your own server space for later). We'll briefly look at:
- Basic HTML and CSS syntax.
- Setting up a simple 960px page to embed our visualizations.
- Using iFrames to embed visualizations.
- Generating an HTML table from an Excel spreadsheet (Tableizer).
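For a sense of what a table generator does behind the scenes, here is a rough Python sketch that turns spreadsheet rows, exported as .csv text, into an HTML table. It is not how Tableizer itself works; the function name and the sample rows are hypothetical.

```python
import csv
import html
import io


def csv_to_html_table(text):
    """Build an HTML table from CSV text, treating the first row as the header."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return "<table></table>"
    header, body = rows[0], rows[1:]
    lines = ["<table>", "  <thead>"]
    # html.escape keeps characters such as & and < from breaking the markup.
    cells = "".join(f"<th>{html.escape(cell)}</th>" for cell in header)
    lines.append(f"    <tr>{cells}</tr>")
    lines += ["  </thead>", "  <tbody>"]
    for row in body:
        cells = "".join(f"<td>{html.escape(cell)}</td>" for cell in row)
        lines.append(f"    <tr>{cells}</tr>")
    lines += ["  </tbody>", "</table>"]
    return "\n".join(lines)


# Made-up rows for illustration only.
SAMPLE_ROWS = """Agency,Budget
Parks & Rec,100
Libraries,80
"""

table_html = csv_to_html_table(SAMPLE_ROWS)
print(table_html)
```

The resulting markup can be pasted into the body of the simple page described above and styled with CSS.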
II. Maps - 10:00-10:50 a.m.
We'll dig deep into the best practices for creating interactive maps and other cartographic projections. A few of the feats we'll accomplish:
- Geocoding street address data into lat/long coordinates for point mapping (Batch Geocode).
- Building a basic choropleth map using Google Fusion Tables.
- Adding legends, map styles, search bars and custom infoWindows.
- Building unique, hover-capable interactive maps using CartoDB + Leaflet.
- Overlaying a bubble chart onto a Google Map.
- Using QGIS to bind geographic and numeric data.
- Downloading polygon outlines.
- Converting .shp to .kml (Shape to Fusion).
- Drawing custom .kml outlines.
- Proportional maps and Geo maps.
- Setting an accurate color ramp using ColorBrewer.
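One way to think about the color-ramp step above: a choropleth sorts each area's value into a small number of classes, and each class gets one color from the ramp. The Python sketch below uses quantile breaks, which is only one of several classification methods. The function names, sample values and hex colors are illustrative assumptions; in practice the colors would be picked in ColorBrewer.

```python
import math


def quantile_breaks(values, n_classes):
    """Upper bounds that split the sorted values into roughly equal-sized classes."""
    if n_classes < 1 or not values:
        raise ValueError("need at least one class and one value")
    ordered = sorted(values)
    breaks = []
    for i in range(1, n_classes + 1):
        # Index of the last value that belongs to class i.
        idx = max(0, math.ceil(i * len(ordered) / n_classes) - 1)
        breaks.append(ordered[idx])
    return breaks


def classify(value, breaks):
    """Return the index of the first class whose upper bound covers the value."""
    for i, upper in enumerate(breaks):
        if value <= upper:
            return i
    return len(breaks) - 1  # values above the top break fall in the last class


# Illustrative light-to-dark ramp; substitute a ramp chosen in ColorBrewer.
RAMP = ["#eff3ff", "#bdd7e7", "#6baed6", "#2171b5"]

# Made-up values standing in for one number per county.
values = [3, 8, 12, 15, 21, 30, 44, 60]
breaks = quantile_breaks(values, len(RAMP))
print(breaks)                       # [8, 15, 30, 60]
print(RAMP[classify(12, breaks)])   # #bdd7e7
```

Quantile breaks put about the same number of areas in each color, which keeps a skewed dataset from rendering as a map of one color with a single dark outlier.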
III. Charts - 11:00-11:50 a.m.
Interactive charts are one of the simplest and most effective ways to bring numeric data to life, but they should be treated with care. During this session, we'll cover:
- Best practices for static and dynamic charting – scales, increments, axes, legends, source lines.
- Testing embeddable, interactive charts on the fly in Google Code Playground.
- WYSIWYG charting tools such as infogr.am and Google Drive charts.
- Building complex multi-component interactive charts in Tableau Public.
IV. Programming and Beyond - 1:00-2:00 p.m.
- How to create an interactive bubbletree out of public spending records and other hierarchical data (BubbletreeJS).
- Making an interactive timeline out of a .csv list of dates and events (TimelineJS).
- Using Raphael.js to simplify the creation of HTML5-powered animations.
- jQuery plugins for slideshows, galleries and more.
- How Adobe Edge is democratizing HTML5 development.
- Other boundary-pushing data visualization tools – Processing, Gephi, D3, Many Eyes, etc.
- Forking data visualization tools and packages from GitHub.
- Platform-agnostic development principles (mobile-first, adaptive, etc.).
- General design strategies for user interaction.
- Ongoing data journalism resources and communities.
*Actual session times will likely run shorter depending on participants’ existing skill sets.