City of Philadelphia
Open Data Guidebook
Note - A more expansive version of this guide can be found here.
The purpose of this guidebook is to provide practical guidance to City of Philadelphia departments and agencies on the release of open data to the public. This document is a work in progress and will be updated as needed by the Chief Data Officer, the Office of Innovation and Technology (OIT) and the Open Data Working Group.
All questions should be directed to the City of Philadelphia’s Chief Data Officer at firstname.lastname@example.org.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International license. Derived works should assign attribution to the City of Philadelphia, Office of Innovation and Technology.
One of the first questions public agencies often ask when learning about open data is “Why should my agency or department release open data?” There are a number of reasons, both practical and philosophical, why releasing open data can benefit your department and the people it serves.
First and foremost, releasing open data to the public is an important Mayoral priority in the City of Philadelphia. Through Executive Order 1-12, Mayor Michael Nutter established a formal open government plan for the City and has directed city departments to identify data sets for public release. Releasing data that has been collected and/or maintained through the operation of city departments and agencies enhances the transparency of city government, and can help citizens become better informed about the operation of the city and more engaged with their elected representatives.
Releasing open data has many practical benefits for city departments. Open data releases can be an effective way of responding to requests for data through the City’s Open Records Policy. One open data release may address multiple requests for information that can be repetitive and costly to respond to if addressed on an individual basis.
Publishing open data can also help reduce unwanted web traffic on department websites, which is often the result of “data scraping” by individuals seeking to obtain data in bulk from the City. This puts unnecessary stress on the city’s technology infrastructure and places an unneeded burden on city IT staff.
By releasing open data, city departments may help to stimulate new and innovative ideas from our local technology community. There is great potential for open data to act as the fuel for new solutions and even new businesses that can address common problems or challenges facing those that live in, work in or travel to the City of Philadelphia.
Finally, releasing open data has the potential to generate a host of operational efficiencies for city departments. There are a number of data resources managed by city departments that could be used by other departments as part of existing or revised business processes. Combining information from a variety of sources has the potential to provide valuable insights into how our city works and how city departments may better serve those that live and work in Philadelphia.
“Cities produce a great deal of data, all of which is probably interesting to somebody out there. Given our limited municipal data resources, how do we prioritize the datasets that we publish?”
-- Jim Craner, 2012 Code for America Fellow (City of Santa Cruz)
The first step in releasing open data is identifying which data sets or data resources to release. There are a number of easy ways to identify data that is suitable for release:
The office of the Chief Data Officer maintains a list of data sets that are currently being worked on for future release. This pipeline list contains items that have been suggested for release from a variety of different sources. If you see a data set that falls under the purview of your department in the “Ideas / Suggestions” list, someone has expressed an interest to the CDO’s office in seeing this data released.
The “Open Data Handbook” also has some practical tips that can help identify data sets that might be suitable for release by your department.
“Just as the reality of daily life and complex ecosystems have high levels of entropy and thus ‘dirtiness,’ so does the data that surrounds it. We can not use this as an excuse to avoid solving problems... ”
-- Brett Goldstein (Formerly CDO of the City of Chicago), Dirty Data Handbook
Every public data set varies in completeness and quality, and prior to releasing data for outside users every department should strive to ensure their data is as accurate, complete and up to date as possible. However, there is no such thing as “perfect” data and using data perfection as a prerequisite for releasing data can become an impediment.
When releasing data sets, be explicit about any limitations that were encountered in preparing them for release, and add caveats that will help data consumers understand the limitations of your data if any exist. If data is subject to revision, if portions have been redacted, or if it only covers a limited time period - make sure to clearly state these limitations as part of your data release. Clearly stating limitations and caveats will make your data more usable because consumers will have a more thorough understanding of what your data represents. See the discussion on the basic tenets of metadata below.
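A quick completeness check can help you find and word these caveats before release. The sketch below, in Python using only the standard library, counts blank values per column of a CSV extract and turns the counts into plain-language notes. The function names and sample rows are hypothetical, and the check only detects empty cells; it cannot tell you whether the values that are present are accurate.

```python
import csv
import io


def completeness_report(csv_text):
    """Count rows and blank cells per column in CSV data given as text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames or []
    blanks = {name: 0 for name in fields}
    total = 0
    for row in reader:
        total += 1
        for name in fields:
            value = row.get(name)
            # Short rows yield None; empty or whitespace-only cells count too.
            if value is None or value.strip() == "":
                blanks[name] += 1
    return total, blanks


def caveat_lines(total, blanks):
    """Turn blank counts into notes suitable for a readme or data dictionary."""
    return [
        f"{name}: no value in {count} of {total} records"
        for name, count in blanks.items()
        if count
    ]


if __name__ == "__main__":
    # Hypothetical extract; replace with your department's export.
    sample = (
        "id,address,date_reported\n"
        "1,100 Example St,2013-01-02\n"
        "2,,2013-01-03\n"
        "3,300 Example Ave,\n"
    )
    total, blanks = completeness_report(sample)
    for line in caveat_lines(total, blanks):
        print(line)
```

Notes like these can be pasted directly into the limitations section of the data release, so consumers know which fields are incomplete and by how much.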
It’s also worth noting that most communities of data consumers will provide feedback on the quality of data or any perceived inaccuracies. This can be invaluable information for improving the quality of your data.
“Measure twice, cut once.”
-- Good carpenters everywhere
One of the most important kinds of outreach you can undertake when releasing open data is inside your own department. Most importantly, make sure your department or agency head (or their designate) is aware of the planned data release.
Some questions to ask yourself:
Through these discussions, it’s also a good idea to identify people in your department who are knowledgeable about the data you may want to release. In the event that you will not fill these roles yourself, you’ll want to identify both administrative contacts (e.g., someone who can say how often the data is updated and where it can be downloaded) and technical contacts (e.g., someone who can explain what a given field in your data set means) for your data.
“Only if every user has a common and exact understanding of the data can it be exchanged trouble-free.”
-- ISO/IEC 11179 Metadata Registry Specification
If you don’t tell recipients of your data what it is, how it can be used, or anything about its currentness, how can we expect them to use it effectively?
Metadata, or descriptive information about data, is common among geospatial data (i.e., GIS layers) and other data sets due to their complexity. Typically included in geospatial metadata is the following content: a descriptive summary of the file and its primary uses; methods of creation; changes (if the file is routinely updated); contact information; geographic references; and, most critically, dates indicating when the data was captured and/or prepared.
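One low-effort way to make sure these elements travel with a data set is to generate its accompanying readme from a single record of metadata. The Python sketch below uses hypothetical field names that mirror the elements listed above (they are illustrative, not a City standard). It refuses to produce a readme when a required element is missing, and treats the geographic reference and any caveats as optional so the same approach also works for non-spatial data.

```python
# Hypothetical field names mirroring the metadata elements described above.
REQUIRED_FIELDS = [
    "title",
    "summary",
    "primary_uses",
    "method_of_creation",
    "update_frequency",
    "contact",
    "date_captured",
    "date_prepared",
]
OPTIONAL_FIELDS = ["geographic_reference"]


def render_readme(metadata):
    """Return readme.txt content, or raise ValueError if required fields are empty."""
    missing = [f for f in REQUIRED_FIELDS if not metadata.get(f)]
    if missing:
        raise ValueError("missing metadata: " + ", ".join(missing))
    lines = []
    for field in REQUIRED_FIELDS + OPTIONAL_FIELDS:
        if metadata.get(field):
            label = field.replace("_", " ").capitalize()
            lines.append(f"{label}: {metadata[field]}")
    # Caveats (redactions, revisions, limited time periods) are listed last.
    for caveat in metadata.get("caveats", []):
        lines.append(f"Caveat: {caveat}")
    return "\n".join(lines) + "\n"
```

The returned text can be saved as readme.txt alongside the data file, or the same fields can be copied into a metadata tab of a spreadsheet.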
The basic tenets of metadata (what the data is, how to interpret the data for use, how the data was prepared) apply to all digital databases, be they geographic or otherwise. How metadata is included in a data release may vary from format to format. It can be as simple as an accompanying file (e.g., readme.txt or data dictionary PDF), or an additional table in the database or tab in an Excel spreadsheet with multiple text or note fields of descriptive and contact information. Basic content should include:
git submodule add https://github.com/CityOfPhiladelphia/terms-of-use.git license/
An important step in releasing open data is determining how it will be made available. To some extent, this decision will be a function of what kind of data you want to release and the format it is currently in (i.e., how it is used or maintained internally).
One of the most common types of city data is geospatial data, or data that relates to a specific place. This may be things like the locations of points of service, like a library or police station, or of physical assets like fire hydrants or street lights. It is common for city data to have a geospatial component - the location of a crime, or a pothole that is reported to 311.
The easiest and most efficient way to make geospatial data available to outside consumers is to work through the GIS Services Group within OIT - they can best advise your department on the proper approach for releasing geospatial data.
OIT has developed an ArcGIS Desktop Python toolbox that makes exporting ArcGIS feature classes to open geodata formats (GeoJSON, KML, CSV and shapefile, compressed and uncompressed) very simple and straightforward. Information on how to install and use this tool can be found here.
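For departments working outside ArcGIS, the core of such an export can be sketched in a few lines. The Python example below is not the OIT toolbox. It assumes simple point records with longitude and latitude in WGS84 decimal degrees (the sample hydrant records are hypothetical) and writes them as a GeoJSON FeatureCollection and as CSV using only the standard library. Shapefiles, KML and projected coordinate systems need dedicated GIS tools and are out of scope here.

```python
import csv
import io
import json


def to_geojson(records):
    """Build a GeoJSON FeatureCollection of Point features.

    Each record is a dict with 'lon' and 'lat' in WGS84 decimal degrees;
    all other keys become feature properties.
    """
    features = []
    for rec in records:
        props = {k: v for k, v in rec.items() if k not in ("lon", "lat")}
        features.append({
            "type": "Feature",
            # GeoJSON (RFC 7946) orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [rec["lon"], rec["lat"]]},
            "properties": props,
        })
    return {"type": "FeatureCollection", "features": features}


def to_csv(records):
    """Write the same records as CSV text, keeping lon/lat as plain columns."""
    if not records:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical sample records, for illustration only.
    hydrants = [
        {"asset_id": "H-0001", "lon": -75.1652, "lat": 39.9526},
        {"asset_id": "H-0002", "lon": -75.1700, "lat": 39.9500},
    ]
    print(json.dumps(to_geojson(hydrants), indent=2))
    print(to_csv(hydrants))
```

Publishing both forms from the same records is a small design choice that pays off: mapping tools can read the GeoJSON directly, while spreadsheet users can open the CSV without any GIS software.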
If your department’s data changes somewhat infrequently, or will be accessed only periodically, it may be appropriate to release such data as a static download in a commonly used format (e.g., comma-separated values, or CSV). However, if your data is updated frequently or subject to changes - or if it will be targeted for specific user communities like software developers - it may make sense to consider provisioning an Application Programming Interface (API) for your data.
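To make the distinction concrete: a static download hands consumers the whole file, while an API lets them request only the records they need from data that can keep changing behind it. The Python sketch below is a deliberately minimal, read-only JSON endpoint built on the standard library's http.server. It assumes a hypothetical /records path with an optional type filter and an in-memory list of hypothetical records. It has no authentication, paging or caching, and it is not a recommendation of any platform; that choice is one to make with OIT.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse
from urllib.request import urlopen

# Hypothetical in-memory data; a real service would read from the system of record.
RECORDS = [
    {"id": 1, "type": "library", "name": "Example Branch A"},
    {"id": 2, "type": "police_station", "name": "Example District B"},
    {"id": 3, "type": "library", "name": "Example Branch C"},
]


class RecordsHandler(BaseHTTPRequestHandler):
    """Serve GET /records, optionally filtered with ?type=..."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/records":
            self.send_error(404)
            return
        wanted = parse_qs(url.query).get("type", [None])[0]
        rows = [r for r in RECORDS if wanted is None or r["type"] == wanted]
        body = json.dumps(rows).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet; a real service should log requests


def start_server(port=0):
    """Start the server on a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), RecordsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server()
    port = server.server_address[1]
    with urlopen(f"http://127.0.0.1:{port}/records?type=library") as resp:
        print(json.load(resp))
    server.shutdown()
    server.server_close()
```

A consumer interested only in libraries receives two records instead of the whole file, and because the handler reads the data on every request, updates show up without anyone re-downloading a static extract.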
The Chief Data Officer and the Web and Data Services Team at OIT can best advise your department on the feasibility of using an API for your data, the tradeoffs of using an API over static downloads and the pros and cons of different API platforms and frameworks.
OpenDataPhilly.org is a data directory - think of it as a “white pages for data.” It makes your data more discoverable and easier to find for those that may be interested in using it. This site is well known in the Philadelphia technology and data communities and also contains data from a variety of different data producers from around the region.
Once your data is staged for use by consumers - whether it is a static downloadable data set or an API - contact the Chief Data Officer with the details of your release to get an entry added to OpenDataPhilly.org.
The listing for your data in OpenDataPhilly.org will be used to communicate its availability to data consumers and other interested parties in Philadelphia and beyond.
PHLAPI is a City of Philadelphia website that provides detailed information on open data made available through Application Programming Interfaces (APIs). This site is geared toward software developers and others that are typically more advanced and experienced data users. The audience that this website serves is much more narrowly focused than that of OpenDataPhilly.org, and data listings on this site include detailed API documentation, code samples and helper libraries.
If your data is available through a publicly accessible API, contact the Chief Data Officer to create a new entry on the PHLAPI site.
GitHub is a platform that is used primarily for sharing the code that makes up software solutions. However, in response to a number of important improvements being made to the GitHub platform, we are using it as an important part of our open data program - more and more of our open data is being released on GitHub because of the many benefits it provides.
The City maintains a GitHub organization that is available to any city department or agency for releasing new data sets. To be added to the City’s organization, and to obtain access to repositories for adding new (or enhancing old) data sets, contact the Chief Data Officer.
“People don’t like a story where they know what happens (or is supposed to happen) at the end nearly as much as they like a story with potential and possibility. You can tell people how it’s going to go, or you can learn to let them co-author the story with you.”
-- Alex Hillman, Co-Founder of Indy Hall
Publishing an open data set for outside consumers isn’t the end of the process - it’s just the beginning. Philadelphia has a broad and vibrant community of different users interested in working with city government data. The members of this community, and all of the smaller communities, cliques and collectives that it is made up of, are the city’s most important asset for turning data into value. Reaching out to and actively engaging with this community is a process that will take place long after your data is initially released.
Once your data is listed on OpenDataPhilly.org, you should tell the world about it. Start by announcing your new data set in the Open Data Philly forum - this is a group of people from in and around the Philadelphia area who are interested in open data. Send out a press release, tweet about it, or write a post about it for your agency’s Facebook page or blog. You’ll be surprised how fast word will get out to people that care about your data if you take the time to advertise through common social media channels.
There are hackathons and app challenges happening in Philadelphia on an almost weekly basis - these events are ideal for announcing the release of new data and for engaging with data consumers. Sharing the data and giving it context can help developers utilize it for informative and useful applications like the 311 Mobile App and Philly Crime Map. The data may also be helpful when integrated with other city agencies’ decision making systems to support more efficient operations.
Once you’ve made your data discoverable on OpenDataPhilly.org, and have let people know it is available, you’ll probably start to get feedback from users. They may have questions about certain components of your data (e.g., how a specific field is labeled, or how recently data was collected). Wherever possible, try to capture the questions and responses from data users in a way that can be shared with other users.
Ideally, you should direct users to submit questions or comments to the Open Data Philly forum. This will allow others to see and learn from the dialog your agency has with its data users, and also save time if the same questions are asked multiple times.