City of Philadelphia

Open Data Guidebook

(Version 1.0.0)

Note - A more expansive version of this guide can be found here.


Purpose

The purpose of this guidebook is to provide practical guidance to City of Philadelphia departments and agencies on the release of open data to the public. This document is a work in progress and will be updated as needed by the Chief Data Officer, the Office of Innovation and Technology (OIT) and the Open Data Working Group.

All questions should be directed to the City of Philadelphia’s Chief Data Officer at data@phila.gov.  

Contents

Why release open data?

Identifying data sets for public release

Reviewing data for completeness and accuracy

Conducting an internal data review

Adding Metadata

Adding Terms of Use

Staging your data for public use

OpenDataPhilly.org

PHLAPI

GitHub

Engaging users in the data community

License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International license. Derived works should assign attribution to the City of Philadelphia, Office of Innovation and Technology.


Why release open data?

One of the first questions public agencies often ask when learning about open data is “Why should my agency or department release open data?” There are a number of reasons, both practical and philosophical, why releasing open data can benefit your department and the people it serves.

First and foremost, releasing open data to the public is an important Mayoral priority in the City of Philadelphia. Through Executive Order 1-12, Mayor Michael Nutter established a formal open government plan for the City and has directed city departments to identify data sets for public release. Releasing data that has been collected and/or maintained through the operation of city departments and agencies enhances the transparency of city government, and can help citizens become better informed about the operation of the city and more engaged with their elected representatives.

Releasing open data has many practical benefits for city departments. Open data releases can be an effective way of responding to requests for data through the City’s Open Records Policy. One open data release may address multiple requests for information that can be repetitive and costly to respond to if addressed on an individual basis.

Publishing open data can also help reduce unwanted web traffic on department websites, which is often the result of “data scraping” by individuals seeking to obtain data in bulk from the City. Scraping puts unnecessary stress on the city’s technology infrastructure and an unneeded burden on city IT staff.

By releasing open data, city departments may help to stimulate new and innovative ideas from our local technology community. There is great potential for open data to act as the fuel for new solutions and even new businesses that can address common problems or challenges facing those that live in, work in or travel to the City of Philadelphia.

Finally, releasing open data has the potential to generate a host of operational efficiencies for city departments. There are a number of data resources managed by city departments that could be used by other departments as part of existing or revised business processes. Combining information from a variety of sources has the potential to provide valuable insights into how our city works and how city departments may better serve those that live and work in Philadelphia.


Identifying data sets for public release

“Cities produce a great deal of data, all of which is probably interesting to somebody out there. Given our limited municipal data resources, how do we prioritize the datasets that we publish?”

-- Jim Craner, 2012 Code for America Fellow (City of Santa Cruz)

The first step in releasing open data is identifying which data sets or data resources to release. There are a number of easy ways to identify data that is suitable for release:

  1. Review existing requests for data and information.

    Take a look at requests for information received by your department or office - these may be formal Open Record requests or less formal requests sent via e-mail or some other channel. Does your department have a Facebook page or Twitter account? Check to see if there are comments or tweets about data that you might be able to make available.
  2. Review OpenDataPhilly.org.

    Take a look at the nomination section of this site, where people can submit ideas for new data sets, or look at the results of the recent Open Data Race for data sets that have lots of votes from non-profits and other data consumers.
  3. Check the Open Data Pipeline.

    The office of the Chief Data Officer maintains a list of data sets that are currently being worked on for future release. This pipeline list contains items that have been suggested for release from a variety of different sources. If you see a data set that falls under the purview of your department in the “Ideas / Suggestions” list, someone has expressed an interest to the CDO’s office in seeing this data released.

  4. Look at what other cities are doing.

    A number of other cities in the U.S. and around the world are releasing open data to external users. What are people in other cities looking for? What kinds of data are other governments making available? If there is high demand for certain kinds of data in other cities, there may well be high demand for it in the City of Philadelphia too. This is true for a variety of different kinds of data, but is especially relevant for crime, property and financial data.
  5. Ask the public for ideas.

    If you want to find out what data people are looking for from your department or agency, one of the best ways to find out is simply to ask. There is a public forum set up to allow city departments and others to interact with data users. Think about whether a posting to this - or some other - public forum would be a useful way to solicit feedback on data to release. Members of the public may also vote on the items in our Open Data Pipeline - discussed above - if they have a Trello account.
  6. Examine existing websites.

    Look at the existing websites for your agency or department. Do they contain documents with data, like annual reports, financial statements, office locations, etc.? Does your website have lists of service locations, or a search page that lets users find a location or service close to them? Any web-based application that has a database behind it - like a search feature - is worth examining more closely for a data set that might be appropriate to release.
  7. Check for “scrapers”.

    Web scrapers are programs written by people who want to extract information - usually in bulk - from a website. There are many options for tools that can be used to conduct this type of extraction, but one of the most popular is ScraperWiki. Checking to see if there are any ScraperWiki entries for your websites is a great way to determine if some of your data has value to outside users.

The “Open Data Handbook” also has some practical tips that can help identify data sets that might be suitable for release by your department.


Reviewing data for completeness and accuracy

“Just as the reality of daily life and complex ecosystems have high levels of entropy and thus ‘dirtiness,’ so does the data that surrounds it. We can not use this as an excuse to avoid solving problems... ”

-- Brett Goldstein (Formerly CDO of the City of Chicago), Dirty Data Handbook

Every public data set varies in completeness and quality, and prior to releasing data for outside users every department should strive to ensure their data is as accurate, complete and up to date as possible. However, there is no such thing as “perfect” data and using data perfection as a prerequisite for releasing data can become an impediment.

When releasing data sets, be explicit about any limitations that were encountered in preparing them for release, and add caveats that will help data consumers understand those limitations. If data is subject to revision, if portions have been redacted, or if it only covers a limited time period, make sure to clearly state these limitations as part of your data release. Clearly stating limitations and caveats will make your data more usable because consumers will have a more thorough understanding of what your data represents. See the discussion on the basic tenets of metadata below.

It’s also worth noting that most communities of data consumers will provide feedback on the quality of data or any perceived inaccuracies. This can be invaluable information for improving the quality of your data.

For a good example of this dynamic at work, view some of the discussions on the SEPTA Developer Google Group, or the City of Philadelphia's open data public forum.


Conducting an internal data review

“Measure twice, cut once.”

-- Good carpenters everywhere

One of the most important kinds of outreach you can undertake when releasing open data is inside your own department. Most importantly, make sure your department or agency head (or their designate) is aware of the planned data release.

Some questions to ask yourself:

  1. Has your planned data release been communicated to others in your department that may be impacted by it, or that can provide valuable insights into the structure of your data before it is released?
  2. Has your department’s legal office (or its representatives) reviewed the data you plan to release?
  3. Are there any restrictions on how the data may be used? Restrictions can apply to some kinds of financial, health care, education, and other types of data, and are sometimes imposed by other levels of government.
  4. Does the data you plan to release contain personally identifying information? Has your department’s legal office (or its representatives) approved the release of such information? Will removing such information diminish the value of your data for consumers?
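
As a rough aid for the personally identifying information question above, the sketch below scans a CSV export for values that look like email addresses, phone numbers or Social Security numbers. The patterns, the function name flag_pii_columns and the sample rows are illustrative assumptions, not a City standard. A scan like this can only supplement a review by your department’s legal office; it cannot replace one.

```python
import csv
import io
import re

# Hypothetical patterns for illustration only; a real review would need a
# list agreed with your department's legal office.
PII_PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def flag_pii_columns(csv_text):
    """Return {column name: sorted list of pattern names seen in that column}."""
    flagged = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            # Skip missing cells and any overflow values from malformed rows.
            if not isinstance(value, str):
                continue
            for name, pattern in PII_PATTERNS.items():
                if pattern.search(value):
                    flagged.setdefault(column, set()).add(name)
    return {column: sorted(names) for column, names in flagged.items()}


# Made-up sample rows (not real City data).
SAMPLE = (
    "request_id,contact,notes\n"
    "1001,jane@example.com,Pothole on Main St\n"
    "1002,215-555-0100,Street light out\n"
)

if __name__ == "__main__":
    print(flag_pii_columns(SAMPLE))
```

Columns flagged this way are candidates for a closer look, not confirmed problems; free-text fields in particular can hold personal details that no simple pattern will catch.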

Having an internal dialog with others in your department about open data that is planned for release is a necessary precursor to external communication with outside data consumers (more below). This dialog will also inform the terms of use that accompany your data release.

It’s also a good idea to use these discussions to identify people in your department who are knowledgeable about the data you may want to release. You’ll want to identify both administrative contacts (e.g., for questions about how often the data is updated or where it can be downloaded) and technical contacts (e.g., for questions about what a field in your data set means) for your data, in the event that you will not fill these roles.


Adding Metadata

“Only if every user has a common and exact understanding of the data can it be exchanged trouble-free.”

-- ISO/IEC 11179 Metadata Reg Specification

If you don’t tell recipients of your data what it is, how it can be used, or anything about its currentness, how can you expect them to use it effectively?

Metadata, or descriptive information about data, is common for geospatial data (e.g., GIS layers) and other complex data sets. Geospatial metadata typically includes: a descriptive summary of the file and its primary uses; methods of creation; changes (if the file is routinely updated); contact information; geographic references; and, most critically, dates indicating when the data was captured and/or prepared.

The basic tenets of metadata (what the data is, how to interpret it for use, and how it was prepared) apply to all digital data sets, geographic or otherwise. How metadata is included in a data release may vary from format to format. It can be as simple as an accompanying file (e.g., readme.txt or data dictionary PDF), or an additional table in the database or tab in an Excel spreadsheet with multiple text or note fields of descriptive and contact information. Basic content should include:

  • Data description and common usage;
  • Date created or date range of content;
  • Author and contact information;
  • Descriptions of key fields and/or field codes; and
  • Limitations, disclaimer (if appropriate) and terms of use.
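
One possible way to produce an accompanying readme.txt that covers the basic content listed above is sketched here. The function name build_readme, the dictionary keys and all sample values are hypothetical and are shown only to illustrate the layout.

```python
def build_readme(meta, fields):
    """Render basic metadata as plain text suitable for a readme.txt file."""
    lines = [
        "Description: " + meta["description"],
        "Common usage: " + meta["usage"],
        "Date range: " + meta["date_range"],
        "Author: " + meta["author"],
        "Contact: " + meta["contact"],
        "",
        "Fields:",
    ]
    # One line per key field, so consumers know what each column means.
    for name, description in fields:
        lines.append("  {}: {}".format(name, description))
    lines += [
        "",
        "Limitations: " + meta["limitations"],
        "Terms of use: " + meta["terms_of_use"],
    ]
    return "\n".join(lines) + "\n"


# Placeholder values only; replace every entry with your department's details.
SAMPLE_META = {
    "description": "Locations of example service centers (sample text)",
    "usage": "Mapping and proximity searches",
    "date_range": "YYYY-MM-DD to YYYY-MM-DD",
    "author": "Example Department",
    "contact": "contact@example.org",
    "limitations": "Subject to revision; sample text only",
    "terms_of_use": "See the terms of use included with this release",
}
SAMPLE_FIELDS = [
    ("name", "Name of the location"),
    ("address", "Street address of the location"),
]

if __name__ == "__main__":
    print(build_readme(SAMPLE_META, SAMPLE_FIELDS))
```

Keeping the metadata in a small structure like this makes it easier to regenerate the readme each time the data set is refreshed.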

An example of the metadata content proposed for use by the City’s GIS community, including a disclaimer, can be found here.


Adding Terms of Use

Every data release from the City of Philadelphia should be accompanied by a clear statement providing terms of use to end users. If your data release is subject to special restrictions or caveats, the terms of use should state these very clearly for users.

OIT maintains a standard terms of use statement on GitHub which can be used by any city department or agency as part of their data releases. You may use this statement as is, or use it as a starting point for developing your own terms of use document.

If you are releasing your data on GitHub - discussed more fully below - you can include this standard terms of use statement in your data release as a git submodule.

git submodule add https://github.com/CityOfPhiladelphia/terms-of-use.git license/

This will allow you to ensure that any future changes to these terms of use can be incorporated into your data release by simply doing the following from the license/ directory:

git pull


Staging your data for public use

An important step in releasing open data is determining how it will be made available. To some extent, this decision will be a function of what kind of data you want to release and the format it is currently in (i.e., how it is used or maintained internally).

One of the most common types of city data is geospatial data, or data that relates to a specific place. This may be things like the locations of points of service, like a library or police station, or of physical assets like fire hydrants or street lights. It is common for city data to have a geospatial component - the location of a crime, or a pothole that is reported to 311.

The easiest and most efficient way to make geospatial data available to outside consumers is to work through the GIS Services Group within OIT - they can best advise your department on the proper approach for releasing geospatial data.

OIT has developed an ArcGIS Desktop Python toolbox that makes exporting ArcGIS feature classes to open geodata formats (GeoJSON, KML, CSV and shapefile, compressed and uncompressed) very simple and straightforward. Information on how to install and use this tool can be found here.
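
To illustrate what one of these open geodata formats looks like, here is a minimal sketch that turns simple point records into GeoJSON. This is not the OIT toolbox and does not use ArcGIS; it relies only on Python’s standard json module. The function name records_to_geojson, the lat/lon keys and the sample record (including its coordinates) are made up for illustration.

```python
import json


def records_to_geojson(records):
    """Convert dicts with 'lat' and 'lon' keys into a GeoJSON FeatureCollection."""
    features = []
    for record in records:
        # Everything except the coordinates becomes a descriptive property.
        properties = {k: v for k, v in record.items() if k not in ("lat", "lon")}
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON lists longitude first, then latitude.
                "coordinates": [record["lon"], record["lat"]],
            },
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical sample record; the coordinates are placeholders, not real data.
SAMPLE_RECORDS = [
    {"name": "Example Library", "lat": 39.95, "lon": -75.16},
]

if __name__ == "__main__":
    print(json.dumps(records_to_geojson(SAMPLE_RECORDS), indent=2))
```

The most common mistake in hand-built GeoJSON is swapping the coordinate order, which is why the sketch builds the pair explicitly as longitude, then latitude.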

If your department’s data changes infrequently, or will be accessed only periodically, it may be appropriate to release it as a static download in a commonly used format (e.g., comma-separated values, or CSV). However, if your data is updated frequently - or if it is targeted at specific user communities like software developers - it may make sense to consider provisioning an Application Programming Interface (API) for your data.

The Chief Data Officer and the Web and Data Services Team at OIT can best advise your department on the feasibility of using an API for your data, the tradeoffs of using an API over static downloads and the pros and cons of different API platforms and frameworks.


OpenDataPhilly.org

OpenDataPhilly.org is a data directory - think of it as a “white pages for data.” It makes your data easier to find for those who may be interested in using it. This site is well known in the Philadelphia technology and data communities and also contains data from a variety of different data producers from around the region.

Once your data is staged for use by consumers - whether it is a static downloadable data set or an API - contact the Chief Data Officer with the details of your release to get an entry added to OpenDataPhilly.org.

The listing for your data in OpenDataPhilly.org will be used to communicate its availability to data consumers and other interested parties in Philadelphia and beyond.

PHLAPI

PHLAPI is a City of Philadelphia website that provides detailed information on open data made available through Application Programming Interfaces (APIs). This site is geared toward software developers and others who are typically more advanced and experienced data users. The audience that this website serves is much more narrowly focused than that of OpenDataPhilly.org, and data listings on this site include detailed API documentation, code samples and helper libraries.

If your data is available through a publicly accessible API, contact the Chief Data Officer to create a new entry on the PHLAPI site.

GitHub

GitHub is a platform that is used primarily for sharing the code that makes up software solutions. However, in response to a number of important improvements being made to the GitHub platform, we are using it as an important part of our open data program - more and more of our open data is being released on GitHub because of the many benefits it provides.

The City maintains a GitHub organization that is available to any city department or agency for releasing new data sets. To be added to the City’s organization, and to obtain access to repositories for adding new (or enhancing old) data sets, contact the Chief Data Officer.


Engaging users in the data community

“People don’t like a story where they know what happens (or is supposed to happen) at the end nearly as much as they like a story with potential and possibility. You can tell people how it’s going to go, or you can learn to let them co-author the story with you.”

-- Alex Hillman, Co-Founder of Indy Hall

Publishing an open data set for outside consumers isn’t the end of the process - it’s just the beginning. Philadelphia has a broad and vibrant community of different users interested in working with city government data. The members of this community, and all of the smaller communities, cliques and collectives that it is made up of, are the city’s most important asset for turning data into value. Reaching out to and actively engaging with this community is a process that will take place long after your data is initially released.

Once your data is listed on OpenDataPhilly.org, you should tell the world about it. Start by announcing your new data set in the Open Data Philly forum - this is a group of people from in and around the Philadelphia area who are interested in open data. Send out a press release, tweet about it, or write a post about it for your agency’s Facebook page or blog. You’ll be surprised how fast word will get out to people who care about your data if you take the time to advertise through common social media channels.

There are hackathons and app challenges happening in Philadelphia on an almost weekly basis - these events are ideal for announcing the release of new data and for engaging with data consumers.  Sharing the data and giving it context can help developers utilize it for informative and useful applications like the 311 Mobile App and Philly Crime Map.  The data may also be helpful when integrated with other city agencies’ decision making systems to support more efficient operations.

Once you’ve made your data discoverable on OpenDataPhilly.org, and have let people know it is available, you’ll probably start to get feedback from users. They may have questions about certain components of your data (e.g., how a specific field is labeled, or how recently data was collected). Wherever possible, try to capture the questions and responses from data users in a way that can be shared with other users.

Ideally, you should direct users to submit questions or comments to the Open Data Philly forum. This will allow others to see and learn from the dialog your agency has with its data users, and also save time if the same questions are asked multiple times.