1 of 34

Open Data Initiatives: what works, what doesn't, and what you can expect

Lessons from the National Transportation Data Challenge and other examples

Michael Gat, @michaelgat

2 of 34

Who am I?

  • 20+ years in tech and project management
  • No government experience
  • Limited academic experience
  • Lots of data publishing/analytics experience
  • I speak Python in my spare time
  • Most recently focused on data
    • interest in geo-spatial
  • Came on as volunteer project manager part-way through the National Transportation Data Challenge

3 of 34

National Transportation Data Challenge

“Traffic deaths across the United States increased 14% over the last two years. In 2016, nearly 6,000 pedestrians died – the highest number in more than two decades. Data science collaboration has the potential to reverse this trend – this Challenge aims to do just that.”

“The National Transportation Data Challenge is a series of community problem-solving events, roundtables, hackathons, demonstrations, and tutorials/trainings to build and strengthen collaborative data science projects that advance transportation safety.”

4 of 34

A strong team of backers

  • Launched May 2017
  • Sponsored by 4 Regional Big Data Innovation Hubs (NSF program)
  • Support from federal and state agencies
  • Sponsorship from Google, AWS, Satori, Esri and others.

5 of 34

On paper, looks great. But...

  • In any open data initiative, a number of decisions must be made that will influence success. We got some of these right, some wrong.
    • Organization
    • Topic
    • Goals
    • Structure/Plan
    • Participation
  • How do these factors impact the success of an open data initiative?

6 of 34

But first, some background

“Open Data Initiatives” come in two flavors:

  • Data Publishing
    • Collect, aggregate, clean and publish datasets
    • Often done in phases, starting with core data, adding additional layers
    • Must have an ongoing commitment!
  • Data Use
    • Use/exploit data
    • Often involve outside contributors
    • May be time-bound or open-ended
  • These two feed on each other

7 of 34

Data Publishing

8 of 34

Data Use

9 of 34

Hybrid? (Not really!)

10 of 34

But back to the Transportation Challenge...

  • A data-use initiative
    • But that didn’t stop some people
  • Somewhat hamstrung by organizational constraints
  • Sponsor involvement/limitations were also a challenge

11 of 34

“Parent Organization”

“NSF's Directorate for Computer and Information Science and Engineering (CISE) initiated the National Network of Big Data Regional Innovation Hubs (BD Hubs) program in FY 2015. Four BD Hubs – Midwest, Northeast, South, and West – were established to foster multi-sector collaborations among academia, industry, and government, both nationally and internationally. These BD Hubs are serving a convening and coordinating role by bringing together a wide range of Big Data stakeholders in order to connect solution seekers with solution providers.”

12 of 34

Organization

  • Extraordinarily complex for an initiative with virtually no dedicated staff and limited budget
    • Four regions
    • Six areas of focus
    • Multiple sponsors
    • Desire for all the above to cross-pollinate
    • Part time contributors
  • But… Nobody representing interested populations
    • Cyclist groups
    • Pedestrian groups
    • Transit groups
    • Auto manufacturers
    • This was one of my biggest personal fails!

13 of 34

Organizational issues

  • Not designed to move quickly
    • The real world is moving faster than they are
  • Focus areas and geographies sometimes overlapped, but not always
    • A painful matrix
    • It would have worked better if this had been more tightly aligned
  • The need to avoid apparent commercial conflicts of interest was limiting

14 of 34

Topic?

“Traffic deaths across the United States increased 14% over the last two years. In 2016, nearly 6,000 pedestrians died – the highest number in more than two decades. Data science collaboration has the potential to reverse this trend – this Challenge aims to do just that.”

“The National Transportation Data Challenge is a series of community problem-solving events, roundtables, hackathons, demonstrations, and tutorials/trainings to build and strengthen collaborative data science projects that advance transportation safety.”

15 of 34

Topic issues

  • Initiatives gain traction when the topic is timely, interesting and relevant
  • Transportation fits those criteria
  • But, everybody in the tech world is looking at some aspect of transportation! (Google, Facebook, Amazon, etc.)
  • The challenge is to be unique rather than being another “me too” traffic or public transport effort

16 of 34

Topic: Results

  • Mixed success. It’s a very broad area. Probably too broad.
  • Takeaway: For a short initiative, a narrower, simpler focus would have helped
  • Regional or particularly timely issues were most successful
    • But how tightly they really related to the topic is debatable.
  • Successful example: Copenhagen Solutions Lab Themes

17 of 34

How does Copenhagen do it? Iteratively!

18 of 34

Goals: Too many and too vague

  • DISTRACTED DRIVING: How might we determine if there is a distracted driving problem? How might we mitigate the issue?
  • WEATHER / EMERGENCY RESPONSE: How might we improve transportation safety in extreme weather? In cases of crises?
  • CURRICULUM DEVELOPMENT: How might we develop a ready-to-go high school or undergraduate curriculum introducing transportation safety data to build awareness and spark new contributions/solutions?
  • BIKE / PEDESTRIAN SAFETY: How might we improve bike/pedestrian safety in our communities?
  • AUTONOMOUS OR CONNECTED VEHICLES: How might we leverage autonomous or enhanced vehicles to improve our constituents’ safety? How might we navigate technological, policy, and social considerations involved?
  • MULTIPLE DATA STREAMS: How might we utilize multiple data streams (particularly from different sectors, sources) to capture actionable insights to make a local transportation corridor more safe? Are there lessons and best practices that can be shared more broadly?

19 of 34

Issues with our “goals”

  • Too “academic” an approach
  • “Challenge” was never defined
    • Who is being challenged?
    • To do what?
    • Judged by whom?
    • With what prizes/results/recognition?
  • A topic is not a goal. A question is not a goal. Discussing an approach is not a goal.
  • Every goal should have a “business owner,” but those don’t really exist at a national level.
  • We had too many possible goals -> Lack of focus

20 of 34

Initiative Structure and Plan

  • We stated what we would do… sort of
    • Scheduled a number of events, discussions, phone calls etc.
  • Had a plan
  • Defined key elements
  • Developed metrics

21 of 34

Planning: what worked, what didn’t

  • Lots of good events, but they were isolated from each other
  • Too many parallel events that we could not learn from
    • In large part because of the parallel nature of the six focus areas
  • Nobody owned cross-pollination of ideas
  • Nobody owned key elements for the national challenge
  • Key takeaway: One thing at a time, towards a shared goal
  • Successful example: Open Data Kansas City started with just a single dataset (census) upon which much more content could be built.

22 of 34

How did Kansas City do it?

23 of 34

Initiative Participation

  • If there’s no outside participation, all you’ve done is publish a website, which will not last long
  • Need participation at many levels
    • Short term contributors
    • Longer term contributors/volunteers/part time help
    • Sponsors/vendors/suppliers/event hosts
  • Getting participation requires significant ongoing outreach and publicity

24 of 34

Initiative outreach

25 of 34

Participation: what we did

  • Great sponsors
  • Good-to-great longer-term participants
  • Struggled to attract shorter-term contributors
    • In part because of the unclear nature of what the “challenge” was
  • Key takeaway: Understand the constituencies and design around their needs and interests.
    • They will probably not build what you thought they would build
    • Example: NYC crime/police data used to demonstrate racial bias by police, not to help improve policing!
  • Successful approach: NASA annual hackathons.

26 of 34

Final Thoughts

  • You can’t launch an initiative without a lot of thought
  • You need to have the right organization: Big Data Hubs didn’t fit a national public-private initiative
  • There needs to be a lot of visible, public, focused activity
  • There needs to be a regular feedback loop, so each event feeds into improvements to the next one.
  • There need to be metrics to evaluate all this, especially during the initiative. The transportation challenge only measured success at the end.

27 of 34

Final thoughts: Great all-around examples

  • Open Data Estonia
    • https://opendata.riik.ee/en
  • Open Data New Zealand
    • https://www.data.govt.nz/
  • Include:
    • Datasets
    • Toolkit
    • Case studies
    • Community events
    • Outreach
    • Social media presence
  • Both long-term initiatives and short-term projects
    • Many good examples of both types

28 of 34

Open Data EE

29 of 34

Open Data NZ

30 of 34

Open Data NZ case study: Trucking

31 of 34

Open Data NZ case study: Beer!

32 of 34

More final thoughts: Open data in the US

  • The US (Federal) government is not the optimal place to try to do open data
    • Nobody has a mandate
      • Data.gov is run by GSA! Departments publish as they wish
    • Weaponization of data by virtually all political entities makes open, unfiltered data a difficult thing to do
    • No long term commitments
    • Difficult to involve outside and especially commercial parties
    • A professional civil service does help with things like this.
  • States, Cities and private enterprise are where it’s at
  • Commercial interests are moving faster than most government entities can.

33 of 34

Final, final thoughts

  • Enthusiasm was great
  • Everybody really believed the topic mattered
  • Everybody happily put in their own time
  • We generated useful ideas and approaches that I believe will be exploited beyond the challenge
  • We got good political exposure despite an administration that is somewhat hostile

34 of 34

Thanks, questions

SCaLE 16x

Dave Goodsmith / Datascience.com

Meredith Lee / West Big Data Hub

The Regional Big Data Innovation Hubs

Flash the Cat

Michael Gat

@michaelgat

http://www.michaelgat.com