Frictionless Data for Reproducible Research

Serah Rono

serah.rono@okfn.org . @serahrono

FORCE 2018, October 10, 2018

Open up all essential, public-interest information and see it used to create insight that drives positive change.

Build communities, tools and skills to empower individuals and organizations to use open information to create insights that drive change.

OpenSpending

Open Data Publishing Best Practices

  • Frequency
  • Author
  • License
  • Contact point
  • Date of publication
  • Description
  • Tags
  • Related resources
  • URL
  • Version
  • Name
  • Methodology
  • Format

http://frictionlessdata.io/data-packages
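A data package is a datapackage.json descriptor stored alongside the data files it describes. The sketch below builds a minimal descriptor for one CSV file with the Python standard library. The file name, package name, license and columns are hypothetical placeholders; the property names (`name`, `title`, `licenses`, `resources`, `path`, `format`, `schema`, `fields`) follow the Data Package and Table Schema specifications as we understand them, so check them against frictionlessdata.io/specs before relying on this.

```python
import json
from pathlib import Path


def make_descriptor(csv_path: str) -> dict:
    """Build a minimal Data Package descriptor for a single CSV file.

    The package name, title, license choice and column list are hypothetical
    examples; replace them with values that describe your own data.
    """
    return {
        "name": "example-budget",  # lowercase, URL-friendly package name
        "title": "Example budget data",
        "licenses": [{
            "name": "ODC-PDDL-1.0",
            "path": "http://opendatacommons.org/licenses/pddl/",
            "title": "Open Data Commons Public Domain Dedication and License v1.0",
        }],
        "resources": [{
            "name": "budget",
            "path": csv_path,  # path to the CSV, relative to datapackage.json
            "format": "csv",
            "schema": {  # Table Schema: one entry per CSV column
                "fields": [
                    {"name": "id", "type": "string"},
                    {"name": "amount", "type": "number"},
                    {"name": "description", "type": "string"},
                ]
            },
        }],
    }


def write_descriptor(descriptor: dict, directory: str = ".") -> Path:
    """Write the descriptor as datapackage.json in the given directory."""
    out = Path(directory) / "datapackage.json"
    out.write_text(json.dumps(descriptor, indent=2), encoding="utf-8")
    return out
```

Calling `write_descriptor(make_descriptor("budget.csv"))` in the folder that holds budget.csv produces the (datapackage.json + CSV) pair that Frictionless tools read together.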

Standards

Open Contracting Data Standard

  • Governments spend $9.5 trillion every year on contracts, yet very little of this information is available for public scrutiny;
  • The Open Contracting Data Standard is a community-driven standard for publishing this data;
  • It models the contracting process (Planning > Formation > Performance > Completion);
  • It defines fields that should appear in all contracting data releases and records (e.g. contract value, supplier, period, currency);
  • It is currently a work in progress funded by the Omidyar Network and the Web Foundation.
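As a rough illustration of two ideas on this slide, the ordered stages of a contracting process and a set of fields every release should carry, the sketch below models the stages as a Python enum and reports which key fields are missing from a record. The flat field names (`value`, `currency`, `supplier`, `period`) and the helper names are hypothetical simplifications; the actual OCDS JSON schema nests these values differently, so real data should be validated against the published schema.

```python
from enum import IntEnum
from typing import List, Optional


class Stage(IntEnum):
    """The contracting stages named on this slide, in process order."""
    PLANNING = 1
    FORMATION = 2
    PERFORMANCE = 3
    COMPLETION = 4


# Hypothetical flat field names; the real OCDS schema nests these values
# (for example, suppliers sit under awards and value carries its currency).
KEY_FIELDS = ("value", "currency", "supplier", "period")


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows `stage`, or None once the contract is complete."""
    return Stage(stage + 1) if stage < Stage.COMPLETION else None


def missing_fields(record: dict) -> List[str]:
    """List the key fields that are absent or empty in a simplified contract record."""
    return [name for name in KEY_FIELDS if record.get(name) in (None, "")]
```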

Fiscal Data Package Specification

  • Data on budgets and spending is being made available in unprecedented quantities, and citizens need standards to help them work with it;
  • Data is supplied as a CSV file. Transaction data requires the fields amount and id, and the field description is recommended. Aggregated data must state how it was aggregated, using UN and IMF classifications;
  • The metadata is supplied in a JSON file with required and suggested fields such as name, resources (links to the actual data), currency, granularity (transactional or aggregated), type (expenditure or revenue), status (proposal, approved, adjustment, execution) and licenses;
  • There is documentation on converting budget data to the standard using easy-to-use, open source tools.
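A minimal sketch of the checks described above, using only the Python standard library: it looks for the required amount and id columns and the recommended description column in a transaction CSV, flags amounts that are not numeric, and checks three enumerated metadata values. The column names and allowed values are taken from this slide; the function names are hypothetical, and the published Fiscal Data Package specification may structure these fields differently.

```python
import csv
import io
from typing import Dict, List

# Names and allowed values come from the slide; treat them as assumptions
# about the spec rather than its exact wording.
REQUIRED_COLUMNS = {"amount", "id"}
RECOMMENDED_COLUMNS = {"description"}
ALLOWED = {
    "granularity": {"transactional", "aggregated"},
    "type": {"expenditure", "revenue"},
    "status": {"proposal", "approved", "adjustment", "execution"},
}


def check_transactions(csv_text: str) -> Dict[str, List[str]]:
    """Report missing required/recommended columns and non-numeric amounts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = set(reader.fieldnames or [])
    problems: Dict[str, List[str]] = {
        "missing_required": sorted(REQUIRED_COLUMNS - header),
        "missing_recommended": sorted(RECOMMENDED_COLUMNS - header),
        "bad_amounts": [],
    }
    if "amount" in header:
        # Line 1 is the header; assumes no cell spans several lines.
        for line_no, row in enumerate(reader, start=2):
            try:
                float(row["amount"])
            except (TypeError, ValueError):
                problems["bad_amounts"].append(f"line {line_no}: {row['amount']!r}")
    return problems


def check_metadata(meta: dict) -> List[str]:
    """Flag enumerated metadata values outside the options listed on the slide."""
    return [f"{key}={meta.get(key)!r}" for key, options in ALLOWED.items()
            if meta.get(key) not in options]
```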

Frictionless Data

  • Legal barriers (open data, sharing agreements, etc.)
  • Data quality
  • Hard to find
  • Interoperability
  • No tool integration

  • Introduce a significant, measurable improvement in how data is shared, consumed, and analyzed.
  • Make it easier to maintain and improve data quality.

CSV → Data Package (datapackage.json + CSV) → Tool, e.g.:
  • data checking with Goodtables
  • import to R, SQL, BigQuery, etc.
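To make the "import to SQL" step concrete, here is a sketch that reads a resource's Table Schema from a datapackage.json-style dictionary, creates a matching SQLite table and loads the CSV rows with their declared types. The type mapping, the handling of empty cells and the `load_resource` helper are our own assumptions for illustration; the Frictionless libraries have their own, more complete integrations.

```python
import csv
import io
import sqlite3

# Assumed mapping from Table Schema field types to SQLite column types;
# anything not listed is stored as TEXT.
SQL_TYPES = {"integer": "INTEGER", "number": "REAL", "boolean": "INTEGER"}


def cast(value: str, field_type: str):
    """Convert one CSV cell to a Python value according to its declared type."""
    if value == "":
        return None  # treat empty cells as missing values
    if field_type == "integer":
        return int(value)
    if field_type == "number":
        return float(value)
    if field_type == "boolean":
        # Table Schema's default true values; anything else is read as False
        # here, which is a simplification.
        return value in ("true", "True", "TRUE", "1")
    return value  # strings and other types are kept as text


def load_resource(conn: sqlite3.Connection, resource: dict, csv_text: str) -> int:
    """Create a table from the resource's schema, insert the CSV rows, return the count."""
    fields = resource["schema"]["fields"]
    table = resource["name"]
    columns = ", ".join(
        f'"{field["name"]}" {SQL_TYPES.get(field.get("type", "string"), "TEXT")}'
        for field in fields
    )
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in fields)
    rows = [
        [cast(row[field["name"]], field.get("type", "string")) for field in fields]
        for row in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)
```

Because the column types come from the descriptor rather than from guessing, the same package loads the same way into any tool that reads the schema.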

Frictionless Data Lab

“Can I please get this done in Terminal?”

pip install goodtables
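To show what a structural check involves, the sketch below is a small standard-library re-implementation of a few structural problems that goodtables reports: blank or duplicate headers, blank or duplicate rows, and rows with extra or missing values. The error codes mirror goodtables' naming as far as we know it, but this is an illustrative approximation, not goodtables itself, and it does no schema-based validation.

```python
import csv
import io
from typing import List, Tuple


def structural_errors(csv_text: str) -> List[Tuple[str, int]]:
    """Return (error-code, row-number) pairs for common structural problems.

    Row numbers count the header as row 1 and assume no cell spans lines.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []  # nothing to check in an empty file
    header, body = rows[0], rows[1:]
    errors: List[Tuple[str, int]] = []

    seen_names = set()
    for name in header:
        if not name.strip():
            errors.append(("blank-header", 1))
        elif name in seen_names:
            errors.append(("duplicate-header", 1))
        seen_names.add(name)

    seen_rows = set()
    for row_number, row in enumerate(body, start=2):
        if not any(cell.strip() for cell in row):
            errors.append(("blank-row", row_number))
            continue
        if len(row) > len(header):
            errors.append(("extra-value", row_number))
        elif len(row) < len(header):
            errors.append(("missing-value", row_number))
        key = tuple(row)
        if key in seen_rows:
            errors.append(("duplicate-row", row_number))
        seen_rows.add(key)
    return errors
```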

Recap

  • Data Packages and Data Package Creator
  • Data validation in your terminal with goodtables-cli
  • One-time data validation on the web with try.goodtables.io
  • Continuous data validation with goodtables.io

Frictionless Data & FAIR Research

Findability

  • Unique identifier 👍
  • Rich metadata 👍
  • Identifier as part of metadata 👍
  • Metadata indexed in a central searchable resource ?

Accessibility

  • Metadata is accessible, independent of associated data 👍
  • Metadata can be retrieved by the unique identifier 👍
    • Protocol allows for an authentication and authorization procedure
    • Protocol is open, free and can be adopted widely 👍

Interoperability

  • Provisions for metadata to reference other metadata 👍
  • Metadata should also be FAIR: findable, accessible, interoperable, reusable 👍
  • Metadata should use accessible and shared language 👍

Reusability

  • Metadata should include a wide range of relevant attributes
    • Metadata should include a data license 👍
    • Metadata should use standardised formats that are easily understood and widely applicable across domains 👍
    • Metadata should link back to original sources (provenance) 👍
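One way to apply these criteria to a data package is to map each one to a descriptor property and look for gaps. The sketch below does that for a handful of criteria and can add a random UUID as the package `id`. The criterion-to-property mapping and the helper names are our own interpretation, not part of the FAIR principles or the Data Package specification; the property names themselves (`id`, `description`, `keywords`, `licenses`, `sources`) come from the Data Package spec as we understand it.

```python
import uuid
from typing import Dict, List

# Our own rough mapping from FAIR criteria on these slides to Data Package
# descriptor properties; it is an interpretation, not part of either spec.
FAIR_CHECKS = {
    "Findability: unique identifier": "id",
    "Findability: rich metadata (description)": "description",
    "Findability: rich metadata (keywords)": "keywords",
    "Reusability: data license": "licenses",
    "Reusability: provenance": "sources",
}


def fair_gaps(descriptor: Dict) -> List[str]:
    """Return the criteria whose descriptor property is missing or empty."""
    return [criterion for criterion, prop in FAIR_CHECKS.items()
            if not descriptor.get(prop)]


def with_identifier(descriptor: Dict) -> Dict:
    """Return a copy of the descriptor, adding a random UUID as `id` if absent.

    A DOI issued by a data repository is another common choice of identifier.
    """
    if descriptor.get("id"):
        return dict(descriptor)
    return {**descriptor, "id": str(uuid.uuid4())}
```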

Frictionless Data Research

Pilots & Case Studies

Data quality and reuse of archived data

Packaging Energy Data

Validating metadata associated with ADBio repositories

What next?

  • Adopt our specs (frictionlessdata.io/specs)
  • Use our software (frictionlessdata.io/software)
  • Pilots and collaborations
  • GitHub issues and PRs
  • Join our community

Join our community

Serah Rono . @serahrono . serah.rono@okfn.org