Notes from Open Standards Session IODC16

BA Bill Anderson

BB Brian Banks

BH Bjorn Hagstrom

BL Beata Lisowska

DB Darren Barnes

DL Deirdre Lee (deirdre@derilinx.com)

GG Gaurav Godhwani

HL Heather Leson

IM Irum Maqsood

JD Joonas Dukpa

JLM Jose Luis Marin

NT Natasha Thamali

PA Phil Ashlock

PC Pyrou Chung

UDM Ulrika Domellof Mattsson

IM: In Canada they published guidelines for use of open standards.

BB: How can you make sure that guidelines can be updated?

IM: We’re hoping to put them on GitHub. Working with OGP, they set up a document. We’re looking at data standards and data quality.

JLM: Did you consider the kinds of things we discussed, such as having the right fields, etc.?

IM: Yes, we have specific guidelines. For example, with open contracting this was very important; we recommend ISO, etc.

JLM: Often CSV/XML is just a container; there’s no definition of what’s in it.

IM: Yep. Another issue is acronyms; there are a ton of them.

PA: We also use DCAT. We strongly type it and then have a central validation process: we check how each agency is performing, and we provide support to agencies on how to implement it. We have a common approach to measure against. We have to be proactive about providing support.

BA: How we fit the technical and political parts together is important. Portals are similar and standardised, but sometimes they use different standards. Is there no common ground that we can all agree on and sign up to?

PA: I think we do that with DCAT.

UDM: We have also signed up to DCAT. Right now the open data portal sits with the National Archives, so we hope there will be more work done on it that gets used. It is still difficult to implement an open standard once you decide to use it.

JS: There are approaches, and there are people who don’t understand the standards.

BH: We use the DCAT-AP templates to lower the barrier.

PA: All the vendors that operate in the US use DCAT (JKAN, etc.), and 60 different local authorities use DCAT. We used to provide syndication to data.gov with Socrata. The technology isn’t important, but you have to use DCAT in order to be syndicated. Data.gov uses JSON-DCAT.

BL: To improve the discoverability of data, should the datasets be interoperable?

PC: Whatever standards we use, they should be machine-readable, because it’s challenging for us to get data from other sources. We’re aggregators: we take data about hydro dams, and every dam has a different ID, a different name and four different translations within the same country.

BH: That leads to discovery; that’s the problem. I want to use standards, but where are they?

BB: Language is an important point, and an interesting one. We want to have the linkages: how do we build bridges across the standards? We’re creating new silos.

JLM: We should make a list of important datasets: reference data, repositories of authoritative datasets. Only governments can do that.

PA: We have open registers.

DB: They are struggling with that, though: the register of registers. They’re struggling to work with public bodies, because if you want that, someone has to maintain it.

GG: It’s more complex because we are dealing with so many languages. We also don’t have a fiscal metadata standard. Is anyone working in that area? So, two problems.

BH: The contracting standard.

BA: openbudgets.eu

JLM: XBRL? Many of these standards are over-engineered.

HL: I just spent two months living in the Middle East, and I think we need to drop the word ‘open’; we have to be open and flexible about standards. We should look at data sharing: new ways to start the conversation with people and how to present it, which can then be used to move towards standards that people will actually use.

GG: Do you use…?

PA: We use DCAT, but we extend it; for example, for APIs there are Swagger and RAML. We’ll probably be using it to define CSV data dictionaries too.

IM: We’ll probably be using it to define CSV data dictionaries too. We looked at the DCAT data dictionary, but it didn’t meet our needs.

NT: How do we go about discovering existing data?

DB: There’s something going on…

PA: There are initiatives that look at what data is available, e.g. to tell organisations ‘you already publish this’. There are also initiatives that require people to provide an inventory of the data that exists, and that look at what FOI requests have come in.

NT: We’re talking to people about the rail data, but we’re getting pushback; they’re saying it’s sensitive data. What counts as sensitive data?

BH: ‘Sensitive’ means they don’t want to release it.

HL: Well, that depends.

BB: There’s a difference between the humanitarian and business domains. With us, there is a reluctance to publish personal phone numbers or anything that could be linked to an individual.

BH: The valid reasons are secrecy, personal information or third-party copyright. If it’s not one of these, there is no reason not to publish.

HL: We need to understand the hesitancy. There is a socialisation piece, and there are different kinds of negotiation: not just ‘here’s why’, but maybe an example from your peers.

NT: Going back to the discovery piece, how do you discover it if you…

DB: The list of data can be released, even if the data itself is not going to be published as open data.

PA: We’re doing that right now.

IM: We are going through a complete inventory of all our data: if data is not released, why is it not released?

BB: What about controlled data, not just open data: registration, conditionality? We want to track analytics. Is that usual?

BH: I hear that a lot, but I tell them it doesn’t work. If data is free, it can be shared. The only benefit is if the API is being overused.

DB: We have registration on an API for exactly that reason, in case someone hammers it. How can we track usage?

BB: We have an optional registration page; it’s a hallway, and we get some feedback. We will get some info from that, we…

DB: Why do you want to know who uses it? Because it’s so broad?

PA: We’ve had a lot of discussions around that. Any impedance is a barrier; we only have a couple of APIs with registration, and we’ve also added a feedback form. But now we’re encouraging citation, because that is…

DL: The Dataset Usage Vocabulary.

BA: I like the idea of open metadata for closed datasets.

PA: It’s time to change the language from ‘open data’ to ‘data management’.

BA: In Africa, open data is a great tool for neoliberals: if the data is open, it won’t be African companies using it.

BH: Talk to your users.

PC: That’s always biased, though.

HL: There are other ways: community roundtables on how to engage your users, gathering qualitative data, thinking of different ways to target people. Not everyone will be into a technical solution.

NT: Is there a sector that uses open data in an ideal way that we can learn from and apply to our work?

PA: Transport is a good example; they’re good to talk to.

DB: Try Transport for London.

JD: We’re building an open data portal. There is no visibility of the open data standards that are used, or of open API standards; there’s nothing domain-specific.

PA: There’s an effort in the US and Canada, the US Data Federation. They’re thinking less of an inventory of data standards (or at least not just an inventory) and more of a methodology. We’re interested in looking at the maturity of standards.

DL: LOV (Linked Open Vocabularies) also says who uses each vocabulary: http://lov.okfn.org/dataset/lov/

BA: such a wide tablet

DL: A technical framework, very simple, and a top-down policy.

BA: For budget spend data, they keep changing the columns.

DB: We need more detail on what ‘machine-readable’ means.

IM: That’s why we went deeper than that: versioning, provenance, data quality. There is a data quality standard; the focus is now on reusability and quality.

DB: It’s about reusability.

DL: That’s why the BPs (best practices) cover a wide range: provenance, quality, etc.

IM: Make it simple.

New guy: One thing about metadata is persistence: how long will the data be there?

BB: Are SLAs common practice?

IM: We have update frequency. We don’t have ‘never’ datasets; it has to be daily/weekly/annual, etc., unless there is a historical reason. But those are edge cases.

JD: Data publishers aren’t always thinking of users. They can just delete or remove services. It goes back to how to measure usability.

PA: For archiving, we look at how to have a distributed file store, in case of political unrest or natural hazards.

DL: Are we collaborating enough with other communities: geospatial, statistics, libraries?

UDM: It’s important, especially as it can be linked at a legal level, e.g. the PSI Directive.

BA: Governments are good, but large global institutions are bad at it, e.g. the UN, the World Bank, etc.

UDM: Reference data is also provided.

BB: What are grassroots standards?

GG: Risk assessments for opening data.

JD: Political standards.

DB: We don’t want to adopt new standards; let’s use what we have.

HL: We fork standards to be able to serve our needs. All of us have a responsibility to reuse.

BA: It’s not our business to create a new standard; that’s up to consortiums.

PA: Grassroots efforts are more focused on the domain space, the community of practice that has the need, but they don’t have experience of how to create standards. There’s a mismatch. There can be agility, e.g. via GitHub.

DL: Do we need standards bodies?

BB: There’s a mix. We didn’t know where to start. We needed the building blocks for organisation, geo, location, etc., and we built the niche part. There were huge barriers to working with standards bodies; we didn’t think it was possible. But there is value in domain-specific standards. The standardisation bodies should tell us how to bring in new standards, making them more sustainable: persistence, governance, a membership model. This isn’t always possible without the standards bodies. Is there always a need for grassroots standards that have been created ad hoc?

PC: I think it’s laziness. The other institutions don’t care about standards; they don’t have one.

JD: It started at a low, technical level, but with IATI it becomes easier: they have resources for outreach and communication.

JLM: This is a conversation I often have with the organisations I work with. Then I talk to the tech providers. Governments aren’t free to implement the standards; they use proprietary software.

DB: That’s true; we should move towards open by default. Siloed standards can be built in government because we have budget we need to spend.

JLM: There are roadblocks if data is not open; governments just say whatever.

IM: It has to be the government that makes those rules. Moving forward, everything in procurement must be open data if you build or buy software. It is the default at federal level.

JLM: That needs a lot of political will.

HL: Privacy by design helped with that.

UDM: They sell standards under licence. If you want to create a standard, you have to be a member. In Sweden, it’s pointed to as the standards body that should be used.

PA: Incorporation by reference: if a standard is referenced in legislation, then it has to be free/open.

BL: Rich countries have lots of open standards, etc., but what about less developed countries? Approaches should be applicable to them too.

HL: There should be graduated levels; e.g. in Qatar they only publish data on certain social groups, but how do we collaborate with organisations? 197 countries designed

PA: The Sustainable Development Goals are an interesting guideline for these.