1 of 11

Linked Data, Open Data and Big Data: Understanding the need for all three

Mark Birbeck, Engine House

2 of 11

Me

Drank (drinking?) SemWeb Kool-Aid

3 of 11

RDF-related Projects

  • jobs ontology based on RDFa;
  • ontology and publishing system for NHS National Innovation Centre;
  • ontology and publishing system for UK businesses, initially based on research from Southampton University.

4 of 11

What we found

  • usually no data available;
  • stuff that is available is usually in spreadsheets;
  • datasets not linked, e.g., Local Authority spend data doesn't get you back to Companies House information.

5 of 11

What we did

  • wasted time imagining that the data would be published as RDF;
  • proposed a URI format for UK businesses to make linking easier;
  • ...drank more Kool-Aid.

6 of 11

Lessons: Culture

Big cultural changes are needed just to get Open Data, especially when people are worried about their jobs or even departments.

7 of 11

Lessons: Formats

Don't need to wait for RDF: spreadsheets are not that big a problem -- we're programmers after all.

A far bigger issue was that the datasets referred to different time-frames, spatial references, etc., so they did not line up with each other.

8 of 11

Lessons: Joining Points

Difficult to 'join' data programmatically when the only shared field is free text, e.g., names of companies. Linked Data would be great, but just consistent codes would be enough. (For example, using VAT numbers in Local Authority spend data.)
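
The difference between joining on names and joining on a shared code can be sketched roughly as follows. Everything here is an assumption made for illustration: the records, the field names, the VAT-style codes and the `join_on` helper are all invented, and the "register" is a stand-in for any reference dataset that carries the same code.

```python
# Hypothetical data: every name, number and field below is invented.
spend = [
    {"supplier": "Acme Ltd.", "vat": "GB100000001", "amount": 1200.0},
    {"supplier": "ACME LIMITED", "vat": "GB100000001", "amount": 800.0},
    {"supplier": "Widget Co", "vat": "GB100000002", "amount": 50.0},
]
register = [
    {"name": "Acme Limited", "vat": "GB100000001", "company_number": "00000001"},
    {"name": "Widget Company Ltd", "vat": "GB100000002", "company_number": "00000002"},
]


def join_on(left_key, right_key, left, right):
    """Exact-match join: pair each left row with the right row sharing its key."""
    index = {row[right_key]: row for row in right}
    return [(row, index[row[left_key]]) for row in left if row[left_key] in index]


# Joining on free-text names: the spellings differ, so nothing matches.
by_name = join_on("supplier", "name", spend, register)

# Joining on a consistent code: every spend row finds its company.
by_vat = join_on("vat", "vat", spend, register)

print(len(by_name), len(by_vat))
```

With these made-up rows the name join finds no matches, while the code join links every spend row, without any RDF being involved.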

9 of 11

Big Data

There's lots of hype and marketing... true... but there are important lessons from the world of Big Data:

  • puts the problem of processing large quantities of data to the fore;
  • makes you think about collecting data whilst doing something else;
  • something happens when processing large quantities of data;
  • can do joins through processing, provided the data uses consistent codes.
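
A join done 'through processing' might look something like the sketch below: a map step tags every record with its shared code, and a reduce step groups by that code and combines the pieces. This is plain Python standing in for a map/reduce-style framework, not any particular product's API; the `code` field, the sample rows and the function names are assumptions.

```python
from collections import defaultdict


def map_phase(spend_rows, register_rows):
    """Emit (code, tagged value) pairs from both datasets."""
    for row in spend_rows:
        yield row["code"], ("spend", row["amount"])
    for row in register_rows:
        yield row["code"], ("name", row["name"])


def reduce_phase(pairs):
    """Group pairs by code, then combine the name with the total spend."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)

    results = {}
    for key, values in groups.items():
        names = [v for tag, v in values if tag == "name"]
        total = sum(v for tag, v in values if tag == "spend")
        if names:  # drop codes that have no register entry
            results[key] = (names[0], total)
    return results


# Hypothetical input rows, invented for illustration.
spend_rows = [
    {"code": "A1", "amount": 1200.0},
    {"code": "A1", "amount": 800.0},
    {"code": "B2", "amount": 50.0},
    {"code": "C3", "amount": 10.0},  # no matching register row
]
register_rows = [
    {"code": "A1", "name": "Acme Limited"},
    {"code": "B2", "name": "Widget Company Ltd"},
]

totals = reduce_phase(map_phase(spend_rows, register_rows))
print(totals)
```

The join only works because both datasets carry the same code; the processing itself is simple.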

10 of 11

An Approach

SemWeb useful for:

  • interchange format;
  • ontologies;
  • Linked Data, i.e., URIs for everything.

But Open Data doesn't need to be RDF -- use context to understand tables, for example.
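
One way to read 'use context to understand tables' is to leave the spreadsheet as it is and publish a small mapping alongside it that says what each column means. The sketch below assumes that reading; the column headings, the `example.org` property URIs and the `rows_with_context` helper are all hypothetical.

```python
import csv
import io

# Hypothetical context: maps column headings to property URIs.
CONTEXT = {
    "Supplier": "http://example.org/vocab#supplierName",
    "VAT No": "http://example.org/vocab#vatNumber",
    "Amount": "http://example.org/vocab#amount",
}

# Hypothetical table, as it might be exported from a spreadsheet.
CSV_TEXT = """Supplier,VAT No,Amount
Acme Ltd.,GB100000001,1200.00
Widget Co,GB100000002,50.00
"""


def rows_with_context(text, context):
    """Re-key each table row by the property URI its column maps to."""
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        yield {
            context[column]: value
            for column, value in row.items()
            if column in context
        }


described = list(rows_with_context(CSV_TEXT, CONTEXT))
print(described[0])
```

The table stays a table; only the context carries the vocabulary.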

And you can get a long way with Big Data-style processing.

11 of 11

End Stuff

mark.birbeck@engine.house

http://markbirbeck.com

@markbirbeck

http://uk.linkedin.com/in/markbirbeck/