1 of 33

Wikidata and Wikibase as global platforms for democratizing data publishing

Daniel Mietchen (University of Virginia, @EvoMRI)
Dario Taraborelli (Wikimedia Foundation, @readermeter)

SciDataCon 2018 • Gaborone, 6 November 2018

2 of 33

Two main avenues for democratizing data publishing

  • Wikidata: a knowledge graph that anyone can edit and query in their own language
  • Wikibase: a Wikidata-compatible graph database that anyone can set up and federate

3 of 33

FAIR data platforms

Wilkinson et al. (2016) doi.org/10.1038/sdata.2016.18 [image: fosteropenscience.eu CC0]

4 of 33

5 of 33

Wikidata is to data what Wikipedia is to text

  • All data CC0
  • Anybody can contribute
  • Covers all domains of knowledge
  • Fully version controlled and collaborative
  • Integrated with the semantic web via RDF / open APIs
  • High-performance query engine (SPARQL)
  • Stable: not tied to short-term funding cycles
  • Actively developed; the full stack is open source
  • Active community: the fastest-growing Wikimedia project

6 of 33

50 million entities • 550M statements • 760M edits

[as of October 2018]

7 of 33

Types of content in Wikidata

  • People
  • Places
  • Taxa
  • Buildings
  • Organisations
  • Artworks
  • Events
  • Astronomical Bodies

...

  • Chemicals
  • Processes
  • Theorems
  • Concepts
  • Creative works
  • Journals
  • Publishers
  • Meta-items

...

8 of 33

Wikidata for Research

9 of 33

Data provenance of a Wikidata statement by outlet, publisher and funder

  • Zika virus (Q202864, taxon)
    • has natural reservoir (P1605): Aedes hensilli (Q14573674, taxon)
      • stated in (P248): Aedes hensilli as a potential vector of Chikungunya and Zika viruses (Q22330738, scientific article)
        • funded by (P859): Centers for Disease Control and Prevention (Q583725, government agency)
        • published in (P1433): PLOS Neglected Tropical Diseases (Q3359737, scientific journal)
          • publisher (P123): Public Library of Science (Q233358, publisher)
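The provenance chain on this slide can be followed with a SPARQL query. The sketch below is one possible way to do it, not the query behind the slide. It assumes the public Wikidata Query Service endpoint and its predefined prefixes (wd:, wdt:, p:, ps:, pr:, prov:); the function names are hypothetical, and the item and property identifiers are the ones shown on the slide.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_provenance_query(item: str, prop: str) -> str:
    """Build a SPARQL query that follows a statement to its source,
    then on to the source's funder, journal and publisher."""
    return f"""
SELECT ?value ?source ?funder ?journal ?publisher WHERE {{
  wd:{item} p:{prop} ?statement .
  ?statement ps:{prop} ?value ;
             prov:wasDerivedFrom ?reference .
  ?reference pr:P248 ?source .                     # stated in
  OPTIONAL {{ ?source wdt:P859 ?funder . }}          # funded by
  OPTIONAL {{
    ?source wdt:P1433 ?journal .                   # published in
    OPTIONAL {{ ?journal wdt:P123 ?publisher . }}    # publisher
  }}
}}
""".strip()


def run_query(query: str, endpoint: str = WDQS_ENDPOINT) -> dict:
    """Send the query over HTTP and return the SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(url, headers={"User-Agent": "provenance-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def flatten_bindings(results: dict) -> list:
    """Reduce SPARQL JSON bindings to plain {variable: value} rows."""
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


# Zika virus (Q202864), has natural reservoir (P1605), as on the slide.
query = build_provenance_query("Q202864", "P1605")
```

Calling `flatten_bindings(run_query(query))` would need network access; the query string can also be pasted into the query service's web interface.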

10 of 33

Sample of current biomedical content in Wikidata

  • All human and mouse genes and proteins (Swiss-Prot)
  • All Gene Ontology terms
  • All Human Disease Ontology terms
  • All FDA approved drugs
  • 109 reference microbial genomes

11 of 33

Biologists with Canadian citizenship
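A listing like this one can be produced by a short SPARQL query. The sketch below is an illustration and not the query used for the slide. The function name is hypothetical, and the identifiers in the usage line (P106 occupation, P27 country of citizenship, Q864503 biologist, Q16 Canada) are assumptions that should be checked on wikidata.org before use.

```python
def people_query(occupation: str, country: str, language: str = "en") -> str:
    """Build a query for people with a given occupation and citizenship.

    The label service returns names in the requested language, which is
    how the same data can be queried in different languages.
    """
    return f"""
SELECT ?person ?personLabel WHERE {{
  ?person wdt:P106 wd:{occupation} ;   # occupation
          wdt:P27 wd:{country} .       # country of citizenship
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}" . }}
}}
""".strip()


# Assumed identifiers: Q864503 (biologist), Q16 (Canada).
query = people_query("Q864503", "Q16")

# The same question with labels in French.
query_fr = people_query("Q864503", "Q16", language="fr")
```

An exact match on occupation misses people whose occupation is a more specific kind of biologist; a fuller query would also traverse subclasses.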

12 of 33

Institutions where Canadians got their PhD

13 of 33

Co-author graph of McGill-affiliated authors

14 of 33

Award recipients affiliated with McGill

15 of 33

Federation

16 of 33

Wikidata’s identifier mappings

17 of 33

From Wikidata to Wikibase

18 of 33

Why Wikibase?

Linked Jazz

“Started thinking about how our data could live in Wikidata and started investigating feasibility of that possibility.

But we have very esoteric project data that doesn’t seem appropriate to be in Wikidata so began looking at our own Wikibase instance.”

Matt Miller (2018) Linked Jazz and Wikibase

19 of 33

What is Wikibase?

  • Wikibase Repository: a MediaWiki extension for structured, non-relational data in a central, collaboratively managed repository.
    • “writing RDF”
    • Revision control
    • FAIR by default
  • Wikibase Client: a MediaWiki extension for retrieving and embedding structured data from a central repository into a client wiki.
  • Query Service: lets users query the contents of a Wikibase installation using SPARQL.
  • A set of reusable components that provide a foundation for tasks in the same domain.

20 of 33

Data formats in Wikibase

(versus wikitext)

21 of 33

22 of 33

23 of 33

24 of 33

A French recording of the word “Canada”

From Lingua Libre

25 of 33

Letters sent by Illuminati

From FactGrid’s SPARQL endpoint

Colour indicates author

26 of 33

Timeline of software repositories

[SPARQL query] on Wikidata

27 of 33

Timeline of Wikibase instances

[SPARQL query] on the Wikibase registry

28 of 33

Wikibase and software repositories

Combined [SPARQL query]

across Wikidata and the Wikibase registry
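A combined query of this kind relies on the SPARQL 1.1 `SERVICE` keyword, which sends part of the pattern to a second endpoint. The sketch below shows the general shape only. The function name, the example.org property IRI and the local pattern are hypothetical placeholders; the actual properties of the Wikibase registry are not given on the slide.

```python
def federated_query(variables, local_pattern, remote_endpoint, remote_pattern):
    """Join a pattern on the local endpoint with one evaluated remotely."""
    select = " ".join(f"?{name}" for name in variables)
    return (
        f"SELECT {select} WHERE {{\n"
        f"  {local_pattern}\n"
        f"  SERVICE <{remote_endpoint}> {{\n"
        f"    {remote_pattern}\n"
        f"  }}\n"
        f"}}"
    )


query = federated_query(
    variables=["instance", "item", "inception"],
    # Hypothetical local property linking a record to a Wikidata item.
    local_pattern="?instance <https://example.org/prop/wikidataItem> ?item .",
    remote_endpoint="https://query.wikidata.org/sparql",
    # Full IRI for Wikidata's inception property (P571), because the wdt:
    # prefix on another Wikibase points at that instance's own properties.
    remote_pattern="?item <http://www.wikidata.org/prop/direct/P571> ?inception .",
)
```

The shared variable `?item` is what joins the two halves: it is bound locally and then matched inside the remote block.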

29 of 33

Further notes

  • Federation is possible to and from any SPARQL endpoint, not just Wikibase ones (and works fine on mobile)
  • A Wikibase instance also has a MediaWiki API
  • A Docker container is available for installing Wikibase
  • Wikimedia Commons is moving to Wikibase
  • Ecosystem of Wikidata tools, some being adapted to generic Wikibase instances
  • Work on using Shape Expressions to share data models across instances
  • Coordination through series of workshops
  • Various non-public tests, e.g. at OCLC
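As an illustration of the MediaWiki API mentioned above, the sketch below builds a request for the `wbgetentities` action and reads labels out of the JSON it returns. The helper names are hypothetical, and the Wikidata API address is used only as an example base URL; another Wikibase instance would have its own.

```python
from urllib.parse import urlencode


def entity_url(api_base: str, ids: list, language: str = "en") -> str:
    """Build a wbgetentities request URL for the given entity IDs."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(ids),      # several IDs are separated by "|"
        "props": "labels",
        "languages": language,
        "format": "json",
    }
    return api_base + "?" + urlencode(params)


def labels_from_response(response: dict, language: str = "en") -> dict:
    """Map each entity ID to its label, skipping entities without one."""
    return {
        entity_id: entity["labels"][language]["value"]
        for entity_id, entity in response.get("entities", {}).items()
        if language in entity.get("labels", {})
    }


# Example base URL; Q202864 is the Zika virus item from the provenance slide.
url = entity_url("https://www.wikidata.org/w/api.php", ["Q202864"])
```

Fetching `url` and passing the decoded JSON to `labels_from_response` would give a dictionary keyed by entity ID.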

30 of 33

31 of 33

Wikidata or Wikibase(s)?

Aspect        Wikidata              Wikibase
Governance    Wikidata community    depends
Granularity   generic               depends
Licensing     CC0                   depends
Funding       stable                depends
ID mappings   stable                depends
Language(s)   many                  depends

32 of 33

Another approach to democratization of data curation:

citizen science happening on Wikimedia projects

see SciDataCon poster 150

33 of 33

Thank you

growth by Fabio Rinaldi [CC BY], research by Minnie Pigeon [CC BY], graph by Icon Lauk [CC BY] from the Noun Project

Slides mashed up with contributions by Andy Mabbett and Andra Waagmeester

These slides are adapted from

D. Mietchen, D. Taraborelli (2018) Wikidata, Wikibase, and a federated ecosystem of structured knowledge for open science. FORCE 2018. doi.org/10.6084/m9.figshare.7195358 [CC BY]