@EvoMRI @readermeter

Daniel Mietchen, Dario Taraborelli

Wikidata and Wikibase as global platforms for democratizing data publishing

SciDataCon 2018 • Gaborone, 6 November 2018

University of Virginia, Wikimedia Foundation

Two main avenues for democratizing data publishing

Wikidata: a knowledge graph that anyone can edit and query in their own language

Wikibase: a Wikidata-compatible graph database that anyone can set up and federate

FAIR data platforms

Wilkinson et al. (2016) doi.org/10.1038/sdata.2016.18 [image: fosteropenscience.eu, CC0]

Wikidata is to data what Wikipedia is to text

  • All data released under CC0
  • Anybody can contribute
  • Covers all domains of knowledge
  • Fully version-controlled and collaborative
  • Integrated with the semantic web via RDF and open APIs
  • High-performance query engine (SPARQL)
  • Stable: not tied to short-term funding cycles
  • Actively developed; the full stack is open source
  • Active community: fastest-growing Wikimedia project

50 million entities • 550M statements • 760M edits

[as of October 2018]

Types of content in Wikidata

  • People
  • Places
  • Taxa
  • Buildings
  • Organisations
  • Artworks
  • Events
  • Astronomical bodies
  • Chemicals
  • Processes
  • Theorems
  • Concepts
  • Creative works
  • Journals
  • Publishers
  • Meta-items


Wikidata for research

Data provenance of a Wikidata statement by outlet, publisher and funder

  • Zika virus (Q202864)
    • has natural reservoir (P1605): Aedes hensilli (Q14573674, taxon)
      • stated in (P248): “Aedes hensilli as a potential vector of Chikungunya and Zika viruses” (Q22330738, scientific article)
        • funded by (P859): Centers for Disease Control and Prevention (Q583725, government agency)
        • published in (P1433): PLOS Neglected Tropical Diseases (Q3359737, scientific journal)
          • Public Library of Science (Q233358, publisher)
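The provenance chain above can also be retrieved by query. The sketch below is a minimal illustration, assuming the public Wikidata Query Service endpoint at https://query.wikidata.org/sparql and its predefined prefixes (wd:, wdt:, p:, ps:, pr:, prov:); the query text and the helper name `build_query_url` are our own, not taken from the slides. It only builds the request URL and sends nothing over the network.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service (WDQS).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Provenance chain from the slide: the "has natural reservoir" (P1605)
# statements on Zika virus (Q202864), the article each one was
# "stated in" (P248), and that article's funder (P859) and journal (P1433).
# WDQS predefines the wd:, wdt:, p:, ps:, pr: and prov: prefixes.
PROVENANCE_QUERY = """
SELECT ?reservoir ?article ?funder ?journal WHERE {
  wd:Q202864 p:P1605 ?statement .
  ?statement ps:P1605 ?reservoir .
  OPTIONAL {
    ?statement prov:wasDerivedFrom/pr:P248 ?article .
    OPTIONAL { ?article wdt:P859 ?funder . }
    OPTIONAL { ?article wdt:P1433 ?journal . }
  }
}
"""


def build_query_url(query: str, endpoint: str = WDQS_ENDPOINT) -> str:
    """Return a GET URL that asks the endpoint for SPARQL JSON results."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    print(build_query_url(PROVENANCE_QUERY))
```

Fetching that URL (for example with `urllib.request.urlopen`, sending a descriptive User-Agent header as Wikimedia asks of API clients) should return the matching bindings as SPARQL JSON results. The OPTIONAL blocks keep a reservoir in the results even when its statement carries no "stated in" reference.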

Sample of current biomedical content in Wikidata

  • All human and mouse genes and proteins (Swiss-Prot)
  • All Gene Ontology terms
  • All Human Disease Ontology terms
  • All FDA-approved drugs
  • 109 reference microbial genomes

Biologists with Canadian citizenship

Institutions where Canadians got their PhD

Co-author graph of McGill-affiliated authors

Award recipients affiliated with McGill

Wikidata’s identifier mappings

From Wikidata to Wikibase

Why Wikibase?

Linked Jazz

“Started thinking about how our data could live in Wikidata and started investigating feasibility of that possibility.

But we have very esoteric project data that doesn’t seem appropriate to be in Wikidata so began looking at our own Wikibase instance.”

Matt Miller (2018) Linked Jazz and Wikibase

What’s Wikibase?

  • Wikibase Repository: MediaWiki extension for storing structured, non-relational data in a central, collaboratively managed repository
    • “writing RDF”
    • Revision control
    • FAIR by default
  • Wikibase Client: MediaWiki extension for retrieving structured data from a central repository and embedding it in a client wiki
  • Query Service: lets anyone query the contents of a Wikibase installation using SPARQL
  • A set of reusable software components that provide a foundation for related tasks in the same domain

Data formats in Wikibase (versus wikitext)
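Unlike wikitext, Wikibase stores each item as structured JSON, which the MediaWiki API returns via `action=wbgetentities` (e.g. https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q202864&format=json). The sketch below is illustrative only: `ENTITY_JSON` is a hand-trimmed sample modelled on the Zika virus example, not a verbatim API response, and the helper `item_values_with_sources` is hypothetical. It walks each statement's main snak and the "stated in" (P248) snaks of its references.

```python
import json

# Hand-trimmed illustration of Wikibase's JSON entity format, modelled on
# the Zika virus example; a real API response carries many more fields
# (labels, sitelinks, qualifiers, ranks, hashes, ...).
ENTITY_JSON = """
{
  "entities": {
    "Q202864": {
      "id": "Q202864",
      "claims": {
        "P1605": [
          {
            "mainsnak": {
              "snaktype": "value",
              "property": "P1605",
              "datavalue": {
                "type": "wikibase-entityid",
                "value": {"entity-type": "item", "id": "Q14573674"}
              }
            },
            "references": [
              {
                "snaks": {
                  "P248": [
                    {
                      "snaktype": "value",
                      "property": "P248",
                      "datavalue": {
                        "type": "wikibase-entityid",
                        "value": {"entity-type": "item", "id": "Q22330738"}
                      }
                    }
                  ]
                }
              }
            ]
          }
        ]
      }
    }
  }
}
"""


def item_values_with_sources(entity: dict, prop: str, ref_prop: str = "P248"):
    """Yield (value_id, [source_ids]) for each item-valued statement of prop."""
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement["mainsnak"]
        if snak.get("snaktype") != "value":
            continue  # skip "no value" / "unknown value" snaks
        value_id = snak["datavalue"]["value"]["id"]
        sources = [
            ref_snak["datavalue"]["value"]["id"]
            for ref in statement.get("references", [])
            for ref_snak in ref["snaks"].get(ref_prop, [])
            if ref_snak.get("snaktype") == "value"
        ]
        yield value_id, sources


if __name__ == "__main__":
    zika = json.loads(ENTITY_JSON)["entities"]["Q202864"]
    for value, sources in item_values_with_sources(zika, "P1605"):
        print(value, sources)  # Q14573674 ['Q22330738']
```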

A French recording of the word “Canada”

From Lingua Libre

Letters sent by Illuminati

From FactGrid’s SPARQL endpoint

Colour indicates author

Timeline of software repositories

[SPARQL query] on Wikidata

Timeline of Wikibase instances

[SPARQL query] on the Wikibase registry

Wikibase and software repositories

Combined [SPARQL query] across Wikidata and the Wikibase registry

Further notes

  • Federation is possible to and from any SPARQL endpoint, not just Wikibase ones (and works fine on mobile)
  • A Wikibase instance also has a MediaWiki API
  • Docker container available for installing Wikibase
  • Wikimedia Commons is moving to Wikibase
  • Ecosystem of Wikidata tools, some being adapted to generic Wikibase instances
  • Work on using Shape Expressions (ShEx) to share data models across instances
  • Coordination through a series of workshops
  • Various non-public tests, e.g. at OCLC
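Federation relies on the standard SPARQL 1.1 `SERVICE` keyword. As a minimal sketch, assuming a local Wikibase query service that permits outgoing federated calls to the Wikidata Query Service, the hypothetical helper `federated_label_query` below builds a query that fetches labels for given Wikidata items from the remote endpoint. It validates the item IDs so that arbitrary text cannot be spliced into the query.

```python
import re

# Public SPARQL endpoint of the Wikidata Query Service.
WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

# Wikidata item IDs look like Q42; reject anything else before splicing it in.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def federated_label_query(qids, remote_endpoint=WIKIDATA_SPARQL, lang="en"):
    """Build a SPARQL 1.1 query that fetches item labels from a remote endpoint."""
    for qid in qids:
        if not QID_PATTERN.fullmatch(qid):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")
    # Wikidata entities are addressed by full IRIs, because on a local
    # Wikibase the short wd: prefix may point at the local instance instead.
    values = " ".join(f"<http://www.wikidata.org/entity/{qid}>" for qid in qids)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?item ?label WHERE {\n"
        f"  SERVICE <{remote_endpoint}> {{\n"
        f"    VALUES ?item {{ {values} }}\n"
        "    ?item rdfs:label ?label .\n"
        f'    FILTER(LANG(?label) = "{lang}")\n'
        "  }\n"
        "}\n"
    )


if __name__ == "__main__":
    print(federated_label_query(["Q202864", "Q14573674"]))
```

The resulting string would be submitted to the local instance's own SPARQL endpoint, whose URL depends on the installation. Keeping `VALUES` inside the `SERVICE` block means the item list travels to the remote endpoint together with the pattern it constrains.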

Wikidata or Wikibase(s)?

[Diagram labels: Wikidata community, ID mappings]
Another approach to democratizing data curation: citizen science happening on Wikimedia projects (see SciDataCon poster 150)

Thank you

growth by Fabio Rinaldi [CC BY], research by Minnie Pigeon [CC BY], graph by Icon Lauk [CC BY] from the Noun Project

Slides mashed up with contributions by Andy Mabbett and Andra Waagmeester

These slides are adapted from

D. Mietchen, D. Taraborelli (2018) Wikidata, Wikibase, and a federated ecosystem of structured knowledge for open science. FORCE 2018. doi.org/10.6084/m9.figshare.7195358 [CC BY]