1 of 25

Migrating Freebase to Wikidata

Denny Vrandečić,1 Thomas Pellissier-Tanon,1 Sebastian Schaffert,2 Thomas Steiner3

1 Google, San Francisco, United States

{vrandecic,thomaspt}@google.com @vrandezo @Tpt93

2 Google, Zurich, Switzerland

schaffert@google.com @SSchaffert

3 Google, Hamburg, Germany

tomac@google.com @tomayac

2 of 25

Freebase

  • Large collaborative knowledge base consisting of data composed mainly by its community members.
  • Online collection of structured data harvested from many sources, including individual, user-submitted wiki contributions.
  • Initially developed by Metaweb, acquired by Google. Google's Knowledge Graph is powered in part by Freebase.
  • Freebase data is freely available for commercial and non-commercial use under a CC BY License.

Google Confidential and Proprietary

3 of 25

Wikidata

  • Large collaborative knowledge base consisting of data composed mainly by its community members.
  • Online collection of structured data harvested from many sources, including individual, user-submitted wiki contributions.
  • A project of the Wikimedia Foundation.
  • Wikidata data is freely available for commercial and non-commercial use under a CC0 License.

4 of 25

Structured Data

  • Machine-readable data following a data model. For example, Resource Description Framework (RDF):

    <London> <hasPopulation> 8,173,900
    <subject> <predicate> <object>

  • Wikidata’s data model:
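
Wikidata statements extend the bare triple with qualifiers and references. As a minimal sketch of that idea, the snippet below holds one statement in JavaScript. It is deliberately simplified and is not Wikidata's actual JSON serialization; the field names and the helpers `toTriple` and `isSourced` are illustrative assumptions. Q84 (London) and P1082 (population) are real Wikidata identifiers, and the value is the one from the RDF example.

```javascript
// Simplified, hypothetical in-memory shape; not Wikidata's real JSON format.
const statement = {
  subject: 'Q84',      // London
  property: 'P1082',   // population
  value: 8173900,
  qualifiers: [],      // e.g. the point in time the value refers to
  references: []       // sources backing the claim
};

// Reduce a statement to its bare <subject> <predicate> <object> triple.
function toTriple(s) {
  return [s.subject, s.property, s.value];
}

// A statement counts as sourced once it carries at least one reference.
function isSourced(s) {
  return s.references.length > 0;
}
```

A statement with an empty `references` list, like the one above, is exactly the kind of statement that still needs a source.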

5 of 25

Freebase Homepage

6 of 25

Freebase Item Page

7 of 25

Wikidata Homepage

8 of 25

Wikidata Item Page

9 of 25

Freebase Shutdown

10 of 25

Migration

11 of 25

Primary Sources Tool

Objective: add proper sources (references) to statements:

<subject> <predicate> <object> ⇒ according to <source>

<Hamburg> <head of government> <Olaf Scholz> ⇒ according to <http://www.olafscholz.hamburg/main>
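
One way to carry such a sourced statement around is as a tab-separated line. The sketch below assumes a hypothetical layout (subject, predicate, object, then alternating source property and source value pairs). This layout and the function name `parseSourcedStatement` are assumptions made for illustration, not the tool's documented format.

```javascript
// Assumed layout (hypothetical):
// subject \t predicate \t object [\t sourceProperty \t sourceValue]...
function parseSourcedStatement(line) {
  const fields = line.split('\t');
  if (fields.length < 3 || (fields.length - 3) % 2 !== 0) {
    throw new Error('Expected 3 fields plus source property/value pairs');
  }
  const [subject, predicate, object, ...rest] = fields;
  const sources = [];
  // Walk the remaining fields two at a time: property, then value.
  for (let i = 0; i < rest.length; i += 2) {
    sources.push({ property: rest[i], value: rest[i + 1] });
  }
  return { subject, predicate, object, sources };
}

const example = parseSourcedStatement(
  'Hamburg\thead of government\tOlaf Scholz' +
  '\treference URL\thttp://www.olafscholz.hamburg/main'
);
```

Keeping the sources as a list leaves room for statements backed by more than one source.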

12 of 25

Architecture

  • A script running within Wikidata item pages checks if Freebase has additional data for the item in question.�
  • User (dis)approves suggested data and proposed source(s).

“Hey Freebase! Do you have data about Q1055 (Hamburg) that I haven’t?”

“Hey Wikidata! Yeah, I know that Q1055 has a foo whose value is bar, according to baz.”

<Q1055> <foo> <bar> ⇒ according to <baz>

Knowledge Vault (for sources)
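
The exchange above amounts to comparing two sets of statements. A minimal sketch, assuming both sides are flat `{subject, predicate, object, source}` records (a hypothetical shape) and that matching subject, predicate, and object is enough to call two statements the same:

```javascript
// Build a comparison key. The source is ignored on purpose: a known
// statement with a newly proposed source is reported separately.
function key(s) {
  return JSON.stringify([s.subject, s.predicate, s.object]);
}

// Split candidates into statements that are new to the item and statements
// that already exist but could gain the proposed source.
function classifySuggestions(existing, candidates) {
  const known = new Set(existing.map(key));
  const newStatements = [];
  const newSources = [];
  for (const c of candidates) {
    (known.has(key(c)) ? newSources : newStatements).push(c);
  }
  return { newStatements, newSources };
}
```

Either list would then be shown to the user, who approves or rejects each suggestion.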

13 of 25

Knowledge Vault: Facts from Unstructured Text

14 of 25

Creating a User Script

common.js file for programming logic: https://en.wikipedia.org/wiki/Special:MyPage/common.js

Giving Wikipedia/Wikidata a code.talks visual makeover
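
A user script usually begins by deciding whether the current page is one it should touch. The sketch below assumes the script should run only on Wikidata item pages, identified by the main namespace (number 0) and a title such as Q1055. `shouldRun` is a hypothetical helper; `mw.config.get` with `wgNamespaceNumber` and `wgPageName` is the standard MediaWiki way to read those values.

```javascript
// Hypothetical guard: true only for item pages like "Q1055" in namespace 0.
function shouldRun(namespaceNumber, pageName) {
  return namespaceNumber === 0 && /^Q[1-9]\d*$/.test(pageName);
}

// In the browser, MediaWiki exposes the global `mw`; elsewhere, do nothing.
if (typeof mw !== 'undefined') {
  const ns = mw.config.get('wgNamespaceNumber');
  const page = mw.config.get('wgPageName');
  if (shouldRun(ns, page)) {
    console.log('User script active on ' + page);
  }
}
```

Keeping the decision in a plain function makes it possible to check the logic outside the wiki.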

15 of 25

Creating a User Script

Creating an xkcd #37 Moment on Wikipedia…

16 of 25

Primary Sources Tool

17 of 25

Primary Sources Tool — Back-end

18 of 25

Primary Sources Tool — Front-end

19 of 25

MediaWiki API

20 of 25

MediaWiki API — Getting Data

// https://www.wikidata.org/w/api.php?action=help&modules=wbgetclaims
// Fetch the claims an entity (subject) has for one property (predicate).
function getClaims(subject, predicate, callback) {
  var api = new mw.Api();
  api.get({
    action: 'wbgetclaims',
    entity: subject,
    property: predicate
  }).done(function(data) {
    // A property without claims yields an empty list, not an error.
    return callback(null, data.claims[predicate] || []);
  }).fail(function(error) {
    return callback(error);
  });
}
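
The claims passed to the callback are nested objects. As a sketch of how they might be unpacked, the hypothetical helper `claimValues` below collects the main value of each claim. It assumes the usual `mainsnak.snaktype` and `mainsnak.datavalue.value` layout of Wikibase claim JSON and skips snaks that carry no concrete value. The sample input is hand-written, not real API output.

```javascript
// Collect the main values of a list of claims.
// Snaks of type 'novalue' or 'somevalue' have no datavalue and are skipped.
function claimValues(claims) {
  return claims
    .filter(function(claim) {
      return claim.mainsnak && claim.mainsnak.snaktype === 'value';
    })
    .map(function(claim) {
      return claim.mainsnak.datavalue.value;
    });
}

// Hand-written sample, shaped like a fragment of an API response.
const sampleClaims = [
  { mainsnak: { snaktype: 'value', property: 'P856',
      datavalue: { type: 'string', value: 'http://www.example.org/' } } },
  { mainsnak: { snaktype: 'novalue', property: 'P856' } }
];
```

Inside the `getClaims` callback, this could be called as `claimValues(claims)` once the error argument has been checked.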

21 of 25

MediaWiki API — Editing Data

// https://www.wikidata.org/w/api.php?action=help&modules=wbsetreference
// Create the claim first, then attach the reference to the new statement.
function createClaimWithReference(subject, predicate, object,
    qualifiers, sourceSnaks) {
  var value = (tsvValueToJson(object)).value;
  var api = new mw.Api();
  return createClaim(subject, predicate, object, qualifiers)
      .then(function(data) {
        return api.post({
          action: 'wbsetreference',
          // ID of the statement that createClaim just created.
          statement: data.claim.id,
          snaks: JSON.stringify(formatSourceForSave(sourceSnaks)),
          token: mw.user.tokens.get('editToken'),
          summary: WIKIDATA_API_COMMENT
        });
      });
}
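
`wbsetreference` expects `snaks` as a JSON object that maps property IDs to lists of snaks. `formatSourceForSave` is not shown in this deck, so the following is only a sketch of what such a helper might produce for a single reference URL. `buildUrlReferenceSnaks` is a hypothetical name, and P854 is Wikidata's "reference URL" property.

```javascript
// Hypothetical helper: snaks for a reference that consists of one URL.
function buildUrlReferenceSnaks(url) {
  return {
    P854: [{
      snaktype: 'value',
      property: 'P854',
      datavalue: { type: 'string', value: url }
    }]
  };
}

// The API parameter is a string, so the object is serialized before sending.
const snaksParam = JSON.stringify(
  buildUrlReferenceSnaks('http://www.olafscholz.hamburg/main')
);
```

The resulting string is what would be passed as the `snaks` field of the `api.post` call.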

22 of 25

Primary Sources Tool Gadget (Grown-up User Script)

23 of 25

Problem: Pages Based on Wikipedia Contents

24 of 25

Live Demo

25 of 25

Thanks
