1 of 18

Linked Jazz & Wikibase

Matt Miller

Semlab at Pratt Institute, semlab.io

@thisismmiller

2 of 18

The Semantic Lab

Projects:

  • Linked Jazz
  • Drawings of the Florentine Painters
  • DADAlytics

[Logos on slide: partner institutions ("In conjunction with") and funders ("Grant-funded through")]

3 of 18

Linked Jazz

Uses oral history transcripts to build an RDF-based social network of jazz-related entities. Check out linkedjazz.org

4 of 18

History with Wikimedia

  • The Linked Jazz project originally used DBpedia to facilitate entity disambiguation and to extend controlled names beyond bibliographic authorities.
  • Began using Wikidata for that same purpose as the system and dataset grew.
  • Started thinking about how our data could live in Wikidata and began investigating the feasibility of that possibility.
  • But we have very esoteric project data that doesn’t seem appropriate for Wikidata, so we began looking at running our own Wikibase instance.

5 of 18

Linked Jazz Data Management History

[Timeline diagram] Heap of files: RDF, CSV, JSON, etc. (~2012) → Relational DB → Linked Data Platform (LDP) server ← “You are here”

6 of 18

Why are we trying Wikibase?

  • Is graph-based (our data is RDF)
  • Has a good user interface for editing
  • Has a straightforward API
  • Can model data provenance at the statement level
  • A feeling that we can interact with Wikidata more easily if we are aligned on the same platform
  • We are always adding new facets of our project and new data, so we need to be able to introduce new models easily

7 of 18

Wikibase: Infrastructure

We are testing two instances of Wikibase:

http://base.semlab.io/ (4 vCPUs 8GB / 160GB Disk / Digital Ocean)

http://sandbase.semlab.io/ (2 vCPUs 4GB / 80GB Disk / Digital Ocean)

8 of 18

Wikibase: Infrastructure

Using the Docker image to run our instances.

Our fork: https://github.com/SemanticLab/wikibase-docker

For now, our only modifications are building the images with the build script (rather than just pulling them), adding files to the wikibase image, and adding additional configuration to LocalSettings.php.

It has been a very smooth experience once you get over the Docker learning curve.

9 of 18

Wikibase: Bootstrapping Data

We already have lots of data we want to load into our instance.

First step was preparing legacy data for ingest: https://github.com/linkedjazz/lj_database_cleanup
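
As a rough illustration of what that preparation can look like, here is a minimal Python sketch that flattens legacy RDF relationship triples into a CSV for loading. The file names and the predicate-to-label mapping are hypothetical examples, not taken from the cleanup repo:

```python
# Minimal sketch: normalize legacy Linked Jazz triples into a simple
# CSV that a Wikibase loading script can consume.
# File names and the predicate mapping are hypothetical examples.
import csv
from rdflib import Graph, URIRef

LEGACY_FILE = "legacy_relationships.nt"   # hypothetical legacy RDF dump
OUTPUT_FILE = "wikibase_ingest.csv"

# Map legacy predicates to the labels we plan to use for Wikibase properties.
PREDICATE_MAP = {
    URIRef("http://purl.org/ontology/relationship/knowsOf"): "knows of",
    URIRef("http://purl.org/ontology/relationship/collaboratesWith"): "collaborates with",
}

g = Graph()
g.parse(LEGACY_FILE, format="nt")

with open(OUTPUT_FILE, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["subject", "property_label", "object"])
    for s, p, o in g:
        if p in PREDICATE_MAP:  # keep only statements we have a model for
            writer.writerow([str(s), PREDICATE_MAP[p], str(o)])
```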

10 of 18

[Data-model diagram: a Transcript entity and a Host Institution entity, linked by individual statements]

11 of 18

Wikibase: Bootstrapping Data

First attempt: https://github.com/SemanticLab/data-2-wikibase

  • Using the base API to add properties via the python module pywikibot
  • Using the python module wikidataintegrator to add items (see the sketch after this list)

Next attempt:

  • Using pywikibot to build properties (or manually)
  • Try https://github.com/maxlath/wikidata-edit to create items, claims, references.
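
A rough sketch of how the first-attempt pieces fit together. The endpoint URLs, credentials, labels, P/Q numbers, and the pywikibot family name are placeholder assumptions, not our actual configuration:

```python
# Placeholder sketch: pywikibot for property creation,
# wikidataintegrator for item creation.
import pywikibot
from wikidataintegrator import wdi_core, wdi_login

API_URL = "http://base.semlab.io/w/api.php"        # our Wikibase API
SPARQL_URL = "http://base.semlab.io/query/sparql"  # assumed query-service path

# 1. Create a property with pywikibot (assumes a configured pywikibot
#    family file pointing at our instance).
site = pywikibot.Site("en", "semlab")
repo = site.data_repository()
prop = pywikibot.PropertyPage(repo, datatype="wikibase-item")
prop.editEntity({"labels": {"en": "knows of"}}, summary="bootstrap property")

# 2. Create an item with wikidataintegrator, with one claim using the
#    hypothetical property P2 pointing at the hypothetical item Q42.
login = wdi_login.WDLogin(user="BotUser", pwd="bot-password",
                          mediawiki_api_url=API_URL)
claim = wdi_core.WDItemID(value="Q42", prop_nr="P2")
item = wdi_core.WDItemEngine(data=[claim],
                             mediawiki_api_url=API_URL,
                             sparql_endpoint_url=SPARQL_URL)
item.set_label("Mary Lou Williams", lang="en")
print(item.write(login))  # creates the item and returns its new Q-id
```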

12 of 18

13 of 18

Using References to track provenance at the statement level
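
With wikidataintegrator, for example, each statement can carry a reference block pointing back to the oral history transcript it came from. The property numbers and transcript URL below are made-up placeholders:

```python
# Sketch: attach a statement-level reference recording which transcript
# a claim came from. P2 ("knows of"), P10 ("source transcript URL"),
# and the URL itself are hypothetical placeholders.
from wikidataintegrator import wdi_core

reference = [[
    wdi_core.WDUrl(
        value="https://linkedjazz.org/transcripts/mary-lou-williams",
        prop_nr="P10",
        is_reference=True,
    )
]]

claim = wdi_core.WDItemID(
    value="Q42",           # the person this statement links to
    prop_nr="P2",          # "knows of"
    references=reference,  # provenance travels with the statement itself
)
```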

14 of 18

15 of 18

Working around field limitations

16 of 18

Thinking about serializations based on context and use case
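
One possible approach (a sketch only; the prop/direct prefix and the P2-to-foaf:knows mapping are invented for illustration) is to pull an item's RDF from Wikibase's Special:EntityData endpoint and re-map our local properties onto an existing vocabulary such as FOAF with rdflib:

```python
# Sketch: fetch an item's RDF from Wikibase and re-serialize it against
# an existing vocabulary (FOAF here). The Q/P ids, the local prop/direct
# prefix, and the mapping are illustrative assumptions.
from rdflib import Graph, Namespace

FOAF = Namespace("http://xmlns.com/foaf/0.1/")
WDT = Namespace("http://base.semlab.io/prop/direct/")  # assumed local prefix

g = Graph()
# Wikibase exposes per-entity RDF at Special:EntityData
g.parse("http://base.semlab.io/wiki/Special:EntityData/Q42.ttl",
        format="turtle")

out = Graph()
out.bind("foaf", FOAF)
for s, p, o in g.triples((None, WDT["P2"], None)):  # P2 = "knows of" (assumed)
    out.add((s, FOAF.knows, o))

print(out.serialize(format="turtle"))
```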

17 of 18

Next Steps

  • Complete data load and modeling
  • Ingest data for other facets of our project, for example the New Orleans Musician Union List
  • Build workflows that use Wikibase to support our processes:
    • Serving data to power our existing content-negotiable URIs (see the sketch below)
    • Serializing out datasets using existing vocabularies
    • Powering our crowdsourcing tool
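
For the content-negotiation piece, a minimal Flask sketch of what such a workflow might look like: each URI redirects to the matching serialization that Wikibase already exposes via Special:EntityData. The route, hostname, and format mapping are assumptions:

```python
# Sketch of content negotiation over Wikibase-backed URIs: redirect to
# the right serialization of Special:EntityData based on the Accept
# header. Route, mapping, and hostname are assumptions.
from flask import Flask, redirect, request

app = Flask(__name__)
WIKIBASE = "http://base.semlab.io/wiki/Special:EntityData"

# Map Accept headers to the formats Wikibase can emit for an entity.
FORMATS = {
    "text/turtle": "ttl",
    "application/rdf+xml": "rdf",
    "application/json": "json",
}

@app.route("/entity/<qid>")
def entity(qid):
    accept = (request.accept_mimetypes.best_match(list(FORMATS))
              or "application/json")
    # 303 See Other: the URI identifies the entity, not the document
    return redirect(f"{WIKIBASE}/{qid}.{FORMATS[accept]}", code=303)
```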

18 of 18

Thanks!