1 of 53

A new (editing) frontend for VIVO

Hector Correa / hector_correa@brown.edu

Steven McCauley / steven_mccauley@brown.edu

Brown University Library

September 2019 - Podgorica, Montenegro

2 of 53

VIVO at Brown

Background

  • In production since 2014:
    • VIVO for profile display and browsing
    • Custom editor interface as standalone application
    • 4000+ researchers
  • In 2017 we released a new read-only frontend (slides)
  • In 2018 we added visualizations to new frontend (slides)
  • In 2019:
    • More visualizations and reports (CSV, Excel)
    • New editing frontend (in progress)

3 of 53

VIVO at Brown: 2014

[architecture diagram]

  • Users: general public, editors (faculty), developers
  • Manager (Django App)
  • Data Service (Flask App)
  • VIVO: Fuseki, Solr, Java App

4 of 53

VIVO at Brown: 2017

[architecture diagram]

  • Users: general public, editors (faculty), developers
  • Manager (Django App)
  • Data Service (Flask App)
  • Front End (Rails App)
  • Solr (replica)
  • VIVO: Fuseki, Solr, Java App

5 of 53

Demo of current setup

(two apps)

6 of 53

Current setup (two apps)

Left: viewing frontend. Right: current editor frontend.

9 of 53

Problems with current setup (frontend)

  • Mismatch between profile display and profile editing
    • Two different user experiences
    • Two different codebases
  • Django application is difficult to manage
    • Misuse of MVC framework
    • Multiple points of interaction with external systems: VIVO triple store, external APIs

10 of 53

VIVO at Brown: 2019 (later this year)

[architecture diagram]

  • Users: general public, editors (faculty), developers
  • Data Service (Flask App)
  • Front End (Rails App)
  • Solr (replica)
  • VIVO: Fuseki, Solr, Java App
  • Manager (Django App)

11 of 53

Demo of new setup

(one app)

12 of 53

New setup (one app)

Left: viewing frontend. Right: new editor frontend.

15 of 53

Goals (frontend)

  • Better user experience
    • A single frontend for viewing and editing makes sense
    • Benefits us as developers and, more importantly, our users
  • Easier updates
    • Decoupling user interface, CRUD operations, and other services

16 of 53

General flow when viewing data

[data flow diagram]

  • URLs: https://vivo.brown.edu/search, https://vivo.brown.edu/display/user-id
  • user
  • Front End (Rails App)
  • Solr (replica)
  • Data Service (Flask App)
  • VIVO: Fuseki, Solr, Java App

17 of 53

General flow when editing data

[data flow diagram]

  • URL: https://vivo.brown.edu/edit/user-id
  • user
  • Front End (Rails App)
  • Solr (replica)
  • Data Service (Flask App)
  • VIVO: Fuseki, Solr, Java App
  • Connection labels: REST, SPARQL

18 of 53

Backend

19 of 53

Backend: current state

Problems

  • Multiple points of interaction with triple store
  • Custom scripts for RDF generation and SPARQL queries
  • Hard-to-debug data anomalies: multiple rdfs:labels, typed and untyped data, unexpected inferencing

Goals

  • Single point of interaction: web API for data updates
  • High-level API for scripting boilerplate RDF/SPARQL

20 of 53

Tools for scripting RDF (and other graphs)

Prior Work: a brief sample

Desired function: transactional edits on individual resources

  • Common pattern in MVC web frameworks: the Model and ORM

21 of 53

Object-Relational Mapper: Model + CRUD workflow

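from app import app, models, db

@app.route('/edit/<userID>/overview', methods=['POST'])
def update_overview(userID):
    # Read: load the row for this faculty member
    faculty = models.Faculty.query \
        .filter_by(id=userID).first()
    # Update: change the attribute on the model object
    faculty.overview = "My new overview"
    # Commit: write the change back to the database
    db.session.add(faculty)
    db.session.commit()
    return {'overview': faculty.overview}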

*Based on general pattern using Flask-SQLAlchemy: https://flask-sqlalchemy.palletsprojects.com/en/2.x/quickstart/

from app import db

class Faculty(db.Model):
    __tablename__ = 'faculty'

    id = db.Column(db.Integer, primary_key=True)
    overview = db.Column(db.String)
    statement = db.Column(db.String(500))
    # Integer foreign keys into the ResearchArea and WebLink tables
    # (table names follow Flask-SQLAlchemy's default naming)
    research_areas = db.Column(db.Integer,
                               db.ForeignKey('research_area.id'))
    weblinks = db.Column(db.Integer,
                         db.ForeignKey('web_link.id'))

TABLE 'faculty'

id | overview          | statement      | research_area | weblinks
1  | "My old overview" | "My statement" | 17            | 13

22 of 53

Designing an ORM for RDF ("R" is for "Resource")

  • Lack of ORM-style interface for RDF triplestores is both a source of bugs, and an impediment to adoption and development
  • RDF and SPARQL operations map neatly onto basic CRUD operations
  • Schemaless, stateless design of RDF triplestores eliminates much of the complexity contained in ORM
  • Implementing the basic functions of an ORM for an RDF triplestore is an achievable goal, and would be a great benefit to RDF development
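
A minimal sketch of how the CRUD verbs could line up with SPARQL operations. The function names and the `_graph_block` helper are illustrative assumptions, not an existing library; terms are passed in already serialized, and PREFIX declarations are left out.

```python
def _graph_block(graph, triples):
    """Render already-serialized triples inside a GRAPH block."""
    body = "\n".join(f"    {s} {p} {o} ." for s, p, o in triples)
    return f"  GRAPH <{graph}> {{\n{body}\n  }}"


def create(graph, triples):
    # Create -> INSERT DATA
    return f"INSERT DATA {{\n{_graph_block(graph, triples)}\n}}"


def read(uri):
    # Read -> DESCRIBE
    return f"DESCRIBE <{uri}>"


def delete(graph, triples):
    # Delete -> DELETE DATA
    return f"DELETE DATA {{\n{_graph_block(graph, triples)}\n}}"


def update(graph, old, new):
    # Update -> DELETE DATA, then INSERT DATA, separated by ';' in one request
    return delete(graph, old) + " ;\n" + create(graph, new)
```

Each function only returns a string, so the same helpers can sit behind whichever HTTP client or endpoint is in use.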

24 of 53

RDF and ORM workflow: transactions and commits

def update_overview(userID):
    ...
    db.session.add(faculty)
    db.session.commit()

  • SPARQL request is platform-agnostic: no need for middleware to communicate with storage layer
  • HTTP request establishes transaction scope
  • Scripting INSERT/DELETE graph only requires knowing which triples to put where (and associated named graphs)
  • Using HTTP for transactions carries performance questions, especially for bulk updates
  • https://wiki.duraspace.org/display/VIVODOC19x/SPARQL+Update+API

import requests

msg = '''
PREFIX brown: <http://vivo.brown.edu/profile/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

DELETE DATA {
  GRAPH <http://vivo.brown.edu/data> {
    <http://vivo.brown.edu/individual/steve> brown:overview "My old overview"^^xsd:string .
  }
} ;
INSERT DATA {
  GRAPH <http://vivo.brown.edu/data> {
    <http://vivo.brown.edu/individual/steve> brown:overview "My new overview"^^xsd:string .
  }
}
'''

# One POST carries both operations. VIVO's API also expects account
# credentials (email, password) alongside the update.
requests.post('http://localhost:8080/vivo/api/sparqlUpdate', data={'update': msg})
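
To make the "one HTTP request, one transaction" point concrete, here is a hedged sketch that packages an update as a single form-encoded POST using only the standard library. It assumes the endpoint accepts `email`, `password`, and `update` form fields, as described in the VIVO SPARQL Update API documentation linked above. The credentials and example triple are placeholders, and the request is built but never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_update_request(endpoint, update, email, password):
    """Package one SPARQL Update string as a single form-encoded POST."""
    form = urlencode({"email": email, "password": password, "update": update})
    return Request(
        endpoint,
        data=form.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_update_request(
    "http://localhost:8080/vivo/api/sparqlUpdate",
    'INSERT DATA { GRAPH <http://vivo.brown.edu/data> '
    '{ <http://example.org/s> <http://example.org/p> "o" . } }',
    email="admin@example.org",       # placeholder credentials
    password="not-a-real-password",  # placeholder credentials
)
# urllib.request.urlopen(req) would send it; omitted so the sketch has no side effects.
```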

25 of 53

RDF and ORM workflow: Resource definition

  • Mapping graph data to dictionary- or object-like structures is a standard technique: https://www.python.org/doc/essays/graphs/
  • A relational DB table row corresponds to a subset of the triplestore: the set of triples that share a subject URI
  • The row ID is the Subject, the Columns are Properties, and the Column values are Objects

from app import db

class Faculty(db.Model):
    __tablename__ = 'faculty'

    id = db.Column(db.Integer, primary_key=True)
    overview = db.Column(db.String)
    statement = db.Column(db.String(500))
    # Integer foreign keys into the ResearchArea and WebLink tables
    # (table names follow Flask-SQLAlchemy's default naming)
    research_areas = db.Column(db.Integer,
                               db.ForeignKey('research_area.id'))
    weblinks = db.Column(db.Integer,
                         db.ForeignKey('web_link.id'))

from app import db

class Faculty(Resource):
    graph = 'http://vivo.brown.edu/data'

    id = db.URI
    overview = db.Property(uri='brown:overview',
                           datatype=db.String)
    statement = db.Property(uri='brown:statement',
                            datatype=db.String, length=500)
    research_areas = db.Property(uri='brown:topics',
                                 relationship='ResearchArea')
    weblinks = db.Property(uri='brown:weblinks'...

<http://.../steve> | brown:overview  | "My old overview"
<http://.../steve> | brown:statement | "My statement"
<http://.../steve> | ...             | ...

id | overview          | statement      | ...
1  | "My old overview" | "My statement" | ...
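
The row/triple correspondence on this slide can be sketched with plain Python containers. This is only an illustration: the `rows_by_subject` helper and the topic URIs are made up for the example.

```python
from collections import defaultdict

STEVE = "http://vivo.brown.edu/individual/steve"
triples = [
    (STEVE, "brown:overview", "My old overview"),
    (STEVE, "brown:statement", "My statement"),
    (STEVE, "brown:topics", "http://example.org/topic/17"),  # hypothetical URI
    (STEVE, "brown:topics", "http://example.org/topic/18"),  # hypothetical URI
]


def rows_by_subject(triples):
    """Group triples into {subject: {predicate: [objects]}}."""
    rows = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        rows[subject][predicate].append(obj)
    # Plain dicts, so a missing property raises instead of silently appearing.
    return {s: dict(props) for s, props in rows.items()}


rows = rows_by_subject(triples)
steve = rows[STEVE]
```

Unlike a table cell, each "column" here holds a list, because a subject can carry any number of triples for the same property, or none at all.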

26 of 53

Resource loading: dynamic attributes

from app import db

class Faculty(Resource):
    overview = db.Property(uri='brown:overview',
                           datatype=db.String)
    statement = db.Property(uri='brown:statement',
                            datatype=db.String, length=500)
    research_areas = db.Property(uri='brown:topics',
                                 relationship='ResearchArea')
    weblinks = db.Property(uri='brown:weblinks',
                           relationship='WebLink')

OPTIONAL clauses

msg = '''
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX brown: <http://vivo.brown.edu/profile/>

CONSTRUCT {
  ?uri brown:overview ?overview .
  ?uri …
}
WHERE {
  ?uri rdf:type brown:Faculty .
  OPTIONAL { ?uri brown:overview ?overview . }
  OPTIONAL { …
}
'''

  • In a triplestore there are no placeholders or empty Column cells: a triple either exists or it does not
  • RDF is extremely flexible but unpredictable: there is no way to know ahead of time which triples exist for a particular Resource
  • Previously handled using OPTIONAL and UNION clauses, which work but are suboptimal
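
As a rough illustration of the boilerplate this pattern produces, the sketch below generates an OPTIONAL-based CONSTRUCT query from a property mapping. The function name and the mapping format are assumptions for the example, and PREFIX declarations are omitted.

```python
def construct_with_optionals(rdf_type, properties):
    """Build a CONSTRUCT query with one OPTIONAL block per mapped property.

    properties maps a variable name to a prefixed property name,
    e.g. {"overview": "brown:overview"}.
    """
    template = [f"  ?uri {prop} ?{var} ." for var, prop in properties.items()]
    optionals = [
        f"  OPTIONAL {{ ?uri {prop} ?{var} . }}"
        for var, prop in properties.items()
    ]
    return "\n".join(
        ["CONSTRUCT {"]
        + template
        + ["}", "WHERE {", f"  ?uri rdf:type {rdf_type} ."]
        + optionals
        + ["}"]
    )


query = construct_with_optionals(
    "brown:Faculty",
    {"overview": "brown:overview", "statement": "brown:statement"},
)
```

Every mapped property adds a template line and an OPTIONAL block, so the query text grows with the model definition.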

28 of 53

Resource loading: dynamic attributes

UNION clauses

msg = '''
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX brown: <http://vivo.brown.edu/profile/>

CONSTRUCT {
  ?uri brown:overview ?overview .
  ?uri …
}
WHERE {
  {
    ?uri rdf:type brown:Faculty .
  } UNION {
    ?uri rdf:type brown:Faculty .
    ?uri brown:overview ?overview .
  } UNION { ...
}
'''

31 of 53

Resource loading: dynamic attributes

def update_overview(userID):
    faculty = models.Faculty.query...

import requests

msg = "DESCRIBE <http://.../steve>"

# A DESCRIBE query returns RDF rather than a result table
data = requests.post(
    'http://localhost:8080/vivo/api/sparqlQuery',
    data={'query': msg})

faculty = models.Faculty.load(data)

<http://.../steve> | brown:overview  | ...
<http://.../steve> | obo:71003401    | ...
<http://.../steve> | brown:statement | ...
<http://.../steve> | obo:04300002    | ...

  • A DESCRIBE query returns the full subset of triples that share a subject URI (the SPARQL specification leaves the exact contents of the result to the implementation)
  • Attributes of the Resource object act as filters on that subset, returning only the triples with mapped properties
  • DESCRIBE is performant, complete, and makes no assumptions about which triples exist
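
The "attributes as filters" idea can be sketched with a Python descriptor. This is an illustrative stand-in, not the actual Resource implementation: the `Property` and `Resource` classes below, the two-argument `load`, and the sample object values are all assumptions, and triples are plain tuples rather than parsed RDF.

```python
class Property:
    """Descriptor mapping an attribute name to a property URI (illustrative)."""

    def __init__(self, uri):
        self.uri = uri

    def __get__(self, instance, owner):
        if instance is None:
            return self
        # Filter the loaded triples down to this property's objects.
        return [o for (_s, p, o) in instance.triples if p == self.uri]


class Resource:
    def __init__(self, uri, triples):
        self.uri = uri
        # Keep every triple about this subject, mapped or not.
        self.triples = [t for t in triples if t[0] == uri]

    @classmethod
    def load(cls, uri, triples):
        return cls(uri, triples)


class Faculty(Resource):
    overview = Property("brown:overview")
    statement = Property("brown:statement")


STEVE = "http://vivo.brown.edu/individual/steve"
described = [
    (STEVE, "brown:overview", "My old overview"),
    (STEVE, "obo:71003401", "unmapped value"),  # placeholder object
    (STEVE, "brown:statement", "My statement"),
]
faculty = Faculty.load(STEVE, described)
```

Unmapped triples stay on the object untouched, so nothing returned by DESCRIBE is lost just because the class does not declare it.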

32 of 53

RDF and ORM workflow: the rest

Query filtering

faculty = models.Faculty.query \
    .filter_by(id='steve').first()

msg = '''
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX brown: <http://vivo.brown.edu/profile/>

DESCRIBE ?uri
WHERE {
  ?uri rdf:type brown:Faculty .
  ?uri brown:id "steve"^^xsd:string .
}
'''

Attribute update

faculty.overview = "My new overview"
db.session.add(faculty)
db.session.commit()

remove = [('<http://vivo.brown.edu/individual/steve>', 'brown:overview', '"My old overview"^^xsd:string')]
add = [('<http://vivo.brown.edu/individual/steve>', 'brown:overview', '"My new overview"^^xsd:string')]
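
One possible way to derive the remove and add lists is to diff an attribute's old and new values. The `diff_attribute` helper is a hypothetical name for this sketch, and values are assumed to be already serialized as RDF terms.

```python
def diff_attribute(subject, prop, old_values, new_values):
    """Return (remove, add) triple lists for one attribute change."""
    old, new = set(old_values), set(new_values)
    # Only values that actually changed produce triples.
    remove = [(subject, prop, v) for v in sorted(old - new)]
    add = [(subject, prop, v) for v in sorted(new - old)]
    return remove, add


remove, add = diff_attribute(
    "<http://vivo.brown.edu/individual/steve>",
    "brown:overview",
    ['"My old overview"^^xsd:string'],
    ['"My new overview"^^xsd:string'],
)
```

Unchanged values drop out of both lists, so committing an untouched attribute would produce an empty update.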

33 of 53

Designing an ORM for RDF ("R" is for "Resource")

  • We can build an interface to triplestores that encourages consistency while maintaining the flexibility of RDF
  • Basic functions not covered here: resource creation, deletion, chaining, listing
  • Advanced functions: resource loading strategies, ontology/model reflection, complex queries, triplestore middleware
  • Current goals: build off prior work, see basic functions through to completion before getting sidetracked on more advanced features

34 of 53

In closing...

35 of 53

Takeaways

  • We are using VIVO
    • To store the data
    • To leverage semantic web, linked data, yada-yada-yada
    • Ontology management and other admin functions via the native VIVO app
  • Lots of code outside of VIVO
    • Ruby and JavaScript (frontend)
    • Python and SPARQL (backend)
  • Keep components separated
    • Display, triplestore access, searching, graph caching
  • Fast release cycles
    • Frontend can be deployed at will
    • Backend can be updated at will (as long as the API does not change)

36 of 53

Thanks!

37 of 53

[the end]

38 of 53

[backup slides]

39 of 53

Future work

  • Renovating newly uncoupled services
    • Publication manager, administrative interface, report generation, email service...
    • API interaction: autocomplete, PubMed, FAST, etc.
  • Data migration
    • Cleaning up untyped and otherwise malformed data
    • All triples in named graphs
  • Automated processes
    • Bulk updates
  • Improving performance
    • Caching, minimizing network calls

40 of 53

Queries: current state

Issues

  • Repeated code
  • Query is bound to Django View: can’t isolate, validate, debug query results

41 of 53

Queries: current state

Issues

  • OPTIONAL statements
  • Returning multiple object classes from same query
  • Data processing in query
    • sorting, date formatting

42 of 53

Queries: current state

Issues

  • UNION statements
  • Typed/untyped data
  • Data handling in query

43 of 53

Data editing: current state

Issues

  • Manual generation of URI
  • Inverse properties: operating on multiple objects

44 of 53

Data editing: current state

Issues

  • Manual generation of URI
  • Inverse properties: relying on inferencing

45 of 53

Data ecosystem: current state

46 of 53

Resource-oriented SPARQL

  • DESCRIBE queries leverage built-in indexes for efficiency
  • Post-processing of resource attributes
  • https://www.w3.org/TR/rdf-sparql-query/#describe

47 of 53

Resource-oriented SPARQL

48 of 53

Object-Relational Mapper: CRUD workflow

from flask import request
from app import models, db

def update_overview(facultyID):
    # Parse the JSON body of the incoming request
    data = request.get_json()
    faculty = models.Faculty.query.filter_by(id=facultyID).first()
    faculty.overview = data.get('overview')
    db.session.add(faculty)
    db.session.commit()
    return {'overview': faculty.overview}

49 of 53

Problems with current setup (backend)

  • Django application is difficult to manage
    • Misuse of MVC framework
    • Multiple points of interaction with external systems: VIVO triple store, external APIs
    • Tightly bound with other services: publication harvesting, administrative interface and reports
  • Interaction with triple store is unsatisfactory
    • Burrows into VIVO’s built-in “primitiveRdfEdit” endpoint, which is designed for built-in forms
  • Uncontrolled RDF/SPARQL
    • Mostly produced by hand: irregular and hard to debug
    • Scattered throughout codebase

50 of 53

Visualizations

  • General Workflow
    • See the 2018 slides
  • Researcher level samples
  • Organization level samples
  • Team level samples
    • (work in progress)

51 of 53

Reports

  • We love linked data, but users love spreadsheets
  • Created reports for users that output to Excel
  • Single Excel file with all the information for a group of researchers

52 of 53

RDF and SPARQL: enforcing consistency

Hand-coding leads to data anomalies and other development headaches

  • Multiple values for rdfs:label
  • Untyped data and typed data, often for the same property
  • Inconsistent inverse properties and inferencing
  • Boilerplate queries with subtle variations
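
Two of these anomalies are easy to check mechanically. The sketch below is an assumption-laden illustration: triples are simplified to (subject, predicate, value, datatype) tuples with `None` marking an untyped literal, and the sample data is invented.

```python
from collections import defaultdict

# (subject, predicate, lexical value, datatype or None if untyped)
triples = [
    ("ex:a", "rdfs:label", "Label one", None),
    ("ex:a", "rdfs:label", "Label two", None),
    ("ex:a", "brown:overview", "Typed overview", "xsd:string"),
    ("ex:b", "rdfs:label", "Only label", None),
    ("ex:b", "brown:overview", "Untyped overview", None),
]


def multiple_labels(triples):
    """Subjects carrying more than one distinct rdfs:label value."""
    labels = defaultdict(set)
    for s, p, value, _dt in triples:
        if p == "rdfs:label":
            labels[s].add(value)
    return sorted(s for s, vals in labels.items() if len(vals) > 1)


def mixed_typing(triples):
    """Properties whose literals are typed in some triples, untyped in others."""
    seen = defaultdict(set)
    for _s, p, _value, datatype in triples:
        seen[p].add(datatype is not None)
    return sorted(p for p, kinds in seen.items() if len(kinds) == 2)
```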

RDF development is hampered by the lack of standard tools

  • Other DB options offer APIs and libraries for interacting with storage layer
  • ORMs hide

53 of 53

VIVO at Brown

[architecture diagram]

  • Users: general public, editors (faculty), developers
  • Manager (Django App)
  • Data Service (Flask App)
  • Front End (Rails App)
  • Solr (replica)
  • VIVO: Fuseki, Solr, Java App