1 of 53

A new (editing) frontend for VIVO

Hector Correa / hector_correa@brown.edu

Steven McCauley / steven_mccauley@brown.edu

Brown University Library

September/2019 - Podgorica, Montenegro

2 of 53

VIVO at Brown

Background

In production since 2014:

VIVO for profile display and browsing
Custom editor interface as standalone application
4000+ researchers

In 2017 we released a new read-only frontend (slides)
In 2018 we added visualizations to new frontend (slides)
In 2019

More visualizations and reports (CSV, Excel)

New editing frontend (in progress)

3 of 53

VIVO at Brown: 2014

[Diagram: users: general public, editors (faculty), developers; components: Manager (Django App), Data Service (Flask App), VIVO stack (VIVO, Fuseki, Solr, Java App)]

4 of 53

VIVO at Brown: 2017

[Diagram: users: general public, editors (faculty), developers; components: Manager (Django App), Data Service (Flask App), Front End (Rails App), Solr (replica), VIVO stack (VIVO, Fuseki, Solr, Java App)]

5 of 53

Demo of current setup

(two apps)

6 of 53

Current setup (two apps)

left: viewing frontend right: current editor frontend

7 of 53

Current setup (two apps)

left: viewing frontend right: current editor frontend

8 of 53

Current setup (two apps)

left: viewing frontend right: current editor frontend

9 of 53

Problems with current setup (frontend)

Mismatch between profile display and profile editing

Two different user experiences
Two different codebases

Django application is difficult to manage

Misuse of MVC framework
Multiple points of interaction with external systems: VIVO triple store, external APIs

10 of 53

VIVO at Brown: 2019 (later this year)

[Diagram: users: general public, editors (faculty), developers; components: Data Service (Flask App), Front End (Rails App), Solr (replica), VIVO stack (VIVO, Fuseki, Solr, Java App), Manager (Django App)]

11 of 53

Demo of new setup

(one app)

12 of 53

New setup (one app)

left: viewing frontend right: new editor frontend

13 of 53

New setup (one app)

left: viewing frontend right: new editor frontend

14 of 53

New setup (one app)

left: viewing frontend right: new editor frontend

15 of 53

Goals (frontend)

Better user experience

A single frontend for viewing and editing makes sense
Benefits for us as developers, but more importantly for users

Easier updates

Decoupling user interface, CRUD operations, and other services

16 of 53

General flow when viewing data

[Diagram: URLs: https://vivo.brown.edu/search, https://vivo.brown.edu/display/user-id; user; components: Front End (Rails App), Solr (replica), Data Service (Flask App), VIVO stack (VIVO, Fuseki, Solr, Java App)]

17 of 53

General flow when editing data

[Diagram: URL: https://vivo.brown.edu/edit/user-id; user; components: Front End (Rails App), Solr (replica), Data Service (Flask App), VIVO stack (VIVO, Fuseki, Solr, Java App); connection labels: REST, SPARQL]

18 of 53

Backend

19 of 53

Backend: current state

Problems

Multiple points of interaction with triple store
Custom scripts for RDF generation and SPARQL queries
Hard-to-debug data anomalies: multiple rdfs:labels, typed and untyped data, unexpected inferencing

Goals

Single point of interaction: web API for data updates
High-level API for scripting boilerplate RDF/SPARQL
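Two of the anomalies listed above (multiple rdfs:labels, typed and untyped data) can be detected mechanically. The following is a minimal sketch, not the authors' tooling: it assumes literal objects have already been parsed into `(value, datatype)` pairs with `None` for an untyped literal, and `find_anomalies` and the sample triples are hypothetical.

```python
from collections import defaultdict

RDFS_LABEL = "rdfs:label"


def find_anomalies(triples):
    """Return (subjects with several rdfs:label values,
    predicates used with both typed and untyped literals).

    Each triple is (subject, predicate, (value, datatype)); datatype is None
    for an untyped literal. Only literal-valued triples are assumed here.
    """
    labels = defaultdict(set)
    typing = defaultdict(set)
    for subject, predicate, (value, datatype) in triples:
        if predicate == RDFS_LABEL:
            labels[subject].add(value)
        # Record whether this use of the predicate carried a datatype.
        typing[predicate].add(datatype is not None)
    multi_label = sorted(s for s, values in labels.items() if len(values) > 1)
    mixed_typing = sorted(p for p, kinds in typing.items() if len(kinds) > 1)
    return multi_label, mixed_typing


# Hypothetical sample data exhibiting both problems.
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
sample = [
    ("ex:person1", RDFS_LABEL, ("Doe, Jane", None)),
    ("ex:person1", RDFS_LABEL, ("Jane Doe", None)),
    ("ex:person1", "brown:overview", ("An overview", XSD_STRING)),
    ("ex:person2", RDFS_LABEL, ("Roe, Richard", None)),
    ("ex:person2", "brown:overview", ("Another overview", None)),
]
multi_label, mixed_typing = find_anomalies(sample)
```

Here `multi_label` flags `ex:person1` and `mixed_typing` flags `brown:overview`. A check like this finds anomalies after the fact; the goals above aim to stop them from being written in the first place.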

20 of 53

Tools for scripting RDF (and other graphs)

Prior Work: a brief sample

VIVO community: Harvester, VIVO Pump

rdflib (and Resource module)

SuRF

https://pythonhosted.org/SuRF/

Neo4j-OGM

https://neo4j.com/docs/ogm-manual/current/

Desired function: transactional edits on individual resources

Common pattern in MVC web frameworks: the Model and ORM

21 of 53

Object-Relational Mapper: Model + CRUD workflow

from app import app, models, db

@app.route('/edit/<userID>/overview')
def update_overview(userID):
    faculty = models.Faculty.query \
        .filter_by(id=userID).first()
    faculty.overview = "My new overview"
    db.session.add(faculty)
    db.session.commit()
    return {'overview': faculty.overview}

*Based on general pattern using Flask-SQLAlchemy: https://flask-sqlalchemy.palletsprojects.com/en/2.x/quickstart/

from app import db

class Faculty(db.Model):
    __tablename__ = 'faculty'
    id = db.Column(db.Integer, primary_key=True)
    overview = db.Column(db.String)
    statement = db.Column(db.String(500))
    research_areas = db.relationship('ResearchArea')
    weblinks = db.relationship('WebLink')

id	overview	statement	research_area	weblinks
1	"My old overview"	"My statement"	17	13

TABLE 'faculty'
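To make the ORM's work visible, here is a minimal sketch of the same read-modify-commit cycle written directly against SQLite with Python's standard `sqlite3` module. It is not how Flask-SQLAlchemy is implemented; the schema mirrors the `faculty` table above with the relationship columns omitted, and this `update_overview` is a hypothetical stand-in for the route.

```python
import sqlite3

# In-memory database standing in for the relational store behind the ORM.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE faculty ("
    " id INTEGER PRIMARY KEY,"
    " overview TEXT,"
    " statement VARCHAR(500))"
)
conn.execute(
    "INSERT INTO faculty (id, overview, statement) VALUES (?, ?, ?)",
    (1, "My old overview", "My statement"),
)
conn.commit()


def update_overview(user_id, new_overview):
    """Read the row, change one column, commit: the cycle the ORM automates."""
    row = conn.execute(
        "SELECT id, overview FROM faculty WHERE id = ?", (user_id,)
    ).fetchone()
    if row is None:
        return None
    # sqlite3 implicitly opens a transaction before the UPDATE;
    # commit() ends it, playing the role of db.session.commit().
    conn.execute(
        "UPDATE faculty SET overview = ? WHERE id = ?", (new_overview, user_id)
    )
    conn.commit()
    return {"overview": new_overview}


result = update_overview(1, "My new overview")
```

The ORM's value is that the route never spells out the SELECT, UPDATE, or transaction handling; the same separation is what the Resource design below aims for with SPARQL.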

22 of 53

Designing an ORM for RDF ("R" is for "Resource")

The lack of an ORM-style interface for RDF triplestores is both a source of bugs and an impediment to adoption and development
RDF and SPARQL operations map neatly onto basic CRUD operations
The schemaless, stateless design of RDF triplestores eliminates much of the complexity that a relational ORM has to handle
Implementing the basic functions of an ORM for an RDF triplestore is an achievable goal and would greatly benefit RDF development

24 of 53

RDF and ORM workflow: transactions and commits

def update_overview(userID):
    …
    db.session.add(faculty)
    db.session.commit()

A SPARQL request is platform-agnostic: no middleware is needed to communicate with the storage layer
An HTTP request establishes the transaction scope
Scripting INSERT/DELETE operations only requires knowing which triples go where (and in which named graphs)
Using HTTP for transactions raises performance questions, especially for bulk updates
https://wiki.duraspace.org/display/VIVODOC19x/SPARQL+Update+API

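import requests

msg = '''
PREFIX brown: <http://vivo.brown.edu/profile/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
DELETE DATA {
    GRAPH <http://vivo.brown.edu/data> {
        <http://vivo.brown.edu/individual/steve> brown:overview "My old overview"^^xsd:string .
    }
} ;
INSERT DATA {
    GRAPH <http://vivo.brown.edu/data> {
        <http://vivo.brown.edu/individual/steve> brown:overview "My new overview"^^xsd:string .
    }
}
'''

# VIVO's SPARQL Update API also expects the email and password of an
# authorized VIVO account; the values here are placeholders.
requests.post('http://localhost:8080/vivo/api/sparqlUpdate',
              data={'email': 'admin@example.edu', 'password': 'PASSWORD',
                    'update': msg})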
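Because the update above is pure boilerplate once the triples are known, it can be generated. A minimal sketch using only the standard library: `build_update` and `make_request` are hypothetical helpers, the graph, endpoint, and prefixes come from the slide, and the credentials are placeholders. The request is built but not sent.

```python
import urllib.parse
import urllib.request


def build_update(graph, remove, add, prefixes=None):
    """Render a SPARQL Update that deletes, then inserts, triples in one named graph.

    `remove` and `add` are lists of (subject, predicate, object) strings that are
    already valid SPARQL terms, e.g. '<http://...>' or '"text"^^xsd:string'.
    """
    lines = [f"PREFIX {p}: <{iri}>" for p, iri in (prefixes or {}).items()]

    def block(keyword, triples):
        body = "\n".join(f"    {s} {p} {o} ." for s, p, o in triples)
        return f"{keyword} {{\n  GRAPH <{graph}> {{\n{body}\n  }}\n}}"

    ops = []
    if remove:
        ops.append(block("DELETE DATA", remove))
    if add:
        ops.append(block("INSERT DATA", add))
    # Operations inside one update request are separated by ';'.
    return "\n".join(lines + [" ;\n".join(ops)])


def make_request(endpoint, update, email, password):
    """Build, but do not send, a form-encoded POST for the update endpoint."""
    form = urllib.parse.urlencode(
        {"email": email, "password": password, "update": update}
    ).encode("utf-8")
    return urllib.request.Request(endpoint, data=form, method="POST")


STEVE = "<http://vivo.brown.edu/individual/steve>"
update = build_update(
    "http://vivo.brown.edu/data",
    remove=[(STEVE, "brown:overview", '"My old overview"^^xsd:string')],
    add=[(STEVE, "brown:overview", '"My new overview"^^xsd:string')],
    prefixes={"brown": "http://vivo.brown.edu/profile/",
              "xsd": "http://www.w3.org/2001/XMLSchema#"},
)
# Placeholder credentials; urllib.request.urlopen(req) would send the request.
req = make_request("http://localhost:8080/vivo/api/sparqlUpdate",
                   update, "admin@example.edu", "PASSWORD")
```

Putting the DELETE and INSERT in a single request keeps them in one HTTP call, which is what lets the request act as the transaction boundary described above.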

25 of 53

RDF and ORM workflow: Resource definition

Mapping graph data to dictionary- or object-like structures is a standard technique: https://www.python.org/doc/essays/graphs/
A relational table row corresponds to a subset of the triplestore: the set of triples that share a subject URI
The row ID is the subject, columns are properties, and column values are objects

from app import db

class Faculty(db.Model):
    __tablename__ = 'faculty'
    id = db.Column(db.Integer, primary_key=True)
    overview = db.Column(db.String)
    statement = db.Column(db.String(500))
    research_areas = db.relationship('ResearchArea')
    weblinks = db.relationship('WebLink')

from app import db

class Faculty(Resource):
    graph = 'http://vivo.brown.edu/data'
    id = db.URI
    overview = db.Property(uri='brown:overview',
                           datatype=db.String)
    statement = db.Property(uri='brown:statement',
                            datatype=db.String, length=500)
    research_areas = db.Property(uri='brown:topics',
                                 relationship='ResearchArea')
    weblinks = db.Property(uri='brown:weblinks',
                           relationship='WebLink')

<http://.../steve>	brown:overview	"My old overview"
<http://.../steve>	brown:statement	"My statement"
<http://.../steve>	...	...

id	overview	statement	...
1	"My old overview"	"My statement"	...
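The row analogy can be made concrete by grouping triples by subject. A minimal sketch with a hypothetical `rows_by_subject` helper: the overview and statement values come from the slide, while the two topic URIs are hypothetical and included only to show that a "cell" can hold several values in RDF.

```python
from collections import defaultdict


def rows_by_subject(triples):
    """Group triples into {subject: {predicate: [objects]}}.

    The subject plays the role of the row ID, predicates the columns, and
    objects the cell values. Values are lists because RDF allows several
    objects for the same subject and predicate.
    """
    rows = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        rows[s][p].append(o)
    # Convert the nested defaultdicts to plain dicts.
    return {s: dict(cols) for s, cols in rows.items()}


STEVE = "http://vivo.brown.edu/individual/steve"
triples = [
    (STEVE, "brown:overview", "My old overview"),
    (STEVE, "brown:statement", "My statement"),
    # Hypothetical topic URIs showing a multi-valued "column".
    (STEVE, "brown:topics", "http://example.org/topic/1"),
    (STEVE, "brown:topics", "http://example.org/topic/2"),
]
rows = rows_by_subject(triples)
```

Unlike a table row, the resulting dict has no key at all for a property that has no triples, which is the issue the next slides address.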

26 of 53

Resource loading: dynamic attributes

from app import db

class Faculty(Resource):
    …
    overview = db.Property(uri='brown:overview',
                           datatype=db.String)
    statement = db.Property(uri='brown:statement',
                            datatype=db.String, length=500)
    research_areas = db.Property(uri='brown:topics',
                                 relationship='ResearchArea')
    weblinks = db.Property(uri='brown:weblinks',
                           relationship='WebLink')

msg = '''
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX brown: <http://vivo.brown.edu/profile/>
CONSTRUCT {
    ?uri brown:overview ?overview .
    ?uri …
}
WHERE {
    ?uri rdf:type brown:Faculty .
    OPTIONAL { ?uri brown:overview ?overview . }
    OPTIONAL { … }
}
'''

OPTIONAL clauses

In a triplestore there are no placeholders or empty column cells: a triple either exists or it does not
RDF is extremely flexible but unpredictable: there is no way to know ahead of time which triples exist for a particular Resource
Previously handled with OPTIONAL and UNION clauses, which work but are suboptimal
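Because the OPTIONAL groups follow a fixed pattern, they can be generated from the same attribute-to-predicate mapping that the Resource class declares. A minimal sketch with a hypothetical `build_optional_construct` helper, using the predicates and prefixes from the slide:

```python
def build_optional_construct(rdf_type, properties, prefixes):
    """Render a CONSTRUCT query with one OPTIONAL group per mapped property.

    `properties` maps attribute names to predicate CURIEs,
    e.g. {"overview": "brown:overview"}; each name becomes a ?variable.
    """
    head = "\n".join(f"PREFIX {p}: <{iri}>" for p, iri in prefixes.items())
    template = "\n".join(
        f"  ?uri {pred} ?{name} ." for name, pred in properties.items()
    )
    optionals = "\n".join(
        f"  OPTIONAL {{ ?uri {pred} ?{name} . }}" for name, pred in properties.items()
    )
    return (
        f"{head}\n"
        f"CONSTRUCT {{\n{template}\n}}\n"
        f"WHERE {{\n  ?uri rdf:type {rdf_type} .\n{optionals}\n}}"
    )


query = build_optional_construct(
    "brown:Faculty",
    {
        "overview": "brown:overview",
        "statement": "brown:statement",
        "research_areas": "brown:topics",
        "weblinks": "brown:weblinks",
    },
    {
        "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "brown": "http://vivo.brown.edu/profile/",
    },
)
```

Generation removes the hand-typing, but not the cost: each OPTIONAL is joined against the rows produced so far, so several multi-valued properties multiply the intermediate solutions before CONSTRUCT collapses them into a set of triples.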

28 of 53

Resource loading: dynamic attributes

from app import db

class Faculty(Resource):
    …
    overview = db.Property(uri='brown:overview',
                           datatype=db.String)
    statement = db.Property(uri='brown:statement',
                            datatype=db.String, length=500)
    research_areas = db.Property(uri='brown:topics',
                                 relationship='ResearchArea')
    weblinks = db.Property(uri='brown:weblinks',
                           relationship='WebLink')

msg = '''
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX brown: <http://vivo.brown.edu/profile/>
CONSTRUCT {
    ?uri brown:overview ?overview .
    ?uri …
}
WHERE {
    {
        ?uri rdf:type brown:Faculty .
    } UNION {
        ?uri rdf:type brown:Faculty .
        ?uri brown:overview ?overview .
    } UNION { ... }
}
'''

UNION clauses

In a triplestore there are no placeholders or empty column cells: a triple either exists or it does not
RDF is extremely flexible but unpredictable: there is no way to know ahead of time which triples exist for a particular Resource
Previously handled with OPTIONAL and UNION clauses, which work but are suboptimal
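One way to compare the two patterns is to count intermediate solutions for a single resource. The sketch below models standard SPARQL semantics (chained OPTIONAL groups join, so multi-valued matches multiply; UNION branches are evaluated separately, so they add) and uses hypothetical value counts: one overview, no statement, 10 research areas, 5 web links.

```python
from math import prod


def optional_solutions(counts):
    """Solutions for one resource under chained OPTIONAL groups.

    An OPTIONAL that matches nothing keeps the row (factor 1); one that
    matches n values multiplies the rows n times.
    """
    return prod(max(n, 1) for n in counts.values())


def union_solutions(counts):
    """Solutions for one resource with one UNION branch per property.

    The bare rdf:type branch contributes one row; every other branch
    contributes one row per matched value and none when nothing matches.
    """
    return 1 + sum(counts.values())


# Hypothetical number of values per mapped property for one faculty member.
counts = {"overview": 1, "statement": 0, "research_areas": 10, "weblinks": 5}
optional_rows = optional_solutions(counts)
union_rows = union_solutions(counts)
```

With these counts the OPTIONAL form produces 1 × 1 × 10 × 5 = 50 solutions and the UNION form 1 + 16 = 17. Both yield the same CONSTRUCT graph, since duplicate triples collapse; the UNION form trades the multiplication for a longer, more repetitive query.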

31 of 53

Resource loading: dynamic attributes

def update_overview(userID):
    faculty = models.Faculty.query...

from app import db

class Faculty(Resource):
    …
    overview = db.Property(uri='brown:overview',
                           datatype=db.String)
    statement = db.Property(uri='brown:statement',
                            datatype=db.String, length=500)
    research_areas = db.Property(uri='brown:topics',
                                 relationship='ResearchArea')
    weblinks = db.Property(uri='brown:weblinks',
                           relationship='WebLink')

import requests

msg = "DESCRIBE <http://.../steve>"
# VIVO's SPARQL Query API expects the credentials of an authorized
# account; the values here are placeholders.
data = requests.post(
    'http://localhost:8080/vivo/api/sparqlQuery',
    data={'email': 'admin@example.edu', 'password': 'PASSWORD',
          'query': msg}).text

faculty = models.Faculty.load(data)

<http://.../steve>	brown:overview	...
<http://.../steve>	obo:71003401	...
<http://.../steve>	brown:statement	...
<http://.../steve>	obo:04300002	...

A DESCRIBE query returns the full set of triples that share the subject URI (the SPARQL specification leaves the exact contents of a DESCRIBE result to the query service)
The Resource object's attributes act as filters on that set, keeping only triples whose properties are mapped
DESCRIBE is performant, complete, and makes no assumptions about which triples exist
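The "attributes as filters" idea can be sketched with Python descriptors. This is a minimal, hypothetical version of the Resource design, not its actual implementation: `Property`, `Resource`, and `load` are illustrative names, `load` takes the subject explicitly (the slide's `load(data)` takes only the query result), and the DESCRIBE output is assumed to be already parsed into tuples.

```python
class Property:
    """Maps a Python attribute to an RDF predicate (hypothetical API)."""

    def __init__(self, uri):
        self.uri = uri
        self.name = None

    def __set_name__(self, owner, name):
        # Called when the owning class is created; records the attribute name.
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the class: return the descriptor itself
        # RDF allows several objects per predicate, so values are lists.
        return instance._values.get(self.name, [])


class Resource:
    """Base class whose `load` keeps only triples with mapped predicates."""

    def __init__(self, uri):
        self.uri = uri
        self._values = {}

    @classmethod
    def properties(cls):
        return {
            name: getattr(cls, name)
            for name in dir(cls)
            if isinstance(getattr(cls, name), Property)
        }

    @classmethod
    def load(cls, subject, triples):
        by_predicate = {prop.uri: name for name, prop in cls.properties().items()}
        resource = cls(subject)
        for s, p, o in triples:
            # Unmapped predicates (such as the obo: ones) are ignored.
            if s == subject and p in by_predicate:
                resource._values.setdefault(by_predicate[p], []).append(o)
        return resource


class Faculty(Resource):
    overview = Property("brown:overview")
    statement = Property("brown:statement")


STEVE = "http://vivo.brown.edu/individual/steve"
described = [  # stand-in for a parsed DESCRIBE result; obo objects elided
    (STEVE, "brown:overview", "My old overview"),
    (STEVE, "obo:71003401", "..."),
    (STEVE, "brown:statement", "My statement"),
    (STEVE, "obo:04300002", "..."),
]
faculty = Faculty.load(STEVE, described)
```

A missing triple simply yields an empty list, so no OPTIONAL or UNION clause is needed to cope with absent data.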

32 of 53

RDF and ORM workflow: the rest

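faculty = models.Faculty.query \
    .filter_by(id='steve').first()
faculty.overview = "My new overview"
db.session.add(faculty)
db.session.commit()

Query filtering

faculty = models.Faculty.query.filter_by(id='steve').first()

msg = '''
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX brown: <http://vivo.brown.edu/profile/>
DESCRIBE ?uri
WHERE {
    ?uri rdf:type brown:Faculty .
    ?uri brown:id "steve"^^xsd:string .
}
'''

Attribute update

faculty.overview = "My new overview"

# Triples as N-Triples terms; brown:overview is expanded to its full IRI
remove = [('<http://vivo.brown.edu/individual/steve>',
           '<http://vivo.brown.edu/profile/overview>',
           '"My old overview"^^<http://www.w3.org/2001/XMLSchema#string>')]
add = [('<http://vivo.brown.edu/individual/steve>',
        '<http://vivo.brown.edu/profile/overview>',
        '"My new overview"^^<http://www.w3.org/2001/XMLSchema#string>')]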
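The remove and add lists can be derived mechanically by comparing the values loaded from the store with the values after assignment. A minimal sketch, assuming literal-valued attributes on a single predicate; `literal` and `diff_triples` are hypothetical helpers, and the full IRIs are the expansions of the slide's prefixed names.

```python
XSD_STRING = "<http://www.w3.org/2001/XMLSchema#string>"


def literal(text):
    """Format text as an xsd:string literal in N-Triples syntax."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"^^{XSD_STRING}'


def diff_triples(subject, predicate, old_values, new_values):
    """Return (remove, add) triple lists that turn old_values into new_values."""
    old, new = set(old_values), set(new_values)
    remove = [(subject, predicate, literal(v)) for v in sorted(old - new)]
    add = [(subject, predicate, literal(v)) for v in sorted(new - old)]
    return remove, add


SUBJECT = "<http://vivo.brown.edu/individual/steve>"
OVERVIEW = "<http://vivo.brown.edu/profile/overview>"
remove, add = diff_triples(SUBJECT, OVERVIEW, ["My old overview"], ["My new overview"])
```

Because the comparison uses set difference, assigning an unchanged value produces two empty lists, so a commit could skip the HTTP request entirely.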

33 of 53

Designing an ORM for RDF ("R" is for "Resource")

We can build an interface to triplestores that encourages consistency while maintaining the flexibility of RDF
Basic functions not covered here: resource creation, deletion, chaining, listing
Advanced functions: resource loading strategies, ontology/model reflection, complex queries, triplestore middleware
Current goals: build on prior work, and see the basic functions through to completion before getting sidetracked by advanced features

34 of 53

In closing...

35 of 53

Takeaways

We are using VIVO

To store the data
To leverage semantic web, linked data, yada-yada-yada
Ontology management and other admin functions via the native VIVO app

Lots of code outside of VIVO

Ruby and JavaScript (frontend)
Python and SPARQL (backend)

Keep components separated

Display, triplestore access, searching, graph caching

Fast release cycles

Frontend can be deployed at will
Backend can be updated at will (as long as the API does not change)

36 of 53

Thanks!

Live site: https://vivo.brown.edu/ (read-only version)
Source code

Frontend: https://github.com/Brown-University-Library/vivo-on-rails
Backend: https://github.com/Brown-University-Library/rab-trax

Slides: https://tinyurl.com/vivo-2019-brown

37 of 53

[the end]

38 of 53

[backup slides]

39 of 53

Future work

Renovating newly uncoupled services

Publication manager, administrative interface, report generation, email service...
API interaction: autocomplete, PubMed, FAST, etc.

Data migration

Cleaning up untyped and otherwise malformed data
All triples in named graphs

Automated processes

Bulk updates

Improving performance

Caching, minimizing network calls

40 of 53

Queries: current state

Issues

Repeated code
Query is bound to Django View: can’t isolate, validate, or debug query results

41 of 53

Queries: current state

Issues

OPTIONAL statements
Returning multiple object classes from same query
Data processing in query

sorting, date formatting

42 of 53

Queries: current state

Issues

UNION statements
Typed/untyped data
Data handling in query

43 of 53

Data editing: current state

Issues

Manual generation of URI
Inverse properties: operating on multiple objects

44 of 53

Data editing: current state

Issues

Manual generation of URI
Inverse properties: relying on inferencing

45 of 53

Data ecosystem: current state

46 of 53

Resource-oriented SPARQL

DESCRIBE queries leverage built-in indexes for efficiency
Post-processing of resource attributes
https://www.w3.org/TR/rdf-sparql-query/#describe

47 of 53

Resource-oriented SPARQL

48 of 53

Object-Relational Mapper: CRUD workflow

from flask import request
from app import models, db

def update_overview(facultyID):
    data = request.get_json()
    faculty = models.Faculty.query.filter_by(id=facultyID).first()
    faculty.overview = data.get('overview')
    db.session.add(faculty)
    db.session.commit()
    return {'overview': faculty.overview}

49 of 53

Problems with current setup (backend)

Django application is difficult to manage

Misuse of MVC framework
Multiple points of interaction with external systems: VIVO triple store, external APIs
Tightly bound with other services: publication harvesting, administrative interface and reports

Interaction with triple store is unsatisfactory

Burrows into VIVO’s “primitiveRdfEdit” endpoint, which is designed for VIVO’s built-in forms

Uncontrolled RDF/SPARQL

Mostly produced by hand: irregular and hard to debug
Scattered throughout codebase

50 of 53

Visualizations

General Workflow

See 2018 slides

Researcher level samples

Organization level samples

Team level samples

(work in progress)

51 of 53

Reports

We love linked data, but users love spreadsheets
Created reports for users that output to Excel
Single Excel file with all the information for a group of researchers

52 of 53

RDF and SPARQL: enforcing consistency

Hand-coding leads to data anomalies and other development headaches

Multiple values for rdfs:label
Untyped data and typed data, often for the same property
Inconsistent inverse properties and inferencing
Boilerplate queries with subtle variations

RDF development is hampered by the lack of standard tools

Other DB options offer APIs and libraries for interacting with storage layer
ORMs hide

53 of 53

VIVO at Brown

[Diagram: users: general public, editors (faculty), developers; components: Manager (Django App), Data Service (Flask App), Front End (Rails App), Solr (replica), VIVO stack (VIVO, Fuseki, Solr, Java App)]