Data Visualization with GraphDB and Workbench
Contents
1.1 Built-in SPARQL Result Visualizations
1.2 Using SPARQL Results in Spreadsheets
2.1 Difference Between Interactive and Programmatic Endpoint
3 Help With Writing SPARQL Queries
3.7 AKSW Neural SPARQL Machines
3.13 OpenLink iSPARQL Query Builder
5.1 Built-in Overview Visualizations
5.2 Built-in Graph Visualizations
5.3 Developing Graph Visualizations
The way to get data out of a semantic repository is with SPARQL. Thanks to integrating the excellent YASGUI editor, GDB offers three very useful kinds of auto-completion.
Through the YASGUI integration, GraphDB Workbench can show charts from SPARQL results using Google Charts and pivot tables. For example, let's try query F4 (Top-level industries by number of companies) on http://factforge.net:
After invoking it, go to Google Chart > Chart Config and select an appropriate chart type (bar, line, pie, etc.). There are also pivot tables and charts that allow you to analyze two- and higher-dimensional result sets.
You can also download results in a number of formats (TSV and CSV are the most useful) for analysis in other programs, e.g. Google Sheets or Excel.
Two more examples come from the Getty Vocabularies sample queries (there they are made with external tools, but we recreated them in Workbench):
The Google Sheet FactForge-Industries imports the data of the above query, and then makes a similar chart.
=IMPORTDATA("http://factforge.net/repositories/ff-news?query=%23+F4%3A+Top-level+industries+by+number+of+companies%0A%23+-+benefits+from+the+mapping+and+consolidation+of+industry+classifications%0A%23+++and+predicates+in+DBPedia+done+in+the+FactForge%0A%23+-+benefits+from+reasoning+-+transitive+and+symmetric+properties+across%0A%23+++the+industry+classification+taxonomy+of+FactForge%0A%0APREFIX+dbo%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2F%3E%0APREFIX+ff-map%3A+%3Chttp%3A%2F%2Ffactforge.net%2Fff2016-mapping%2F%3E%0A%0ASELECT+DISTINCT+%3Ftop_industry+(COUNT(*)+AS+%3Fcount)%0A%7B%0A+++%3Fcompany+dbo%3Aindustry+%3Findustry+.%0A+++%3Findustry+%5Eff-map%3AindustryVariant+%2F+ff-map%3AindustryCenter+%3Ftop_industry+.%0A%7D%0AGROUP+BY+%3Ftop_industry+ORDER+BY+DESC(%3Fcount)+")
=regexreplace(A2,"http://dbpedia.org/resource/","")
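The same cleanup can be done outside a spreadsheet. Here is a minimal Python sketch (with hypothetical result values) that parses a downloaded TSV result and strips the DBpedia namespace, mirroring the regexreplace() formula above:

```python
import csv
import io

# Hypothetical sample of a TSV result as downloaded from the Workbench
tsv_data = """?top_industry\t?count
http://dbpedia.org/resource/Software\t1200
http://dbpedia.org/resource/Internet\t800
"""

PREFIX = "http://dbpedia.org/resource/"

rows = []
reader = csv.reader(io.StringIO(tsv_data), delimiter="\t")
header = next(reader)  # skip the variable-name header row
for uri, count in reader:
    # Same cleanup as the regexreplace() formula: drop the DBpedia namespace
    rows.append((uri.replace(PREFIX, "", 1), int(count)))

print(rows)  # [('Software', 1200), ('Internet', 800)]
```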
Not everyone knows SPARQL, so it is important to be able to give queries to other people so they can invoke them, or use a query link to invoke the query from another program. To this end, you need to understand several things as described below.
GDB WB includes an interactive endpoint where one can enter and edit queries (e.g. http://factforge.net/sparql). But for programmatic access to query results, one should find the "straight" (direct) endpoint. To find it, see Help > REST API. First find the repository name: http://factforge.net/rest/repositories returns the list of repositories as JSON:
curl http://factforge.net/rest/repositories | jq .
## there's another called SYSTEM ...
"id": "ff-news",
"title": "ff-news",
"uri": "http://factforge.net/repositories/ff-news",
The second part of the REST API shows the SPARQL endpoint, e.g. in this case http://factforge.net/repositories/ff-news
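For a program, the same lookup can be scripted. A sketch assuming the JSON shape shown above (the sample list below is an abbreviated, hypothetical excerpt):

```python
import json

# Sample response from /rest/repositories, with the "id", "title" and "uri"
# fields shown above (hypothetical excerpt of the full list)
repos_json = """[
  {"id": "SYSTEM", "title": "System repository",
   "uri": "http://factforge.net/repositories/SYSTEM"},
  {"id": "ff-news", "title": "ff-news",
   "uri": "http://factforge.net/repositories/ff-news"}
]"""

repos = json.loads(repos_json)
# The "uri" field of each entry is the repository's SPARQL endpoint
endpoints = {r["id"]: r["uri"] for r in repos}
print(endpoints["ff-news"])  # http://factforge.net/repositories/ff-news
```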
If your query has some variable parts, you want to be able to bind these parameters to values at the time you invoke the query. This is explained in the rdf4j documentation (the same as in the older Sesame documentation). Jena has a similar feature, and the Data Incubator Linked Data Patterns book explains the concept. By convention, parameters are preceded by "$", as opposed to query variables, which are preceded by "?". For example, here's a query to return the industries of a given $company from FactForge's copy of DBpedia:
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?industry {$company dbo:industry ?industry}
You can bind a value to the query parameter by including a corresponding HTTP request parameter, see below. You need to format the value in N-Triples syntax (unfortunately you cannot use prefixes), e.g. $company=<http://dbpedia.org/resource/Google>.
After editing a query until you are satisfied with the results, you can get a link to invoke the query from GDB WB:
To invoke a query, do HTTP GET on the link constructed above. By default the result format is CSV (text/csv).
For example, invoking the query above with parameter dbr:Google and returning TSV:
curl -H Accept:text/tab-separated-values "http://factforge.net/repositories/ff-news?infer=true&sameAs=false&query=PREFIX+dbo%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2F%3E%0ASELECT+%3Findustry+%7B%24company+dbo%3Aindustry+%3Findustry%7D%0A&$company=<http://dbpedia.org/resource/Google>"
?industry
<http://dbpedia.org/resource/Software>
<http://dbpedia.org/resource/Internet>
<http://dbpedia.org/resource/Mobile_device>
<http://dbpedia.org/resource/Cloud_computing>
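Instead of hand-encoding the URL, any HTTP library can build it. A Python sketch using only the standard library, following the curl example above (note that curl left the $company key unencoded, while urlencode percent-encodes it as %24company; the server decodes both the same way):

```python
from urllib.parse import urlencode

ENDPOINT = "http://factforge.net/repositories/ff-news"

QUERY = """\
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?industry {$company dbo:industry ?industry}
"""

# The query itself, plus a binding for $company in N-Triples syntax
params = urlencode({
    "infer": "true",
    "sameAs": "false",
    "query": QUERY,
    "$company": "<http://dbpedia.org/resource/Google>",
})
url = ENDPOINT + "?" + params
print(url)

# To actually invoke it (requires network access), request TSV results:
# import urllib.request
# req = urllib.request.Request(url,
#     headers={"Accept": "text/tab-separated-values"})
# print(urllib.request.urlopen(req).read().decode())
```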
There are various tools that can help you write SPARQL queries. We'd be happy to help you deploy one of these tools on a GDB SPARQL endpoint. Many of these are research prototypes, and some of them are based on Controlled Natural Language (CNL).
SQUALL (Semantic Query and Update High-Level Language) is a project for querying RDF data in CNL by Sébastien Ferré (Sebastien.Ferre@irisa.fr), head of the SemLIS research team at IRISA, University of Rennes 1. It uses Montague grammars and claims: "SQUALL has a strong adequacy with RDF, and covers all SPARQL 1.0 constructs, and many of SPARQL 1.1. Its syntax completely abstracts from low-level notions such as bindings and relational algebra. It features disjunction, negation, quantifiers, built-in predicates, aggregations with grouping, and n-ary relations through reification."
See publications [1] [2] [3] and examples. E.g. the question
Which person is an author of at least 10 publication-s?
translates to:
SELECT DISTINCT ?x1 WHERE {
?x1 a :person .
{SELECT DISTINCT ?x1 (COUNT(DISTINCT ?x3) AS ?x2) WHERE {
?x3 a :publication .
?x3 :author ?x1 .
} GROUP BY ?x1}
FILTER (?x2 >= 10)
}
We tried it on the Getty Vocabularies LOD with a question
Which subject-s have at least 3 prefLabel-s?
SQUALL came up with the following query that is quite adequate:
SELECT DISTINCT ?x1 WHERE {
?x1 a :Subject .
{SELECT DISTINCT ?x1 (COUNT(DISTINCT ?x3) AS ?x2) WHERE
{?x1 :prefLabel ?x3}
GROUP BY ?x1}
FILTER (?x2 >= 3)}
Sparklis is a project by the same group at IRISA. It reconciles expressivity and usability in semantic search by tightly combining a Query Builder, a Natural Language Interface, and a Faceted Search system.
It is an excellent UI that guides the user in writing a query in English. The query is incrementally converted to SPARQL, and partial results are shown in facets to provide better guidance. I've explored several CNLs for SPARQL, but this one is the most practical and usable. See the YouTube video, demo, examples, and paper [4].
Original announcement Tue 4/11/2017 by S. Ferré on the dbpedia-discuss mailing list:
"As a researcher in the Semantic Web, I developed in the past 3 years Sparklis, a tool to allow people to explore and query SPARQL endpoints without any knowledge of SPARQL. It combines the following features:
DBpedia is already accessible through Sparklis at this URL. It may be valuable to propose Sparklis as a query builder in the "Online access" section of the DBpedia website. To the best of my knowledge, there is no equivalent tool. By the way, I am very interested by any feedback on the tool. I am also interested to collaborate on any needed improvement.
Sparklis has recently been adopted by Persée, where it is used by researchers in the social sciences; since then Sparklis has received more than 1000 hits per month."
Below you see a reading of the query in English, and the results as table and shown on a map.
ViziQuer (http://viziquer.lumii.lv/) lets you graphically construct and execute rich data-analysis queries over RDF data. Development of this tool started in 2008 and continues to this day.
Includes examples of queries over complex schemas:
Papers (newest first):
Related software:
For example, here is the hospital data schema of the hospital queries example shown above:
Source code:
This blog post describes a system that uses NLP techniques to parse questions and convert them to SPARQL queries for Wikidata. See the demo and source. For example, the question
"Who is Obama's wife?"
is parsed as
(SBARQ
(WHNP (WP Who))
(SQ (VBZ is) (NP (NP (NNP Obama) (POS 's)) (NN wife)))
(. ?))
which is then simplified to this "typical question":
{
  "prop": "wife",
  "qtype": "who",
  "subject": "obama"
}
and is then translated to this query:
SELECT ?valLabel ?type
WHERE {{
wd:Q76 p:P26 ?prop .
?prop ps:P26 ?val .
OPTIONAL {
?prop psv:P26 ?propVal .
?propVal rdf:type ?type .
}}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en"}}
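The last step is essentially template filling. A hypothetical Python sketch of it, assuming the ID mappings (obama → wd:Q76, wife → P26) come from a separate entity-linking step; $ENTITY and $PROP are our own placeholders, not part of the system:

```python
# Hypothetical lookup tables standing in for the system's entity-linking step
ENTITY_IDS = {"obama": "Q76"}
PROPERTY_IDS = {"wife": "P26"}

# Template matching the generated query above
TEMPLATE = """SELECT ?valLabel ?type
WHERE {{
  wd:$ENTITY p:$PROP ?prop .
  ?prop ps:$PROP ?val .
  OPTIONAL {
    ?prop psv:$PROP ?propVal .
    ?propVal rdf:type ?type .
  }}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en"}}"""

def build_query(question):
    # question is the simplified "typical question" dict
    return (TEMPLATE
            .replace("$ENTITY", ENTITY_IDS[question["subject"]])
            .replace("$PROP", PROPERTY_IDS[question["prop"]]))

sparql = build_query({"prop": "wife", "qtype": "who", "subject": "obama"})
print(sparql)
```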
Hokkaido University has developed a tool that can generate SPARQL for a specific class of queries (e.g. "Songs written by Paul McCartney") and check the results against Wikipedia categories.
E.g. http://wnews.ist.hokudai.ac.jp/wc3?search=Songs+written+by+Paul+McCartney generates a query like this:
?s rdf:type <http://dbpedia.org/ontology/MusicalWork> .
?s <http://dbpedia.org/ontology/writer> <http://dbpedia.org/resource/Paul_McCartney>.
that returns 128 songs. The quality of this result set is estimated as Precision = 0.93, Recall = 0.92 based on the articles listed in Category:Songs_written_by_Paul_McCartney.
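Precision and recall here have their standard meanings. A quick sketch with hypothetical counts consistent with the reported figures (only the 128 returned songs comes from the text; the other two counts are assumed):

```python
returned = 128        # songs returned by the generated query (from the text)
relevant = 129        # hypothetical size of the Wikipedia category
true_positives = 119  # hypothetical overlap between the two sets

precision = true_positives / returned  # fraction of returned songs that are correct
recall = true_positives / relevant     # fraction of category songs that were found
print(round(precision, 2), round(recall, 2))  # 0.93 0.92
```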
AutoSPARQL converts a natural language expression to a SPARQL query. See papers [5] [6].
It leverages:
Currently there is no working demo :-(
Neural SPARQL Machines (NSpM) use "seq2seq" neural networks to translate natural language to SPARQL; the approach requires a large number of example pairs to train the network.
Links:
Unfortunately I cannot find a demo.
Quepy is a Python framework that transforms natural language questions to SPARQL. It can be customized to different kinds of questions.
For example
is translated to
SELECT DISTINCT ?x1 WHERE {
?x0 rdf:type dbo:Film.
?x0 dbp:name "Friends"@en.
?x0 dbo:releaseDate ?x1.
}
Grammatical Framework (GF) is a CNL framework that allows you to define an abstract grammar for a domain and then "surface grammars" in various languages. This enables assisted text entry. An outstanding feature of GF is that it guides the user to write only correct text; e.g. the fridge magnets demo allows one to enter sentences about food. By including resources for multiple languages, GF can provide multilingual translations.
The translation demo shows the inter-language correspondence of text and provides translation.
When one of the surface languages is SPARQL, this enables CNL to SPARQL translation.
Ontotext used GF in the FP7 MOLTO project: see the relevant publications, in particular [10] [11] [12] [13]. We translated:
Cognitum is the maker of Fluent Editor for entering and querying RDF data using CNL. SmartBI is a tool for making business-intelligence queries in CNL. See the product page and YouTube video.
NLI GO is a generic natural language interaction (NLI) library written in Go that provides a natural language interface to databases. It allows the user to ask questions in English, which are answered by looking up information in semantic repositories or databases.
There is a demo using DBpedia via SPARQL queries. The scope of the demo is quite limited; you can ask only the following questions:
The library is open source and uses a number of advanced techniques, so it could be extended to handle more complex questions:
ExConQuer consists of two tools:
A demo video is available on Vimeo.
Both the tool's home page and demo installation are down, so development seems to have stopped.
The iSPARQL query builder is a drag-and-drop visual interface for constructing SPARQL queries. It is deployed on DBpedia and some other datasets. You drag constructs from a toolbar, and it builds the corresponding query.
Sparnatural (https://sparnatural.eu) is a SPARQL assistant by Sparna. It looks a lot like the ResearchSpace Semantic Search that we did for the British Museum in 2012-2014 (see the Confluence pages), which was then taken up by Metaphacts in the Metaphactory product. See e.g. this demo with French National Library (BnF) data: https://sparnatural.eu/demos/demo-bnf/index.html
It has a data explorer that allows you to explore instances of interest, and make a subgraph of some of its connections.
E.g. here we show the famous Окото ("the Eye") lake in the Rila mountains of Bulgaria:
Then we can generalize some of the nodes to variables, e.g. to find all glacial lakes in Rila. The generated query and its results are shown:
It has demos for DBpedia and Wikidata, but can access only Wikidata's direct (truthy) statements, not full statements with qualifiers and references.
Wikidata items and properties use numbered URLs, which many users find hard to work with. There is a simple query builder with auto-completion; however, it is still very limited and can show results only as a table.
The Wikidata Query UI has excellent auto-completion. E.g. if you type the following and press Control-Space where indicated, Wikidata fills in the correct numeric IDs (wdt:Pnnn for properties and wd:Qnnn for items):
?item
wdt:instance of<c-space> wd:glacial lake<c-space>;
wdt:country<c-space> wd:bulgaria<c-space>
However, one needs to know the structure of SPARQL and the structure of the Wikidata ontology to be able to write queries. E.g. here is a more advanced example that shows all glacial lakes in Bulgaria (concentrated in the Rila and Pirin mountains), together with their coordinates and images, on a map view:
#defaultView:Map
SELECT DISTINCT ?item ?itemLabel ?coords ?image WHERE {
?item wdt:P17 wd:Q219.
?item wdt:P31/wdt:P279* wd:Q211302.
OPTIONAL {?item wdt:P625 ?coords}
OPTIONAL {?item wdt:P18 ?image}
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],bg". }
}
We use the wikibase:label service to pick a label in the user's language (e.g. English), falling back to Bulgarian. Here is the result:
Starting from such a query, there is a Query Helper that allows a user who doesn't know SPARQL to change the query:
E.g. here is a map of 30 museums in Greece:
GraphDB Workbench includes some Built-in SPARQL Result Visualizations that can represent 2-dimensional data (Pivot Charts). In this section we describe more sophisticated approaches to such data.
There is a well-established ontology for representing statistical data in RDF: the W3C Data Cube. It incorporates an OLAP data model and statistical classifications following SDMX. A number of statistical datasets are available as RDF, including:
The same link, Data Cube Implementations, describes a number of specialized statistical data visualization tools. Below we describe two of them, as well as the more powerful CubesViewer, which however does not yet work with W3C Cubes.
CubeViz is an AKSW project that provides a faceted browser and visualization components.
This shot is from CubeVizJS:
All the other shots are from CubeViz - the OntoWiki component.
Below are two Polar Charts. The second shows data from the EU Digital Agenda scoreboard.
Configuring a Polar Chart:
The OpenCube Toolkit was developed by the OpenCube project. It provides a number of related solutions for publishing, exploring and visualizing W3C Cube datasets:
Data Creating
Data Expanding
Data Exploring
Several components (the R2RML extension, OpenCube Browser, OpenCube OLAP Browser, MapView) are tightly integrated in fluidOps IWB and it doesn't seem easy to use them outside of this environment.
A number of public showcases are listed, but there is no working demo at present.
CubesViewer is an excellent OLAP visualization tool: demo, CubesViewer Studio demo, source, documentation. Below are several examples, gathered from the demo, documentation and twitter.
It is based on the DataBrewery Cubes OLAP framework: source, documentation. Unfortunately this framework does not yet support W3C Cubes; I raised an issue to gauge interest in such a feature.
QB.js allows you to explore data expressed as RDF Data Cubes. It was developed around 2013 by Rensselaer Polytechnic Institute (RPI) and is implemented in JavaScript using D3 and jQuery. The use of these modern technologies makes it a promising alternative, but the project is not maintained.
A potential alternative by RPI is "WhyIs".
In the above sections we showed charts of tabular SPARQL results. However, RDF is a graph data model, and often it is useful to see the structure of that graph.
GDB WB includes visualizations that give you a quick overview of a repository. Because these process all data in the repo, they take a while to build and are cached.
The Visual Graph display shows the relations of a selected node, and you can expand the graph further. E.g. below are the connections of an offshore entity in Mauritius from the Linked Leaks dataset.
Here are the connections of Google from DBpedia. It shows locations, subsidiaries, products and industries; different classes use different colors:
The GraphDB Development Hub shows examples of visualizing GraphDB data with Ogma JS. Ogma is a JavaScript library developed by Linkurious. It allows you to develop a variety of visualizations with very little effort, as shown below:
Relations of Google to people and companies, using FactForge data.
Relations between companies, including offshore entities.
There are numerous powerful and popular visualization tools, creating an amazing variety of graphs and charts, such as:
Specialized tools also exist that can display charts, graphs, flows and many other visualizations, e.g.
Some examples from recent datathons and hackathons
We have recently submitted a Horizon 2020 BigData Research proposal, a significant part of which is research and design of powerful Declarative Visualization approaches. Our goal is to make a breakthrough in RDF visualization by making it much easier to generate visualizations than is currently possible. We will adopt and extend existing declarative visualization ontologies to select data (leveraging shapes) and specify visualization parameters. Some relevant previous efforts:
Ontotext's RDF by Example [17] semantic modeling tool generates diagrams completely declaratively from RDF models (instance diagrams). It uses PlantUML and Graphviz. It allows some "diagram tweaking" with extra RDF triples in the puml: namespace (e.g. puml:arrow puml:up means to render a particular edge upwards).
E.g. below is a semantic model of mapping Dun & Bradstreet company data to the Financial Industry Business Ontology (FIBO).
It includes 152 fields grouped in 32 nodes. It is generated automatically, with just a few extra triples specifying that some arrows should go up/left/right instead of down. We could simplify it by splitting it into several diagrams, but there is a certain symmetry and repeatability that aids understanding. For example, the left and right "wings" are addresses (physical and mailing) that have the same structure; at top right are three "measures" (NetWorth, AnnualSales, ProfitLoss) having essentially the same structure, as shown below (note: DnB currency code 20 means USD):
Once you include source info in your model (relational table/join for each node, and field name for each literal/URL), our tool can generate R2RML [18], the W3C standard for RDBMS-to-RDF transformation. E.g. consider the following model of Exhibitions for the J. Paul Getty Museum:
The node circled in red (representing an Exhibition at a Venue) is expanded to 15 nodes in the generated R2RML transformation, which means huge savings in complexity and maintainability:
Zooming on the left, you can see the R2RML details (rr:predicateObjectMap, rr:template, rr:termType, etc):
R2RML is verbose: on average it requires 3 nodes and 15 statements for every model statement. Writing R2RML by hand creates many opportunities for mistakes and results in a maintenance nightmare.
Furthermore, R2RML requires semantic experts to develop and is hard to understand by subject-matter experts (museum curators, commodity trade analysts, etc). This creates a knowledge gap between semantic and domain experts: Semantic Modeling tools like RDF by Example help to bridge that gap.
A lot of the popular visualization tools (e.g. Pentaho, Centrifuge, QlikView, Tableau) are primarily geared towards tabular data and have ODBC/JDBC interfaces.
To save the effort of constructing query URLs (as described in Invoking SPARQL Queries) and of saving query results, we can provide a JDBC API to GraphDB. The user feeds SPARQL queries (not SQL queries) through JDBC, and SPARQL tabular results are then returned to the tool. We will reuse one of these open source libraries:
If the visualization tool supports ODBC but not JDBC, we can use the JDBC-ODBC bridge (sun.jdbc.odbc.JdbcOdbcDriver); note that the bridge was removed in Java 8, so this requires Java 7 or earlier. See an example of connecting from Java to Excel using ODBC and the JDBC-ODBC bridge.
[1] S. Ferré. SQUALL: A Controlled Natural Language as Expressive as SPARQL 1.1. Applications of Natural Language to Information Systems (NLDB), 2013, LNCS 7934, p. 114-125. Springer
[2] S. Ferré. SQUALL: a Controlled Natural Language for Querying and Updating RDF Graphs. Controlled Natural Languages (CNL), 2012. LNCS 7427, p. 11-25, Springer.
[3] S. Ferré. SQUALL: a High-Level Language for Querying and Updating the Semantic Web. Research Report PI-1985, IRISA, 2011
[4] S. Ferré. Sparklis: An Expressive Query Builder for SPARQL Endpoints with Guidance in Natural Language, Semantic Web Journal, 2015
[5] Jens Lehmann, Lorenz Bühmann. AutoSPARQL: Let Users Query Your Knowledge Base, ESWC 2011
[6] Christina Unger, Lorenz Bühmann, Jens Lehmann, Axel-Cyrille Ngonga Ngomo, Daniel Gerber, Philipp Cimiano. Template-based question answering over RDF data, WWW 2012
[7] Generating a Large Dataset for Neural Question Answering over the DBpedia Knowledge Base by Ann-Kathrin Hartmann, Tommaso Soru, and Edgard Marx
[8] Neural Machine Translation for Query Construction and Composition by Tommaso Soru, Edgard Marx, André Valdestilhas, Diego Esteves, Diego Moussallem, and Gustavo Publio in ICML Workshop on Neural Abstract Machines & Program Induction (NAMPI v2)
[9] SPARQL as a Foreign Language by Tommaso Soru, Edgard Marx, Diego Moussallem, Gustavo Publio, André Valdestilhas, Diego Esteves, and Ciro Baron Neto in Proceedings of the 13th International Conference on Semantic Systems - SEMANTiCS2017 Posters and Demos
[10] Dana Dannells, Aarne Ranta, Ramona Enache, Mariana Damova, Maria Mateva, Multilingual Online Museum (WP8 Case study: Cultural Heritage). University of Gothenburg and Ontotext. MOLTO Final presentation, May 2013
[11] Dana Dannélls, Mariana Damova, Ramona Enache, Milen Chechev. Multilingual online generation from semantic web ontologies, WWW 2012
[12] Mateva M, Dannélls D, Ranta A, Enache R, Damova M. Multilingual grammar for museum object descriptions. MOLTO project deliverable 8.3, 2013
[13] Mateva M, Dannélls D, Damova M, Ranta A, Enache R. Multilingual access to cultural heritage content on the Semantic Web. ACL LATECH 2013
[14] Iva Delcheva, Yasen Kiprov, Nikolay Petrov, Victor Senderov. Linking Bulgarian Government Open Data with the Trade Register. Sofia Datathon, March 2017. Slideshare, Visualization
[15] Jan Polowinski. Towards RVL: a Declarative Language for Visualizing RDFS/OWL Data. In Proceedings of the 3rd International Conference on Web Intelligence, Mining and Semantics (WIMS '13), 38:1–38:11. New York, NY, USA: ACM, 2013. PDF, SLIDES, REPORT, doi:10.1145/2479787.2479825
[16] Jan Polowinski and Martin Voigt. VISO: A Shared, Formal Knowledge Base as a Foundation for Semi-automatic InfoVis Systems. In CHI '13 Extended Abstracts on Human Factors in Computing Systems (CHI WIP '13). Paris, France: ACM, 2013. PDF, POSTER, doi:10.1145/2468356.2468677
[17] RDF by Example: rdfpuml for True RDF Diagrams, rdf2rml for R2RML Generation. Alexiev, V. In Semantic Web in Libraries 2016 (SWIB 16), Bonn, Germany, November 2016. HTML, Video
[18] R2RML: RDB to RDF Mapping Language. Souripriya Das, Seema Sundara, Richard Cyganiak. W3C Recommendation. 27 Sep 2012