|Confluence Semantic Integration||SemWeb; EIS||Bachelor; Master||RDF; Confluence; OntoWiki; Named Entity Recognition||Enterprise Information||Java||Sören Auer||mailto:email@example.com||2013-11-07|
|Port Krextor to XQuery (or maybe XSPARQL)|
The Krextor XML->RDF translation library is currently implemented in XSLT 2.0, which is only well supported by Saxon: there is no other free XSLT >= 2.0 processor, and Saxon's free edition has limitations. On the other hand, there are a number of free XQuery 3.0 processors (e.g. Zorba and BaseX), as well as the XQuery-based XSPARQL language, which combines XQuery and SPARQL by rewriting SPARQL to XQuery. What XSPARQL lacks is Krextor's rich library of functions for translating common XML patterns to common RDF patterns.
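To illustrate the kind of pattern-based translation rule that such a library provides, here is a minimal sketch in Python (the element-to-predicate mapping table and function names are illustrative assumptions, not Krextor's actual API):

```python
# Minimal illustration of a generic XML->RDF mapping rule of the kind
# Krextor's pattern library provides. The MAPPING table and the
# function below are hypothetical, not Krextor's real interface.
import xml.etree.ElementTree as ET

# Map XML element names to RDF predicates (illustrative only)
MAPPING = {
    "title": "http://purl.org/dc/terms/title",
    "creator": "http://purl.org/dc/terms/creator",
}

def xml_to_ntriples(xml_string, subject_uri):
    """Translate child elements of the root into N-Triples statements."""
    root = ET.fromstring(xml_string)
    triples = []
    for child in root:
        predicate = MAPPING.get(child.tag)
        if predicate and child.text:
            triples.append(f'<{subject_uri}> <{predicate}> "{child.text}" .')
    return triples

doc = "<book><title>Semantic Web</title><creator>Jane Doe</creator></book>"
print("\n".join(xml_to_ntriples(doc, "http://example.org/book/1")))
```

A port to XQuery would express the same rule as a typeswitch or template function over element nodes, returning constructed triples.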
|SemWeb||Bachelor; Master||XML; RDF||Semantic Processing||XQuery; XSLT||http://trac.kwarc.info/krextor||Christoph Lange||Experience with XML (some XSLT preferred)||mailto:firstname.lastname@example.org||2013-11-07|
|SAP Semantic Integration|
The aim of this project is to explore how semantic standards and technologies can be integrated into SAP's ERP system.
|SemWeb; EIS||Bachelor; Master||SAP; RDF||Enterprise Information||Sören Auer||mailto:email@example.com||2013-11-07|
|Specify the schema.org data model as a DOL-conforming ontology language|
schema.org is a widely used Linked Open Data vocabulary. Its underlying data model is similar to RDFS but also borrows some ideas from OWL. It does not have a normative formal semantics, but it can be given one by translation to an OWL 2 profile. Doing so would benefit integrated reasoning over LOD datasets that use schema.org together with OWL ontologies. This translation should be specified in the context of DOL, the Distributed Ontology Language currently being standardised by the OMG.
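One ingredient of such a translation could look as follows, sketched in description-logic notation (whether these axioms fit the chosen OWL 2 profile is part of the task): schema.org declares alternative domains and ranges for a property p via schema:domainIncludes and schema:rangeIncludes, which are commonly read disjunctively:

```latex
% schema:domainIncludes C_1, ..., C_n for a property p:
\exists p.\top \sqsubseteq C_1 \sqcup \dots \sqcup C_n
% schema:rangeIncludes D_1, ..., D_m, analogously:
\top \sqsubseteq \forall p.(D_1 \sqcup \dots \sqcup D_m)
```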
|SemWeb||Bachelor; Master||OWL; schema.org||LOD Vocabularies||(theoretical)||http://ontoiop.org||Christoph Lange||some first-order logic knowledge||mailto:firstname.lastname@example.org||2013-11-07|
|Specify SKOS as a DOL-conforming ontology language|
SKOS (Simple Knowledge Organization System) is a W3C standard formalism for modelling concept schemes, thesauri, taxonomies, etc., i.e. all kinds of lightweight ontologies. SKOS is widely used in digital libraries but can also be used for classifying any other kind of thing. A large part of SKOS has been implemented as an OWL ontology, which gives it a formal semantics, but some additional "integrity conditions" cannot be expressed in OWL; they are given informally in first-order logic pseudo-code. In the context of DOL, the Distributed Ontology Language currently being standardised by the OMG, we have the chance to model these integrity conditions formally via a translation of SKOS to the first-order ontology language Common Logic, standardised by ISO. This would help to promote the implementation of reasoning engines (or the adaptation of existing engines) for SKOS.
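For instance, integrity condition S27 of the SKOS Reference requires skos:related and skos:broaderTransitive to be disjoint; since skos:broaderTransitive is transitive (and hence non-simple), OWL 2 DL does not allow stating this with owl:propertyDisjointWith, whereas in first-order logic (and thus Common Logic) it is simply:

```latex
\forall x\, \forall y\; \bigl( \mathit{related}(x, y) \rightarrow
  \neg\, \mathit{broaderTransitive}(x, y) \bigr)
```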
|SemWeb||Bachelor; Master||OWL; SKOS||Digital Libraries||(theoretical)||http://ontoiop.org||Christoph Lange||some first-order logic knowledge||mailto:email@example.com||2013-11-07|
|Improve the usability of the ceur-make publishing scripts|
ceur-make is a collection of scripts (implemented in XSLT, Makefiles and Perl) that facilitate the generation of HTML+RDFa tables of contents for publishing scientific workshops with the CEUR-WS.org open access computer science publisher. ceur-make has so far been implemented with certain functional requirements in mind, but not usability. It is currently used by about 1/5 of the editors who publish with CEUR-WS.org; we estimate that many more people would use it if it were more usable and required less technical expert knowledge to run. The task is to analyse the current implementation for usability, by studying the implementation itself and by surveying people who have published with CEUR-WS.org (both those who used ceur-make and those who didn't), and then to improve the implementation accordingly.
|SemWeb||Bachelor; Master||Scripting languages; RDFa; Questionnaires||Electronic publishing||XSLT; Makefiles; Perl||https://github.com/clange/ceur-make||Christoph Lange||mailto:firstname.lastname@example.org||2013-11-07|
|XML Schema Governance|
In large enterprises, a plethora of XML schemata emerges. This project should develop a registry for XML schemata which captures metadata about each schema as well as relationships between various schema elements. The registry can be built as an OntoWiki extension.
|RDF extension for the Google Drive Spreadsheet App|
This project should explore how the Google Docs/Drive spreadsheet app can be extended to allow mapping and exporting spreadsheets to RDF.
SQL is one of the most prevalent means of accessing data. The project aims at providing relational tools with access to RDF databases by rewriting SQL queries into SPARQL queries. This rewriting process is supported by a mapping.
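The rewriting idea can be sketched as follows in Python (the mapping vocabulary and the supported query shape are simplified assumptions for illustration, not the project's actual rewriting algorithm):

```python
# Toy illustration of mapping-driven SQL->SPARQL rewriting.
# The mapping table and the single supported query shape
# ("SELECT <cols> FROM <table>") are simplifying assumptions.
import re

MAPPING = {
    "person": {                      # table -> class and column map
        "class": "http://example.org/Person",
        "columns": {"name": "http://example.org/name",
                    "age": "http://example.org/age"},
    }
}

def rewrite(sql):
    """Rewrite 'SELECT <cols> FROM <table>' into a SPARQL query."""
    m = re.match(r"SELECT\s+(.+)\s+FROM\s+(\w+)", sql, re.IGNORECASE)
    cols = [c.strip() for c in m.group(1).split(",")]
    table = MAPPING[m.group(2).lower()]
    patterns = [f"?s a <{table['class']}> ."]
    for col in cols:
        patterns.append(f"?s <{table['columns'][col]}> ?{col} .")
    vars_ = " ".join(f"?{c}" for c in cols)
    return f"SELECT {vars_} WHERE {{ " + " ".join(patterns) + " }"

print(rewrite("SELECT name, age FROM person"))
```

A real rewriter must additionally handle joins, filters and aggregation, which is where the mapping-supported translation becomes non-trivial.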
|SemWeb; EIS||Master||SPARQL; RDF; SQL||Databases; Data Integration||Java||Jörg Unbehauen||mailto:email@example.com||2014-01-16|
|Crowd-sourcing a Gold Standard|
A Gold Standard is a benchmark against which other data of the same type can be compared. According to the Wikipedia article on gold standard tests, such a benchmark should be the "best available under reasonable conditions", but it might not be the "best possible test for the condition in absolute terms". In other words, the perfect Gold Standard does not exist.
One main problem in assessing the quality of a dataset published in a cloud pool (such as datahub.io) is the lack of suitable gold standards. Since the domains of the datasets in such pools differ, it is difficult to manually identify the best possible dataset to use as a benchmark. The motivation behind this work is that these Gold Standards can then be used as benchmarks to assess the "completeness quality" of the other datasets of the same domain available in the cloud.
The research questions are:
(1) How can we identify the domain of a dataset?
(2) How can we find the best possible dataset to act as a Gold Standard for its domain?
|SemWeb||Master||RDF; Unit testing; WEKA||Classification; Semantic Processing; Data Mining||Java||Diachron||Jeremy Debattista||Some experience in classification/clustering algorithms, interested in data mining and semantic technologies||mailto:firstname.lastname@example.org||2014-03-04|
|RDB2RDF Mapping Editor|
With R2RML, the W3C recently published a recommendation for an RDB-to-RDF mapping language. The aim of this project is to create a concept as well as an implementation of a graphical user interface component with which users can edit and alter R2RML mappings. This project is integrated into a broader DataWiki initiative whose architecture and technology stack are predefined.
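A minimal example of the kind of R2RML mapping such an editor would let users create and alter (the table, template and property names are illustrative):

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.org/> .

<#PersonMap>
    rr:logicalTable [ rr:tableName "PERSON" ] ;
    rr:subjectMap [
        rr:template "http://example.org/person/{ID}" ;
        rr:class ex:Person
    ] ;
    rr:predicateObjectMap [
        rr:predicate ex:name ;
        rr:objectMap [ rr:column "NAME" ]
    ] .
```

The editor's job is essentially to render and manipulate the logical table, subject map and predicate-object maps of such documents graphically.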
|Scalable Graph and Triple-based Access Control on RDF graphs|
Current SPARQL endpoints still lack sophisticated and fast access control mechanisms that tackle not only graph- and user-role-based but also triple- and pattern-based access control requirements. This is one of the reasons why RDF stores are deployed only tentatively in enterprise contexts. The task is to develop a SPARQL endpoint based on Oracle's Spatial and Graph technology and evaluate it against the OpenLink Virtuoso SPARQL endpoint. This task is integrated into the LUCID research project.
|SemWeb; EIS||Master||RDF; SPARQL; Oracle; Java; Spring||Databases||Java; SPARQL||http://lucid-project.org/||LUCID||Sören Auer; Sebastian Tramp||Familiarity with RDF and SPARQL, Oracle Databases, Java, Spring||http://aksw.org/SebastianTramp||2014-05-30|
|Transformation of Datex2 XML schema into ontology/vocabulary|
In the context of the MobiVoc project (see above), an open standard vocabulary for mobility solutions and services will be developed. We are currently expanding it with concepts from many domains related to mobility (e.g. infrastructure for electric cars). DATEX (http://www.datex2.eu/) is a data model for traffic and travel data exchange. It was developed to standardise the interface between traffic control and information centres and is available as an XML schema. In order to integrate this data model into the mobility vocabulary, a solution for transforming DATEX into RDF is required.
|SemWeb; EIS||Lab; Bachelor; Master||RDF; RDFS; OWL; Linked Data||LOD Vocabularies, Mobility Data||https://github.com/mobivoc/vocol; http://www.datex2.eu/||mobivoc||Christoph Lange; Sören Auer||scripting; Git repository management; HTML||mailto:email@example.com||2014-10-17|
|From Standards to Ontologies - A Web-based tool to semantify/ontologize the knowledge of a standard with semantic technologies|
This project should deliver a tool, developed with modern web technologies (e.g. the MEAN stack), that visualizes and analyzes standard specification documents and provides vocabulary creation functionality, allowing the user to create a vocabulary while reading the standard document. This process should be supported by existing foundational ontologies to simplify integration into existing conceptual models.
|Vocabulary-based integration tool for Industry 4.0 Standards|
In this project, a Web-based tool that uses RDF(S) vocabularies to integrate data from different standards will be developed. The proposed technologies are the Play framework (Java/Scala) for the server and AngularJS/ReactJS for the client interfaces. The tool should allow uploading and analyzing (e.g. with different visualization techniques) the standard files and aligning them with the corresponding vocabularies. The main motivation is to support the integration of existing heterogeneous standards data (e.g. XML, CSV, text) by means of RDF.
|Survey of XML→RDF Mapping Approaches|
Many data exchange standards, particularly in industry and business, are based on XML (e.g. AutomationML). To enable cross-domain information integration, it is desirable to convert this information to RDF. Previous research has resulted in several XML→RDF conversion approaches. The subject of this thesis is a comprehensive survey of these approaches, guided by the requirements of real-world use cases. Some of the existing approaches auto-translate an XML schema to an RDF vocabulary/ontology; others require knowledge engineers to manually specify mappings from XML elements to RDF resources. Some provide high-level domain-specific languages; others are implemented as libraries on top of general-purpose languages such as XSLT. Most work in one direction (XML→RDF); a few work in both directions (but the RDF→XML direction is of less interest to us). This thesis is expected to systematically investigate which existing approach works best.
|SemWeb; EIS||Master||XML; RDF||Industry; Mobility; Business||XSLT; XQuery; Java||LUCID; MobiVoc||Irlán Grangel; Christoph Lange||In-depth knowledge of the XML and RDF data models; proficiency in programming languages in which XML→RDF mappings are typically implemented (XSLT, XQuery, Java, …)||mailto:firstname.lastname@example.org||2015-07-30|
|RDF Update propagation using Thrift|
Many LOD datasets, such as DBpedia and LinkedGeoData, are voluminous and serve large numbers of requests from diverse applications. Replicating Linked Data datasets enhances the flexibility of information sharing and integration infrastructures. Since hosting a replica of large datasets such as DBpedia and LinkedGeoData is costly, organizations might want to host only a relevant subset of the data. However, due to the evolving nature of these datasets in terms of content and ontology, maintaining a consistent and up-to-date replica of the relevant data is a challenge. The iRap framework (interest-based RDF update propagation) propagates only the interesting parts of updates from the source to the target dataset. The current iRap framework propagates updates using SPARQL update constructs. The goal of this task is to build a service layer (REST/Thrift RPC) for change requests and responses using the Thrift serialization format, and to compare the results with existing RDF serialization formats.
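The interest-based filtering idea can be sketched as follows in Python (the triple representation and the wildcard interest patterns are simplified assumptions for illustration, not iRap's actual data structures):

```python
# Toy sketch of interest-based update filtering in the spirit of iRap.
# Triples are (s, p, o) tuples; an interest pattern uses None as a
# wildcard. Both representations are simplifying assumptions.

def matches(triple, pattern):
    """A pattern term of None acts as a wildcard."""
    return all(p is None or p == t for t, p in zip(triple, pattern))

def filter_changeset(added_triples, interests):
    """Keep only added triples matching at least one interest pattern."""
    return [t for t in added_triples
            if any(matches(t, p) for p in interests)]

changeset = [
    ("dbr:Bonn", "rdf:type", "dbo:City"),
    ("dbr:Bonn", "dbo:population", "327258"),
    ("dbr:Mars", "rdf:type", "dbo:Planet"),
]
interests = [(None, "rdf:type", "dbo:City"),
             (None, "dbo:population", None)]
print(filter_changeset(changeset, interests))
```

The service layer to be built would carry such filtered changesets over REST/Thrift RPC, with Thrift structs replacing the plain tuples shown here.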
|Master||Jena; Thrift; RDF; REST||Dataset Evolution and Maintenance||Java||http://eis.iai.uni-bonn.de/Projects/iRap.html||iRap||Kemele M. Endris||Knowledge of the RDF data model and the SPARQL query language; interest in learning binary serialization technologies (Thrift/Protobuf)||mailto:email@example.com||2015-09-18|
|Semantic Blockchain Explorer|
This topic deals with "smart contracts" using Ethereum and Linked Data. Ethereum (https://www.ethereum.org) is a decentralized smart contract platform based on the blockchain concept (also implemented by Bitcoin). This lab project will establish a framework to analyze smart contracts in conjunction with semantic technologies by creating a semantic blockchain explorer.
|SemWeb||Lab||Ethereum; Blockchains; RDF||Blockchains||Java &/or Python||http://alexgorale.com/how-to-program-block-chain-explorers-with-python-part-1||Maria-Esther Vidal||Interest in blockchain concepts and semantic technologies.||mailto:firstname.lastname@example.org||2015-11-11|
|Benchmarking indexing techniques over heterogeneous data|
This project aims to survey and benchmark a wide variety of indexing tools and techniques used for open-domain question answering over heterogeneous (structured, semi-structured and unstructured) data. In systems that perform open-domain question answering, data is the heart of the system and indexing is what keeps it beating: different kinds of data call for different indexing techniques. The challenge is to identify and test the available solutions and report the optimal one for efficient information retrieval.
|SemWeb||Master; Bachelor; Lab||Java; Python||Java; Python; shell scripts||WDAqua||Harsh Thakkar||Ability to quickly learn new tools such as Solr, Lemur Indri, Elasticsearch, etc.||mailto:email@example.com||2016-02-13|
|Remote reproducibility detector for a collaborative authoring system|
Open collaborative authoring systems usually support separating result and test data from the article corpus. In applied computer science, many study results and samples can be verified remotely. This project studies current methods in remote open testing and execution platforms for collaborative writing, and provides an abstract architecture with a prototype for a collaborative writing tool in the EIS domain.
|Outlier Detection on Financial RDF data|
Implement outlier detection on numerical Linked Data (RDF) by creating a subpopulation lattice and performing multiple outlier detection steps on the subpopulations. An evaluation of strategies for lattice generation and of different weightings for combining the outlier scores will be included.
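The core scoring step can be sketched as follows in Python (the choice of z-scores as the per-subpopulation detector and of the unweighted mean as the combination function are illustrative assumptions; evaluating such choices is exactly the task):

```python
# Minimal sketch of subpopulation-based outlier scoring.
# Assumptions: z-score as the per-subpopulation detector,
# unweighted mean as the score combination.
from statistics import mean, pstdev

def z_scores(values):
    """Absolute z-score of each value within its population."""
    mu, sigma = mean(values), pstdev(values)
    return [abs(v - mu) / sigma if sigma else 0.0 for v in values]

def combined_scores(entity_values, subpopulations):
    """entity_values: {entity: value}; subpopulations: list of entity
    sets (lattice nodes). Each entity's final score is the mean of its
    z-scores over all subpopulations it belongs to."""
    scores = {e: [] for e in entity_values}
    for pop in subpopulations:
        members = [e for e in pop if e in entity_values]
        for e, z in zip(members, z_scores([entity_values[e] for e in members])):
            scores[e].append(z)
    return {e: mean(zs) if zs else 0.0 for e, zs in scores.items()}

values = {"a": 10, "b": 11, "c": 12, "d": 100}
pops = [{"a", "b", "c", "d"}, {"c", "d"}]  # a tiny two-node "lattice"
result = combined_scores(values, pops)
```

In the thesis, the subpopulations would be generated from RDF class and property combinations rather than given by hand as here.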
|SemWeb||Lab||Open Data; Data Mining||OpenBudgets||Christiane Engels||mailto:firstname.lastname@example.org||2016-03-16|
|Summarisation and Visualization of License Agreements|
Legal documents, license agreements for example, are hard to understand, and people usually agree to them without actually reading them. In this project we are trying to make these complex texts more understandable for human users. The initial step is extracting the important clauses from an EULA (permissions, prohibitions, duties). The expected result is a summarised version of the EULA with icons and pictures that engage users and warn them about the potential risks they may face by agreeing to the license.
|SemWeb||Lab||RDF; OWL||information extraction, summarisation, web technologies||Java||Najmeh Mousavi||mailto:email@example.com||2016-04-05|
|To develop a QA Pipeline Component for Relation Extraction|
Question answering (QA) aims at making sense of data via a simple-to-use interface. In a typical QA pipeline (from the input of a question to extracting the answer), there are a number of modules/sub-components performing different functions such as named entity recognition, named entity disambiguation, relation matching, etc. However, QA systems are very complex; earlier approaches focused mainly on implementation details, and their components were tightly coupled in the pipeline.
Therefore, it is cumbersome and inefficient to design and implement new or improved approaches, in particular as many components are not reusable. We aim to develop reusable, extensible open QA systems and components.
The aim of this project is to design a new, fully independent component within an existing component architecture for identifying/extracting relations between entities in an input text query.
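The interface of such a reusable component can be sketched as follows in Python (the pattern table and function names are hypothetical; a real component would plug a learned model or lexical resources behind the same interface and expose it via REST):

```python
# Hypothetical sketch of an independent, reusable relation-extraction
# component behind a minimal interface. The pattern->relation lookup
# is a stand-in for a learned model or a lexical database.
PATTERNS = {
    "born in": "dbo:birthPlace",
    "capital of": "dbo:capital",
}

def extract_relations(question):
    """Return candidate KB relations mentioned in a question string."""
    q = question.lower()
    return [rel for phrase, rel in PATTERNS.items() if phrase in q]

print(extract_relations("What is the capital of France?"))
```

Keeping the component behind one small function like this is what makes it swappable within the surrounding pipeline.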
|EIS; SemWeb||Lab; Master||RDF; SPARQL; REST||Semantic processing; SOA||Java; Scala; Python||WDAqua||Kuldeep Singh||Good programming skills (any of Java, Python, or Scala); desirable: brief insight into RDF, Semantic Web, and ontologies; command of SPARQL|
|Decentralised Authoring, Annotations, and Social Interactions|
dokieli is a general purpose client-side application for document authoring, publication and interaction. Capabilities of the tool are enabled according to the needs and technical resources of the user. The editor is built on open Web standards and the documents are compliant with Linked Data best practices, allowing: decentralised storage and data ownership; fine-grained semantic structure through HTML+RDFa; direct in-browser editing from an LDP-based personal data store; social interactions with documents (such as annotations and replies), and notifications thereof. See also: https://dokie.li/
|Word Embedding based on Dependency Relations and Part of Speech||Word embedding is a machine-learning task: we will create software that assigns each word a high-dimensional vector. This master's work includes a survey of existing work on word embeddings, and the reproduction/extension of current word-embedding tools. See https://levyomer.files.wordpress.com/2014/04/dependency-based-word-embeddings-acl-2014.pdf Remark: We accept more than one master's student for this topic.||EIS||Master||Python; TensorFlow||Machine Learning||Python||Tiansi; Denis||Programming skills in Python||mailto:firstname.lastname@example.org||2017-02-01|
|A Survey on Machine Translation with a Case Study between X and English||Machine translation is developing fast: the main method has shifted from statistical to neural approaches. This master's work is to survey existing machine-translation methods and to conduct a case study on the paired corpus within the OpenBudgets project.||EIS||Master||Python||Web technologies; machine translation; natural language processing||Python||OpenBudgets||Tiansi; Fathoni; Denis||Programming skills in Python, NLP||mailto:email@example.com||2017-02-01|
|SQuAD: text-based question answering||A relevant QA task on a very recent dataset, focusing on answering questions by finding the span in a given paragraph that contains the answer to the given question. For more info see https://rajpurkar.github.io/SQuAD-explorer/. Survey recent models (see the table in the link), learn deep learning techniques for NLP, and implement and improve an existing model.||EIS||Master; Lab||Theano (or maybe TensorFlow)||NLP; QA; machine learning||Python||WDAqua||Denis; Tiansi||Experience in Python required; familiarity with machine learning concepts required; familiarity with neural networks and deep learning concepts recommended; previous experience with Theano/TensorFlow/Torch/... is a plus||mailto:firstname.lastname@example.org; mailto:email@example.com||2017-02-02|