Uni Bonn EIS Bachelor, Master and Lab Topics

Confluence Semantic Integration
SemWeb; EIS | Bachelor; Master | RDF; Confluence; OntoWiki; Named Entity Recognition | Enterprise Information | Java | Sören Auer | mailto:auer@cs.uni-bonn.de | 2013-11-07
Port Krextor to XQuery (or maybe XSPARQL)
The Krextor XML→RDF translation library is currently implemented in XSLT 2.0, which only Saxon supports really well. There is no other free XSLT ≥ 2.0 processor, and Saxon's free edition has limitations. On the other hand, there is a number of free XQuery 3.0 processors (e.g. Zorba, BaseX), and there is the XQuery-based XSPARQL language, which combines XQuery and SPARQL by rewriting SPARQL to XQuery. The main thing XSPARQL is missing is Krextor's rich library of functions for translating typical XML patterns into typical RDF patterns.
SemWeb | Bachelor; Master | XML; RDF | Semantic Processing | XQuery; XSLT | http://trac.kwarc.info/krextor | Christoph Lange | Experience with XML (some XSLT preferred) | mailto:math.semantic.web@gmail.com | 2013-11-07
SAP Semantic Integration
The aim of this project is to explore how semantic standards and technologies can be integrated into SAP's ERP system.
SemWeb; EIS | Bachelor; Master | SAP; RDF | Enterprise Information | Sören Auer | mailto:auer@cs.uni-bonn.de | 2013-11-07
Specify the schema.org data model as a DOL-conforming ontology language
schema.org is a widely used Linked Open Data vocabulary. Its underlying data model is similar to RDFS but also borrows some ideas from OWL. It does not have a normative formal semantics, but it can be given one by translation to an OWL 2 profile. Doing so would be beneficial for integrated reasoning over LOD datasets that use schema.org together with OWL ontologies. This translation should be specified in the context of DOL, the Distributed Ontology Language currently being standardised by the OMG.
SemWeb | Bachelor; Master | OWL; schema.org | LOD Vocabularies | (theoretical) | http://ontoiop.org | Christoph Lange | some first-order logic knowledge | mailto:math.semantic.web@gmail.com | 2013-11-07
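For instance, schema.org declares multiple alternative domains for a property via schema:domainIncludes, which has no exact RDFS counterpart but can be approximated by an OWL 2 axiom with a class union. In description-logic notation (the property schema:author with domains CreativeWork and Rating is used here only as an illustration):

```latex
\exists\,\mathit{author}.\top \;\sqsubseteq\; \mathit{CreativeWork} \,\sqcup\, \mathit{Rating}
```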
Specify SKOS as a DOL-conforming ontology language
SKOS (Simple Knowledge Organization System) is a W3C standard formalism for modelling concept schemes, thesauri, taxonomies, etc., i.e. all kinds of lightweight ontologies. SKOS is widely used in digital libraries but can also be used for classifying other kinds of things. A large part of SKOS has been implemented as an OWL ontology, which gives it a formal semantics, but some additional “integrity conditions” cannot be expressed in OWL. They are given informally in first-order logic pseudo-code. In the context of DOL, the Distributed Ontology Language currently being standardised by the OMG, we have the chance to formally model these integrity conditions by a translation of SKOS to the first-order ontology language Common Logic, standardised by the ISO. This would help to promote the implementation of reasoning engines (or the adaptation of existing engines) for SKOS.
SemWeb | Bachelor; Master | OWL; SKOS | Digital Libraries | (theoretical) | http://ontoiop.org | Christoph Lange | some first-order logic knowledge | mailto:math.semantic.web@gmail.com | 2013-11-07
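One example of such an integrity condition is S14 of the SKOS Reference ("a resource has no more than one value of skos:prefLabel per language tag"), which in first-order notation reads roughly as follows (the exact formalisation in Common Logic would be part of the thesis):

```latex
\forall x\,\forall \ell_1\,\forall \ell_2.\;
\mathit{prefLabel}(x,\ell_1) \wedge \mathit{prefLabel}(x,\ell_2)
\wedge \mathit{lang}(\ell_1)=\mathit{lang}(\ell_2)
\;\rightarrow\; \ell_1=\ell_2
```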
Improve the usability of the ceur-make publishing scripts
ceur-make is a collection of scripts (implemented in XSLT, Makefiles and Perl) that facilitate the generation of HTML+RDFa Tables of Contents for publishing scientific workshops with the CEUR-WS.org open access computer science publisher. ceur-make has so far been implemented with some functional requirements but not with usability in mind. ceur-make is currently used by about 1/5 of those editors who publish with CEUR-WS.org. We estimate that a lot more people would use it if it were more usable and required less technical expert knowledge to run. The task is to analyse the current implementation for usability, by studying the implementation itself and by surveying people who have published with CEUR-WS.org (including both those who used ceur-make and those who didn't), and then to improve the implementation accordingly.
SemWeb | Bachelor; Master | Scripting languages; RDFa; Questionnaires | Electronic publishing | XSLT; Makefiles; Perl | https://github.com/clange/ceur-make | Christoph Lange | mailto:math.semantic.web@gmail.com | 2013-11-07
XML Schema Governance
In large enterprises a plethora of XML schemata emerges. This project should develop a registry for XML schemata, which captures metadata about the schema as well as relationships between various schema elements. The registry can be built as an OntoWiki extension.
SemWeb; EIS | Bachelor; Master | XML; XML Schema; XSLT; RDF | Enterprise Information | PHP; JavaScript | http://ontowiki.net | Sören Auer | High motivation and good programming skills | mailto:auer@cs.uni-bonn.de | 2013-07-13
RDF extension for the Google Drive Spreadsheet App
This project should explore how the Google Docs/Drive spreadsheet app can be extended to allow mapping and exporting a spreadsheet to RDF.
SemWeb | Master | Google Drive; RDF | Cloud Computing Apps | JavaScript | http://drive.google.com | Sören Auer | mailto:auer@cs.uni-bonn.de | 2013-07-13 | taken
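As a minimal sketch of the kind of mapping such an extension would implement (plain Python instead of the Apps Script/JavaScript the real extension would use; all URIs are made up), each row can become a subject and each remaining column a property:

```python
import csv
import io

def rows_to_triples(csv_text, subject_col, base, prop_base):
    """Map each spreadsheet row to RDF triples: one subject per row,
    one triple per remaining non-empty column (a 'direct mapping' heuristic)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    triples = []
    for row in reader:
        subj = base + row[subject_col]
        for col, value in row.items():
            if col != subject_col and value:
                triples.append((subj, prop_base + col, value))
    return triples

data = "id,name,city\n1,Alice,Bonn\n2,Bob,Leipzig\n"
triples = rows_to_triples(data, "id",
                          "http://example.org/item/",
                          "http://example.org/prop/")
```

A real extension would additionally let the user choose datatypes and map columns to existing vocabulary terms instead of minting ad-hoc property URIs.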
SQL-to-SPARQL rewriting
SQL is one of the most prevalent means of accessing data. The project aims at providing access for relational tools to RDF databases by rewriting a SQL query into a SPARQL query. This rewriting process is supported by a mapping.
SemWeb; EIS | Master | SPARQL; RDF; SQL | Databases; Data Integration | Java | Jörg Unbehauen | mailto:unbehauen@informatik.uni-leipzig.de | 2014-01-16
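A minimal sketch of the rewriting idea, assuming a hypothetical table-to-class and column-to-predicate mapping (a real implementation would cover joins, filters, and aggregates):

```python
import re

# Hypothetical mapping from a relational schema to RDF terms (as an
# R2RML-style mapping might provide); all names are illustrative only.
MAPPING = {
    "person": {
        "class": "<http://example.org/Person>",
        "columns": {"name": "<http://example.org/name>",
                    "age": "<http://example.org/age>"},
    }
}

def rewrite(sql):
    """Rewrite a trivial 'SELECT cols FROM table' query into SPARQL."""
    m = re.match(r"SELECT (.+) FROM (\w+)", sql, re.IGNORECASE)
    cols = [c.strip() for c in m.group(1).split(",")]
    table = MAPPING[m.group(2).lower()]
    lines = [f"SELECT {' '.join('?' + c for c in cols)} WHERE {{",
             f"  ?s a {table['class']} ."]
    for c in cols:
        # Each projected column becomes one triple pattern.
        lines.append(f"  ?s {table['columns'][c]} ?{c} .")
    lines.append("}")
    return "\n".join(lines)

query = rewrite("SELECT name, age FROM person")
```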
Crowd-sourcing a Gold Standard
A gold standard is a benchmark against which other data of the same type can be compared [1]. According to the Wikipedia article on gold standard tests [2], this benchmark should be the "best available under reasonable conditions", but on the other hand it might not be the "best possible test for the condition in absolute terms". In other words, the perfect gold standard does not exist.

One main problem in assessing the quality of a dataset published in a cloud pool (such as datahub.io) is the lack of suitable gold standards. The main issue in such pools is that, since the domains of the datasets differ, it is difficult to manually identify the best possible dataset which can be used as a benchmark. The motivation behind this work is that these gold standards can then be used as a benchmark in order to assess the "completeness quality" [3] of the other datasets (of the same domain) available in the cloud.

The research questions are:
(1) How can we identify the domain of a dataset?
(2) How can we find the best possible dataset to act as a Gold Standard for its domain?

[1] http://www.merriam-webster.com/thesaurus/gold%20standard
[2] http://en.wikipedia.org/wiki/Gold_standard_%28test%29
[3] http://www.semantic-web-journal.net/system/files/swj556.pdf
SemWeb | Master | RDF; Unit testing; WEKA | Classification; Semantic Processing; Data Mining | Java | Diachron | Jeremy Debattista | Some experience in classification/clustering algorithms; interest in data mining and semantic technologies | mailto:jerdebattista@gmail.com | 2014-03-04
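One simple heuristic for the research questions above is to characterise a dataset by its predicate vocabulary and compare vocabularies across datasets; the sketch below (hypothetical predicates, Jaccard similarity) illustrates the idea:

```python
def jaccard(a, b):
    """Jaccard similarity of two sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def closest_dataset(candidate_preds, datasets):
    """Return the dataset whose predicate vocabulary overlaps most with
    the candidate's -- a crude proxy for 'belongs to the same domain'."""
    return max(datasets, key=lambda name: jaccard(candidate_preds, datasets[name]))

# Toy predicate vocabularies of two known datasets.
datasets = {
    "geo":  {"geo:lat", "geo:long", "rdfs:label"},
    "bibl": {"dc:title", "dc:creator", "rdfs:label"},
}
best = closest_dataset({"geo:lat", "rdfs:label", "dc:date"}, datasets)
```

A thesis on this topic would replace the bare predicate sets with richer features (class usage, instance samples, crowd-sourced labels) and evaluate which feature combination best predicts the domain.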
RDB2RDF Mapping Editor
With R2RML, the W3C recently published a recommendation for an RDB-to-RDF mapping language. The aim of this project is to create a concept as well as an implementation of a graphical user interface component in which users can edit and alter R2RML mappings. This project is integrated into a broader DataWiki initiative where the architecture and technology stack are predefined.
SemWeb; EIS | Bachelor; Master | RDF; HTML5; CSS; react.js | User Interfaces | Javascript | http://lucid-project.org/ | LUCID | Sören Auer; Sebastian Tramp | Familiarity with RDF, relational databases, node.js and angular.js | http://aksw.org/SebastianTramp | 2014-05-30
Scalable Graph and Triple-based Access Control on RDF graphs
Current SPARQL endpoints still lack sophisticated and fast access control mechanisms that address not only graph- and user-role-based but also triple- and pattern-based access control requirements. This is one of the reasons that RDF stores are deployed only tentatively in enterprise contexts. The task is to develop a SPARQL endpoint based on Oracle's Spatial and Graph technology and to evaluate it against the OpenLink Virtuoso SPARQL endpoint. This task is integrated into the LUCID research project.
SemWeb; EIS | Master | RDF; SPARQL; Oracle; Java; Spring | Databases | Java; SPARQL | http://lucid-project.org/ | LUCID | Sören Auer; Sebastian Tramp | Familiarity with RDF and SPARQL, Oracle databases, Java, Spring | http://aksw.org/SebastianTramp | 2014-05-30
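The core of triple- and pattern-based access control can be sketched as filtering triples against a set of allowed triple patterns (a naive post-filter for illustration; a real endpoint would rewrite queries instead of filtering results):

```python
def matches(pattern, triple):
    """A pattern term of None acts as a wildcard, like a SPARQL variable."""
    return all(p is None or p == t for p, t in zip(pattern, triple))

def filter_triples(triples, allowed_patterns):
    """Keep only triples covered by at least one allowed pattern --
    the essence of triple/pattern-based access control."""
    return [t for t in triples if any(matches(p, t) for p in allowed_patterns)]

triples = [
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "ex:salary", '"50000"'),
]
# This role may see foaf:name triples but nothing else.
visible = filter_triples(triples, [(None, "foaf:name", None)])
```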
Transformation of the DATEX II XML Schema into an Ontology/Vocabulary
In the context of the MobiVoc project (see above), an open standard vocabulary about mobility solutions and services is being developed. We are currently expanding it with concepts from many domains related to mobility (e.g. infrastructure for electric cars). DATEX II (http://www.datex2.eu/) is a data model for traffic and travel data exchange. It was developed to standardise the interface between traffic control and information centres and is available as an XML schema. In order to integrate this data model into the mobility vocabulary, a solution for transforming DATEX into RDF is required.
SemWeb; EIS | Lab; Bachelor; Master | RDF; RDFS; OWL; Linked Data | LOD Vocabularies; Mobility Data | https://github.com/mobivoc/vocol; http://www.datex2.eu/ | MobiVoc | Christoph Lange; Sören Auer | scripting; Git repository management; HTML | mailto:math.semantic.web@gmail.com | 2014-10-17
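A generic XML→RDF lifting step might look like the following sketch; the element names only imitate the shape of a DATEX II document, and all URIs are invented:

```python
import xml.etree.ElementTree as ET

# Toy input imitating the *shape* of a DATEX II record; the real schema
# is far richer, and these element names are illustrative only.
XML = """<situation id="s1">
  <severity>high</severity>
  <roadNumber>A555</roadNumber>
</situation>"""

def lift(xml_text, base="http://example.org/datex/"):
    """Generic lifting: the root element becomes a resource and each
    child element yields a (subject, predicate, literal) triple."""
    root = ET.fromstring(xml_text)
    subj = base + root.attrib["id"]
    return [(subj, base + child.tag, child.text) for child in root]

triples = lift(XML)
```

A thesis-quality transformation would instead map DATEX II element names onto curated vocabulary terms rather than minting predicate URIs mechanically.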
From Standards to Ontologies - A Web-based Tool for Semantifying/Ontologizing the Knowledge of a Standard with Semantic Technologies
This project should deliver a tool, developed with modern web technologies (e.g. the MEAN stack), that visualizes and analyzes standard specification documents and provides vocabulary creation functionality, allowing the user to create a vocabulary while reading the standard document. This process should be supported by existing foundational ontologies to simplify the integration into existing conceptual models.
Lab | MEAN stack; vocabularies; MongoDB; Node.js; Angular.js | vocabulary/ontology creation | Javascript | IDS | Irlán Grangel | JavaScript-based tools (e.g. AngularJS, ReactJS); understanding vocabulary creation and standards document reading/understanding | mailto:grangel@cs.uni-bonn.de | 2016-04-05
Vocabulary-based Integration Tool for Industry 4.0 Standards
In this project, a Web-based tool that uses RDF(S) vocabularies to integrate data from different standards will be developed. The proposed technologies are the Play framework (Java/Scala) for the server and AngularJS/ReactJS for the client interface. The tool should allow uploading and analyzing (e.g. with different visualization techniques) standard files and aligning them with the corresponding vocabularies. The main motivation is to support the integration of existing heterogeneous standards data (e.g. XML, CSV, text, etc.) by means of RDF.
SemWeb | Lab | Play framework; React.JS; Angular.JS | Semantic data integration; vocabulary modeling | Java; Scala; Javascript | IDS | Irlán Grangel | Java or Scala, Javascript (AngularJS, ReactJS) | mailto:grangel@cs.uni-bonn.de | 2015-04-07
Survey of XML→RDF Mapping Approaches
A lot of data exchange standards, particularly in industry and business, are based on XML (e.g. AutomationML). To enable cross-domain information integration it is desirable to convert this information to RDF. Previous research has resulted in several XML→RDF conversion approaches. The subject of this thesis is a comprehensive survey of these approaches, guided by the requirements of real-world use cases. Some of the existing approaches auto-translate an XML schema to an RDF vocabulary/ontology; others require knowledge engineers to manually specify mappings from XML elements to RDF resources. Some provide high-level domain-specific languages; others are implemented as libraries on top of general-purpose languages such as XSLT. Most work in one direction (XML→RDF), a few work in both directions (but the RDF→XML direction is of less interest to us). This thesis is expected to systematically investigate which existing approach works best.
SemWeb; EIS | Master | XML; RDF | Industry; Mobility; Business | XSLT; XQuery; Java | LUCID; MobiVoc | Irlán Grangel; Christoph Lange | In-depth knowledge of the XML and RDF data models; proficiency in programming languages in which XML→RDF mappings are typically implemented (XSLT, XQuery, Java, …) | mailto:grangel@cs.uni-bonn.de | 2015-07-30
RDF Update propagation using Thrift
Many LOD datasets, such as DBpedia and LinkedGeoData, are voluminous and process large amounts of requests from diverse applications. Replication of Linked Data datasets enhances the flexibility of information sharing and integration infrastructures. Since hosting a replica of large datasets such as DBpedia and LinkedGeoData is costly, organizations might want to host only a relevant subset of the data. However, due to the evolving nature of these datasets in terms of content and ontology, maintaining a consistent and up-to-date replica of the relevant data is a challenge. The iRap framework (interest-based RDF update propagation) propagates only interesting parts of updates from the source to the target dataset. The current iRap framework propagates updates using SPARQL update constructs. The goal of this task is to build a service layer (REST/Thrift RPC) for change requests and responses using the Thrift serialization format, and to compare the results with existing RDF serialization formats.
Master | Jena; Thrift; RDF; REST | Dataset Evolution and Maintenance | Java | http://eis.iai.uni-bonn.de/Projects/iRap.html | iRap | Kemele M. Endris | Knowledge of the RDF data model and the SPARQL query language; interest in learning binary serialization technologies (Thrift/Protobuf) | mailto:endris@iai.uni-bonn.de | 2015-09-18
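The interest-based filtering at the heart of such a service can be sketched as follows (iRap itself expresses interests as SPARQL graph patterns; here a plain predicate whitelist stands in for them, and all terms are invented):

```python
def interesting(update, interest):
    """An 'interest expression' here is just a predicate whitelist;
    iRap uses SPARQL graph patterns, which this only approximates."""
    return update["triple"][1] in interest

def propagate(changeset, interest):
    """Split a source changeset into the part relevant to the replica."""
    return [u for u in changeset if interesting(u, interest)]

changeset = [
    {"op": "add",    "triple": ("ex:bonn", "geo:population", "330000")},
    {"op": "delete", "triple": ("ex:bonn", "dbo:abstract", "...")},
]
# The replica only cares about population figures.
relevant = propagate(changeset, {"geo:population"})
```

The Thrift part of the task would then define a compact wire format for `changeset`-like structures and benchmark it against textual RDF serializations.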
Semantic Blockchain Explorer
This topic deals with "smart contracts" using Ethereum and Linked Data. Ethereum (https://www.ethereum.org) is a decentralized smart contract platform based on the blockchain concept (also implemented by Bitcoin). This lab project will establish a framework for analyzing smart contracts in conjunction with semantic technologies by creating a semantic blockchain explorer.
SemWeb | Lab | Ethereum; Blockchains; RDF | Blockchains | Java and/or Python | http://alexgorale.com/how-to-program-block-chain-explorers-with-python-part-1 | Maria-Esther Vidal | Interest in blockchain concepts and semantic technologies | mailto:english@cs.uni-bonn.de | 2015-11-11
Benchmarking indexing techniques over heterogeneous data
This project aims to survey and benchmark a wide variety of indexing tools and techniques used for open-domain question answering over heterogeneous (structured, semi-structured and unstructured) data. For systems that perform open-domain question answering, data is the heart of the system and indexing is the blood of that heart. Different kinds of data call for different indexing techniques. Our challenge here is to identify and test all possible solutions and report the optimal one for efficient retrieval of information.
SemWeb | Master; Bachelor; Lab | Java; Python | Java; Python; shell scripts | WDAqua | Harsh Thakkar | Ability to quickly learn new tools such as Solr, Lemur/Indri, Elasticsearch, etc. | mailto:hthakkar@uni-bonn.de | 2016-02-13
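A benchmark harness for comparing indexing techniques can be as simple as timing the same lookup against an inverted index and a linear scan (a toy corpus; a real benchmark would add warm-up, many distinct queries, and correctness checks across engines such as Solr or Elasticsearch):

```python
import time

def build_index(docs):
    """Build an inverted index: term -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.split():
            index.setdefault(term, set()).add(doc_id)
    return index

def bench(fn, repeat=200):
    """Time `repeat` calls of a zero-argument lookup function."""
    start = time.perf_counter()
    for _ in range(repeat):
        result = fn()
    return result, time.perf_counter() - start

docs = {i: f"doc number {i} about topic{i % 10}" for i in range(500)}
index = build_index(docs)
scan_hits, t_scan = bench(lambda: {d for d, t in docs.items() if "topic3" in t.split()})
idx_hits, t_idx = bench(lambda: index["topic3"])
```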
Remote reproducibility detector for a collaborative authoring system
Open collaborative authoring systems usually support separation of result and test data from the article corpus. In the domain of applied computer science, many of the study results and samples can be remotely verified. This project studies current methods in remote open testing and executing platforms in collaborative writing and provides an abstract architecture with a prototype for a collaborative writing tool in the domain of EIS.
EIS | Master; Lab | RDF; JSON; Scripting languages | Open Data | Javascript; PHP; Python; Java | OSCOSS | Afshin Sadeghi | Familiarity with RDF, relational databases, software modeling, Javascript | mailto:sadeghi@cs.uni-bonn.de | 2016-03-11
Outlier Detection on Financial RDF data
Implement outlier detection on numerical linked RDF data by creating a subpopulation lattice and performing multiple outlier detection steps on the subpopulations. An evaluation process of strategies for lattice generation and on different weightings for combining the outlier scores will be included.
SemWeb | Lab | Open Data; Data Mining | OpenBudgets | Christiane Engels | mailto:christiane.engels@iais.fraunhofer.de | 2016-03-16
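The basic building block, z-score outlier detection run separately per subpopulation, can be sketched as follows (one grouping dimension only; the full lattice would enumerate all combinations of dimensions, and all figures are invented):

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Flag values more than `threshold` population standard deviations
    from the mean of their subpopulation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [v for v in values if sd and abs(v - mean) / sd > threshold]

def lattice_outliers(observations, threshold=2.0):
    """Run the detector once per subpopulation; a real lattice would also
    combine per-subpopulation scores into a weighted overall score."""
    groups = {}
    for group, value in observations:
        groups.setdefault(group, []).append(value)
    return {g: zscore_outliers(vs, threshold) for g, vs in groups.items()}

edu = [100, 101, 102, 103, 104, 105, 106, 107, 108, 500]
obs = ([("education", v) for v in edu] +
       [("defence", v) for v in (900, 905, 910)])
flagged = lattice_outliers(obs)
```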
Summarisation and Visualization of License Agreements
Legal documents, license agreements for example, are hard to understand, and people usually agree to them without actually reading them. In this project we are trying to make these complex texts more understandable for human users. The initial step is extracting important clauses from an EULA (permissions, prohibitions, duties). The expected result would be a summarised version of the EULA with icons and pictures to attract users and to warn them about the potential risks they may face by agreeing to the license.
SemWeb | Lab | RDF; OWL | Information extraction; summarisation; web technologies | Java | Najmeh Mousavi | mailto:nejad@iai.uni-bonn.de | 2016-04-05
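A first baseline for clause extraction could classify sentences by deontic cue words (the cue lists are illustrative; a real system would use NLP parsing rather than keyword matching):

```python
# Hypothetical cue words per deontic category.
CUES = {
    "permission":  ("may", "is allowed to", "is permitted to"),
    "prohibition": ("may not", "must not", "shall not"),
    "duty":        ("must", "shall", "is required to"),
}

def classify_clause(sentence):
    """Return the deontic category of a clause, or None if no cue matches.
    Prohibition is checked before duty/permission so that 'must not'
    is not mistaken for the duty cue 'must'."""
    s = " " + sentence.lower() + " "
    for category in ("prohibition", "duty", "permission"):
        if any(" " + cue + " " in s for cue in CUES[category]):
            return category
    return None

label = classify_clause("The user must not redistribute the software.")
```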
Developing a QA Pipeline Component for Relation Extraction
Question answering (QA) aims at making sense of data via a simple-to-use interface. In a typical QA pipeline (from the input of a question to the extraction of the answer), a number of modules/sub-components perform different functions such as Named Entity Recognition, Named Entity Disambiguation, relation matching, etc. However, QA systems are very complex, and earlier approaches focused mainly on implementation details, so these components were tightly coupled in the pipeline.
Therefore, it is cumbersome and inefficient to design and implement new or improved approaches, in particular as many components are not reusable. We aim to develop reusable, extensible open QA systems and components.
The aim of this project is to design a new, fully independent component within an existing component architecture for identifying/extracting relations between entities in an input text query.
EIS; SemWeb | Lab; Master | RDF; SPARQL; REST | Semantic processing; SOA | Java; Scala; Python | WDAqua | Kuldeep Singh | Good programming skills (any of Java, Python or Scala); desirable: basic insight into RDF, the Semantic Web and ontologies, and command of SPARQL
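Such a component could, at its simplest, map surface patterns in the question to KB relations; the pattern and the relation URI below are invented for illustration only:

```python
import re

# Illustrative pattern inventory mapping surface forms to KB relations;
# the relation namespace and template are made up for this sketch.
PATTERNS = [
    (re.compile(r"who (?:is|was) the (\w+) of (.+)", re.I), "dbo:{rel}"),
]

def extract_relation(question):
    """Identify the relation (and its object mention) expressed in a
    question -- the job of the stand-alone pipeline component."""
    for pattern, template in PATTERNS:
        m = pattern.match(question.strip())
        if m:
            return template.format(rel=m.group(1).lower()), m.group(2).rstrip("?")
    return None

rel = extract_relation("Who is the mayor of Bonn?")
```

Wrapping this function behind a small REST endpoint would make it an exchangeable pipeline component in the sense described above.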
Decentralised Authoring, Annotations, and Social Interactions
dokieli is a general purpose client-side application for document authoring, publication and interaction. Capabilities of the tool are enabled according to the needs and technical resources of the user. The editor is built on open Web standards and the documents are compliant with Linked Data best practices, allowing: decentralised storage and data ownership; fine-grained semantic structure through HTML+RDFa; direct in-browser editing from an LDP-based personal data store; social interactions with documents (such as annotations and replies), and notifications thereof. See also: https://dokie.li/
EIS | Lab; Master; PhD | RDF; SPARQL; HTML; RDFa; CSS; JavaScript; Linked Data Platform; Solid; REST | Scholarly Communication; Information Science; Semantic Web; Semantic Publishing; Open Access; Open Science; User Interfaces | RDF; SPARQL | https://github.com/linkeddata/dokieli | OSCOSS | Sarven Capadisli | Dogfooding! Have a look at http://csarven.ca/dokieli-rww , https://dokie.li/ , http://csarven.ca/linked-research-scholarly-communication | mailto:info@csarven.ca | 2016-04-27
Word Embedding based on Dependency Relations and Part of Speech
Word embedding is a machine-learning task: we will create software that assigns each word a high-dimensional vector. This master's project includes a survey of existing work on word embeddings and the reproduction/extension of current word-embedding tools. See https://levyomer.files.wordpress.com/2014/04/dependency-based-word-embeddings-acl-2014.pdf Remark: we accept more than one master's student for this topic.
EIS | Master | Python; TensorFlow | Machine Learning | Python | Tiansi; Denis | Programming skills in Python and TensorFlow | mailto:tdong@uni-bonn.de | 2017-02-01
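The count-based precursor of learned word embeddings, representing each word by its co-occurrence context, can be sketched in a few lines (toy corpus; the tools surveyed in the thesis learn dense vectors instead):

```python
import math
from collections import Counter

def cooccurrence_vectors(sentences, window=2):
    """Represent each word by its co-occurrence counts within a window --
    the sparse, count-based ancestor of learned word embeddings."""
    vectors = {}
    for tokens in sentences:
        for i, w in enumerate(tokens):
            ctx = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            vectors.setdefault(w, Counter()).update(ctx)
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

sents = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "car", "drove"]]
vecs = cooccurrence_vectors(sents)
```

The dependency-based variant in the linked paper replaces the linear window with syntactic dependency contexts, which the thesis would reproduce and extend.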
A Survey on Machine Translation with a Case Study between X and English
Machine translation is developing fast; the main method has shifted from statistical to neural approaches. This master's project is to survey existing machine-translation methods and to carry out a case study on the paired corpus within the OpenBudgets project.
EIS | Master | Python | Web technologies; machine translation; natural language processing | Python | OpenBudgets | Tiansi; Fathoni; Denis | Programming skills in Python, NLP | mailto:tdong@uni-bonn.de | 2017-02-01
SQuAD: Text-based Question Answering
A relevant QA task on a very recent dataset, focusing on answering a question by finding the span of a given paragraph that contains the answer. For more information see https://rajpurkar.github.io/SQuAD-explorer/. Survey recent models (see the table at that link), learn deep-learning techniques for NLP, then implement an existing model and improve it.
EIS | Master; Lab | Theano (or maybe TensorFlow) | NLP; QA; machine learning | Python | WDAqua | Denis; Tiansi | Experience in Python required; familiarity with machine-learning concepts required; familiarity with neural networks and deep learning recommended; previous experience with Theano/TensorFlow/Torch/… is a plus | mailto:lukovnik@cs.uni-bonn.de; mailto:tdong@uni-bonn.de | 2017-02-02
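A trivial non-neural baseline for this task is to pick, by lexical overlap with the question, the sentence most likely to contain the answer span (neural models replace this scoring with learned representations and predict exact span boundaries):

```python
import re

def tokenize(text):
    """Lowercase word set of a text."""
    return set(re.findall(r"\w+", text.lower()))

def answer_sentence(paragraph_sentences, question):
    """Lexical-overlap baseline for span-based QA: select the sentence
    sharing the most words with the question."""
    q = tokenize(question)
    return max(paragraph_sentences, key=lambda s: len(tokenize(s) & q))

sents = ["Bonn lies on the river Rhine.",
         "It has about 330,000 inhabitants."]
best = answer_sentence(sents, "Which river does Bonn lie on?")
```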
Tracking Changes in a Collaborative Writing System
Collaborative writing environments provide a place for writing an article together online. We investigate and provide a method that lets writers track their changes, in a system similar to git. We encourage group work on this lab topic, which is in the domain of the OSCOSS project.
EIS | Lab | Javascript; GitHub | Web technologies | Javascript | OSCOSS | Afshin Sadeghi | Basic understanding of Javascript and online authoring systems
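The primitive underlying "track changes" can be sketched with Python's difflib (the real tool would be JavaScript and operate on rich-text documents, but the diff idea is the same):

```python
import difflib

def track_changes(old, new):
    """Compute word-level insertions and deletions between two revisions --
    the primitive underlying 'track changes' in collaborative editors."""
    a, b = old.split(), new.split()
    sm = difflib.SequenceMatcher(a=a, b=b)
    changes = []
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op in ("delete", "replace"):
            changes.append(("del", " ".join(a[i1:i2])))
        if op in ("insert", "replace"):
            changes.append(("ins", " ".join(b[j1:j2])))
    return changes

changes = track_changes("the quick brown fox", "the slow brown fox jumps")
```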