Autumn/Winter 2014 Newsletter #2
The first year of KEYSTONE!
Some scientific results, including what we’ve learned and also what we’ve done so far...
The first year of KEYSTONE is over, and we are delighted with the research and networking that has been carried out so far. Most of this year’s effort has been spent on networking, thus enabling our Management Committee (MC) and Working Group (WG) members to get to know each other better. We believe that this work has put in place the foundations required for our future activities and is an excellent starting point for new joint research work. As well as networking, our WGs have carried out some important activities, and we will now describe some of the results of their work.
WG1 and WG2 activities
The main goal of WG1 is to raise awareness of innovative methods and techniques which can help to analyse, index and discover structured Web data sources, specifically Linked Data. In WG2, we aim to support the development of novel methods and algorithms that can enable effective and efficient keyword search over large-scale structured data sources available on the Web. These activities are closely aligned with the dataset profiling and quality assessment techniques of WG1, which enable users to select suitable data for a given purpose. Moreover, the techniques and methods for discovering and indexing structured data sources are highly interdependent with effective and efficient keyword search methods. Therefore, the majority of activities in WG1 and WG2 involved people from both working groups. In the following, we will provide an overview of activities in WG1 and WG2 during 2014.
In 2014, our WG1 and WG2 members actively participated in a number of WG meetings including the “Semantic Keyword Search in Big Data” event in March 2014 in Leiden, and “Querying the Semantic Web” in October 2014 in Riva del Garda, Italy (both co-located with MC meetings). In addition, in order to foster discussions on specific WG1 and WG2 topics within the KEYSTONE network and to identify important challenges and research directions in this context, WG2 and WG1 co-organised a dedicated meeting in May 2014 in Hersonissos, Greece, where members of both working groups discussed open research challenges and exchanged ideas for possible collaborations.
To increase awareness of WG1- and WG2-related topics within the scientific community, the members of WG1 and WG2 initiated and co-organised the First International Workshop on Dataset PROFIling and fEderated Search for Linked Data (PROFILES ’14) during the 11th European Semantic Web Conference (ESWC 2014) in Greece.
As a continuation of this effort, members of WG1 and WG2 collaborated on a follow-up proposal for “PROFILES ’15” that has been accepted for ESWC 2015 in Slovenia. WG1 and WG2 members also initiated a special journal issue on Dataset Profiling and Federated Search for Linked Data that has been accepted for the International Journal on Semantic Web and Information Systems (IJSWIS). Activities related to the organisation of PROFILES ’15 and the IJSWIS special issue will continue into 2015.
The main focus of WG3 - user interaction and keyword query interpretation - is on aspects such as the disambiguation of queries (e.g. semantic disambiguation), the development of languages for keyword search in modern search applications (including social applications) and the use of feedback from users for improving results.
One of the main achievements in the past year was the organisation of the first edition of the SDSW workshop (Surfacing the Deep and the Social Web) at the International Semantic Web Conference 2014, jointly run with WG4. Our hope is that this workshop will be sustained for a number of years and that the topic will gain increased visibility at future conferences.
Members of WG3 have also recently been involved in research topics such as:
WG4 of the KEYSTONE COST Action, concerned with research integration, showcases, benchmarks and evaluations, was involved in the following research activities:
What countries are in the KEYSTONE COST Action?
So far, KEYSTONE involves teams of researchers from 28 countries: Austria, Belgium, Bulgaria, Croatia, Cyprus, Estonia, Finland, France, FYR Macedonia, Germany, Greece, Ireland, Israel, Italy, Malta, Netherlands, Norway, Poland, Portugal, Romania, Serbia, Slovakia, Slovenia, Spain, Sweden, Switzerland, Turkey, and the United Kingdom.
One of the goals of KEYSTONE is to create a large network of researchers working on semantic keyword-based search on structured data sources. Teams, researchers, and practitioners interested in themes related to keyword search can join the Action and benefit from the opportunities made available by the COST framework by simply filling in the form available here. People from other countries not included in the list can contact their national COST office (see www.cost.eu for more details) to find out about the procedure to participate in the network. We are open to other teams and countries, and we encourage those interested to join us! You can find out more about why you should join here.
The KEYSTONE COST Action’s management teams
Management Committee (MC). The MC is in charge of supervising and coordinating the Action and is composed of two representatives of each participating country. The list of KEYSTONE MC members is online.
Executive Scientific Board (ESB). The ESB has the responsibility for planning, managing and executing the activities. It is composed of the following positions:
Editorial Board. This is a subset of the MC and is in charge of planning the publication strategy, reviewing the Action publications, deliverables, and the website.
Work and budget plan for 2015
€164,000 is the budget allocated by the COST Association to the KEYSTONE COST Action
The Work and Budget Plan for the second grant period (from 01/12/2014 to 31/12/2015) has been approved by the Management Committee of the KEYSTONE COST Action and the COST Office. All of the activities (i.e. the research topics explored during our meetings, the other networking and coordination tools promoted by the Action, and the research activities jointly addressed by our WG members) will be built around four main “specifications” of the general theme related to “keyword search in large amounts of data”:
The work in this field will allow KEYSTONE to continue the research activities begun in 2014 by elaborating on topics related to keyword search in big data. There are also three main events scheduled for 2015:
Collaboration between our WG members is useful (and much needed!) for the realisation of fruitful events. For this reason, a number of calls for organisation and calls for volunteers will be issued shortly.
Encouraging high-school students to use keyword-based search tools
KEYSTONE has collaborated in the organisation of the Anita Borg Week in Spain, in particular in a contest called WikinformáticA! in Aragón which was created to achieve the following main goals:
In the contest, teams of high-school students were given photographs of prominent women in the history of ICT and tasked with finding out who the people in those photographs were. After identifying them, the students had to write a biography of each person in a wiki format. In this first edition of WikinformáticA in Aragón, 95 high-school students took part, divided into eight teams. Most participants enjoyed the experience very much, and learned how to search for information and create a wiki. The content of the wiki pages they curated will be included in the Spanish Wikipedia over the next few months. Moreover, annotated data will also be incorporated into the Spanish DBpedia.
White Paper on keyword search in big data
The main goal of this first White Paper from the KEYSTONE COST Action was to present a coherent view of the contributions of participants in a “brain-writing session” that took place at the Action's second meeting in Leiden on 24 March 2014. We performed an analysis of six key aspects of keyword search: challenges in keyword-based search; practical scenarios that could benefit from keyword search research; methods for supporting a user in keyword queries and results analysis; methods for obtaining optimal results from keyword queries; benchmarking environments and evaluation of keyword search; and KEYSTONE's application fields.
To achieve rapid results while ensuring the active involvement of all participants (and collecting contributions from each participant's areas of expertise), insights on the key aspects of keyword search were gathered through a directed brain-writing session. Brain-writing is a method that takes advantage of group priming effects through written interaction, and that reduces the production blocking typical of traditional brainstorming, where face-to-face interaction can inhibit participants. In this method, a participant writes his or her ideas down on a piece of paper and passes it on to a second participant, who reads the ideas and develops them further by adding his or her own ideas and comments. The second participant then passes the paper on to a third participant. The ideas are thus passed forward, developed and screened by three different participants, without returning to the original source. The answers to each question were briefly summarised and discussed in a subsequent session of the meeting, to collect general ideas and contributions from all participants. All contributions and comments were then analysed further and extended by five appointed members, and the results are presented in this final White Paper.
The White Paper is here: http://www.keystone-cost.eu/keystone/wp-content/uploads/2015/01/WhitePaper_first.pdf
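The passing scheme used in the brain-writing session can be illustrated with a short sketch. This is only an assumed model of the rotation: the participant names, the circular seating order and the function name are hypothetical, and the actual session in Leiden may have been organised differently.

```python
def brain_writing_chains(participants, passes=2):
    """Return, for each sheet of paper, the ordered people who handle it.

    Assumption: participants sit in a fixed circular order, and each sheet
    is handed to the next person `passes` times. With passes=2, every sheet
    is seen by three different people and never returns to its author.
    """
    n = len(participants)
    if passes >= n:
        raise ValueError("need more participants than passes")
    return [
        [participants[(start + step) % n] for step in range(passes + 1)]
        for start in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical participants, for illustration only.
    for chain in brain_writing_chains(["Ana", "Ben", "Cleo", "Dev"]):
        print(" -> ".join(chain))
```

Because every participant starts one sheet and the sheets all move in the same direction, everybody is writing or reading at each step, which is how the method avoids the waiting that occurs in spoken brainstorming.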
The 11 short-term scientific missions (STSMs) funded by KEYSTONE COST in 2014
Keyword-Based Search Foundations
Prof. Jorge Cardoso (1 week)
From: University of Coimbra, Portugal. To: University of Modena and Reggio Emilia, Italy
Between August and September 2014, Jorge Cardoso from the University of Coimbra, and Francesco Guerra from the University of Modena and Reggio Emilia, were involved in an STSM. The mission took place in Italy. Over the course of one week, the two researchers delineated the strategy for the KEYSTONE COST Action, discussed how recent findings could contribute to the development of new keyword-based search mechanisms, and identified which approaches should be explored.
Exploratory- and Keyword-Based Design of Data-Intensive Scientific Workflows
Dr Khalid Belhajjame (10 days)
From: Paris-Dauphine University, France. To: Delft University of Technology, The Netherlands
The STSM granted to Khalid Belhajjame from Paris-Dauphine University enabled him to develop, together with his host Alessandro Bozzon from the Delft University of Technology, a new model and architecture for designing keyword-based exploratory data science processes. The visit was successful and fruitful, both in terms of the research results obtained and in terms of the collaborations that have been built between the visitor and host institutions.
Semantic Classification of Liver Radiological Reports
Dr María del Mar Roldán-García (1 week)
From: University of Malaga, Spain. To: Boğaziçi University Istanbul, Turkey
For a week in June 2014, Dr Roldán visited Boğaziçi University to work with researchers from VAVLab. It was a fruitful visit and the collaborators are still working together. As a result of this STSM, ONLIRA (Ontology of the Liver for Radiology) was classified using a semantic reasoner and was populated with RDF data from the CaReRa project. Furthermore, ONLIRA inconsistencies and redundancies were eliminated. Finally, ONLIRA was extended to include information about liver patients.
Extraction and Representation of Placenames for the Reconstruction of Itineraries from Texts
Mr Ludovic Moncla (1 month)
From: Université de Pau et des Pays de l'Adour, France. To: University of Zaragoza, Spain
Spatial queries are one of the most common kinds of query used to search structured data sources. This project dealt with the extraction and representation of placenames for the reconstruction of itineraries from textual sources, converting these sources into structured data sources with respect to spatial information. During this STSM, the collaborators proposed a map-based algorithm for toponym disambiguation based on clustering techniques. The technique can also infer the location of toponyms not found in a geographic database, using the previously disambiguated toponyms. A paper, "Geocoding for Texts with Fine-Grain Toponyms: An Experiment on a Geoparsed Hiking Descriptions Corpus", was written and accepted as a full paper for the ACM SIGSPATIAL 2014 conference.
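To give a flavour of the idea, here is a minimal sketch of map-based disambiguation. It is not the algorithm from the paper: it swaps the clustering step for a much simpler nearest-to-centroid heuristic, and the toponym names, coordinates and function names are all made up for illustration.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def centroid(points):
    """Arithmetic mean of (lat, lon) pairs; adequate only for small areas."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def resolve_toponyms(candidates):
    """Pick one location per toponym.

    candidates maps each toponym to its candidate (lat, lon) locations:
    one entry = unambiguous, several = ambiguous, none = not in the gazetteer.
    """
    anchors = [locs[0] for locs in candidates.values() if len(locs) == 1]
    if not anchors:
        raise ValueError("at least one unambiguous toponym is required")
    centre = centroid(anchors)
    # Ambiguous toponyms: keep the candidate closest to the unambiguous ones.
    resolved = {
        name: min(locs, key=lambda p: haversine_km(p, centre))
        for name, locs in candidates.items()
        if locs
    }
    # Unknown toponyms: estimate a position from the already resolved ones.
    estimate = centroid(list(resolved.values()))
    for name, locs in candidates.items():
        if not locs:
            resolved[name] = estimate
    return resolved


# Hypothetical hiking description: illustrative coordinates, not gazetteer data.
SAMPLE = {
    "Village A": [(43.30, -0.37)],
    "Village B": [(43.33, -0.43)],
    "Ridge C": [(43.35, -0.40), (48.85, 2.35)],  # two places share this name
    "Hut D": [],  # missing from the geographic database
}

if __name__ == "__main__":
    for name, location in resolve_toponyms(SAMPLE).items():
        print(name, location)
```

The sketch relies on the same intuition as the text: places mentioned in one itinerary tend to lie close together, so locations that are already known can help with ambiguous or missing ones.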
Keyword Search on Graph Data
Dr Paolo Missier (10 days)
From: Newcastle University, UK. To: University of Modena and Reggio Emilia, Italy
This STSM explored the combination of formal and practical notions of data provenance, with known approaches to the problem of keyword-based search over relational databases (or, more generally, semi-structured data models such as RDF).
Keyword Search on Relational Data in the Deep Web
Dr Andrea Calì (1 week)
From: Birkbeck, University of London, UK. To: Roma Tre University, Italy
The topic of the visit was keyword search on Deep Web (aka Hidden Web) data sources. The Deep Web consists of data that is accessible through Web pages but is not indexable by search engines, because it is returned in dynamically generated pages. The researchers developed the notion of keyword search in the context of Web data sources accessible through HTML forms, and proposed a preliminary framework for it.
From Recommender Systems to Personalised Academic Search Engines
Mr Stefan Langer (1 month)
From: Otto-von-Guericke University Magdeburg, Germany. To: University of Cyprus, Cyprus
A search engine for academic papers was planned and implemented, using the open source literature management software Docear. Large parts of Docear's existing academic paper recommender system were reused in order to recommend search terms to users. Docear’s academic paper recommender system has been developed over some years, and in a recent evaluation 240,948 recommendations delivered to 4,153 users between April 2013 and June 2014 were analysed.
Context Information Extraction from Social Media Structured Sources
Dr Georgia Kapitsaki (20 days)
From: University of Cyprus, Cyprus. To: Delft University of Technology, Netherlands
Improvement of Automatic Formalization Processes for Thesauri
Dr Javier Lacasta (1 month)
From: University of Zaragoza, Spain. To: University of Geneva, Switzerland
The STSM has enabled a sustained collaboration between the IAAA group of the University of Zaragoza and the ICLE in the University of Geneva for the automatic formalisation of thesauri and their conversion into ontologies. It has allowed the participants to evaluate the problem complexity of automatically generating an ontology that can be used to improve information retrieval in keyword/relation-based queries on documents and collections, and the associated classification processes. Based on this, the researchers have defined a system architecture to perform such a task using information extracted from knowledge organisation systems, metadata, text documents and databases. Currently, they are developing parts of such a system that, when finished, can provide high-quality formal models from simple thesauri.
Semantic and Crowdsourced Keyword-Based Search
Dr Marco Brambilla (1 week)
From: Polytechnic University of Milan, Italy. To: Paris-Dauphine University, France
The objective of the visit was to combine semantic modeling and crowdsourcing for the purpose of improving keyword-based search. More concretely, the visit aimed to begin a collaboration on keyword-based search over complex, structured data sources, by combining the expertise of Paris-Dauphine on semantic modeling and querying of structured big data, with the expertise of Dr Brambilla’s own group in Milan on crowdsourcing, media analysis, and structured Web content.
Improving Users' Search Behaviour
Dr Claudia Hauff (1 week)
From: Delft University of Technology, Netherlands. To: University of Lugano, Switzerland
In October 2014, Dr Hauff joined the Information Retrieval research group at the University of Lugano for a one-week research visit. They worked together on ideas for explicitly training users to use search systems more effectively. The main focus of their research was how to implement such a training phase, how to evaluate the effects of the training, and how long-lasting these effects are.
Report on our 2nd WG meeting “Querying the Semantic Web”
17-18 October 2014
Our 2nd WG Meeting was held in Riva del Garda, Italy, with the University of Trento as local organiser.
Forty-nine researchers (24% female) participated in the meeting.
They came from 22 countries:
The meeting focused on activities for improving networking among the participants, and promoted joint research in the field of querying the Semantic Web. A particular focus was given to the needs of enterprise.
The first part of the meeting was focused on an analysis of the meeting’s participants. We created a tag cloud of participants’ skills by asking WG members to provide two keywords defining their research activity.
Some brainstorming sessions were organised to identify the main open research issues in the area. The sessions started with six questions about datasets, source access, query methods, evaluation, benchmarks, and competitions. These questions were put to the participants, who were then asked to answer some of them in writing. The written answers constituted the starting point of the discussions that followed.
In some brainstorming sessions afterwards, several participants (Georgia Kapitsaki, Paolo Missier, Andrea Calì, Andre Freitas, Jorge Cardoso, Nicola Ferro and Yannis Velegrakis) were appointed to lead the analysis of the questions and answers. The rest of the participants were divided into small groups. Each group participated in a short discussion (20 minutes) on all of the questions. The results of the activity were presented in a plenary session, and a White Paper around the outcomes is currently under development.
An invited talk was also delivered by Bruno Crispo, Università di Trento, on encrypted search.
Finally, a roundtable was organised to discuss “The Semantic Web, Smart Cities and Enterprise: Opportunities for Research”. We invited representatives from enterprise for this roundtable: Marco Combetto (Informatica Trentina SPA), Paolo Bouquet (Università di Trento and Okkam SRL), Vladimir Alexiev (Ontotext), and Henrik Strindberg (Xcerion). They were asked to answer two main questions:
Involvement in “III Jornadas esDBpedia”
DBpedia is a key semantic resource in the Linked Open Data cloud (LOD cloud). DBpedia data primarily comes from Wikipedia (the popular online encyclopedia that grows daily thanks to the contributions of many users), in particular from the structured data embedded within the content of Wikipedia articles. It has also recently begun to include structured data from other sources such as Wikidata or Wiktionary. Until 2011, only data from the English language Wikipedia was extracted, but now DBpedias have been created for different language Wikipedias. For the Spanish language version, esDBpedia was created.
In January 2015, some members of WG1 participated in the “III Jornadas esDBpedia” event to improve the amount and quality of data in esDBpedia, and to find out how semantic data and storage technologies are being used to create applications in different contexts, for example in the Biblioteca Nacional de España.
Next KEYSTONE meeting!
The next meeting will be held in the University Library at the Technical University of Košice, Slovakia, on 11-12 May 2015.
Calls promoted by KEYSTONE
Springer LNCS Transactions on Computational Collective Intelligence (TCCI): Special Issue on Keyword Search and Big Data
2nd International Workshop on Surfacing the Deep and the Social Web (SDSW 2015), co-located with ESWC 2015 (31 May-4 June 2015)
2nd International Workshop on dataset PROFIling and fEderated Search for linked data (PROFILES ’15), co-located with ESWC 2015 (31 May-4 June 2015)
IGI Global International Journal on Semantic Web and Information Systems (IJSWIS): Special Issue on Dataset Profiling and Federated Search for Linked Data
Call for industry participation in the KEYSTONE COST Action
KEYSTONE is a COST Action that brings together people from 28 countries and aims at enabling research activity and technology transfer in the area of keyword-based search over structured data sources.
KEYSTONE achieves its goal by organising meetings and training schools, by funding the participation of researchers and practitioners in these events, and by supporting short-term scientific missions.
Why should you consider participating in KEYSTONE?
It is a “place” where you can:
What are the requirements for participating in KEYSTONE?
The only requirement is active participation in our meetings. In particular, for 2015 we would like to organise sessions during our annual meetings oriented towards industrial experts. The goal is to establish a “bidirectional exchange” of knowledge between industry and academia, to define specific industry needs to address, and to design use cases and scenarios for evaluating the techniques developed by KEYSTONE.
Although the main topic of the Action is keyword search in large-scale data, the expertise of the participants spans all major areas of large-scale data management.
If you are interested in joining KEYSTONE or if you need further details, please send an email to the Chair of the Action: firstname.lastname@example.org