Autumn/Winter 2014 Newsletter #2

www.keystone-cost.eu

@keystone_cost

The first year of KEYSTONE!

Some scientific results, including what we’ve learned and also what we’ve done so far...

The first year of KEYSTONE is over, and we are delighted with the research and networking that has been carried out so far. Most of this year’s effort has been spent on networking, thus enabling our Management Committee (MC) and Working Group (WG) members to get to know each other better. We believe that this work has put in place the foundations required for our future activities and is an excellent starting point for new joint research work. As well as networking, our WGs have carried out some important activities, and we will now describe some of the results of their work.

WG1 and WG2 activities

The main goal of WG1 is to raise awareness of innovative methods and techniques which can help to analyse, index and discover structured Web data sources, specifically Linked Data. In WG2, we aim to support the development of novel methods and algorithms that can enable effective and efficient keyword search over large-scale structured data sources available on the Web. These activities are closely aligned with the dataset profiling and quality assessment techniques of WG1, which enable users to select suitable data for a given purpose. Moreover, the techniques and methods for discovering and indexing structured data sources are highly interdependent with effective and efficient keyword search methods. Therefore, the majority of activities in WG1 and WG2 involved people from both working groups. In the following, we will provide an overview of activities in WG1 and WG2 during 2014.

WG3 activities

The main focus of WG3 - user interaction and keyword query interpretation - is on aspects such as the disambiguation of queries (e.g. semantic disambiguation), the development of languages for keyword search in modern search applications (including social applications) and the use of feedback from users for improving results.

One of the main achievements in the past year was the organisation of the first edition of the SDSW workshop (Surfacing the Deep and the Social Web) at the International Semantic Web Conference 2014, jointly run with WG4. Our hope is that this workshop will be sustained for a number of years and that the topic will gain increased visibility at future conferences.

Members of WG3 have also recently been involved in research topics such as:

  • Unsupervised topic discovery in social networks: the ITAKA research group at University Rovira i Virgili, Spain produced a PhD dissertation on “Moving Towards the Semantic Web - Enabling New Technologies Through the Semantic Annotation of Social Contents” supervised by Toni Moreno, as well as a publication at ECAI 2014.
  • Medical liver case ontology (LiCO): researchers from Boğaziçi University, Turkey (Burak Acar, Suzan Uskudarli) and from the University of Malaga, Spain (María del Mar Roldán García, José Aldana) released a beta version of LiCO through the VAVlab website, and an STSM was carried out by María del Mar Roldán at Boğaziçi University. Several papers on the LiCO ontology are planned. The two groups also collaborated on organising a task at ImageCLEF 2015.

  • As-you-type search in social applications: the University of Paris-Sud (Bogdan Cautis) and Yahoo! Labs Barcelona studied a query model where answers pertain to a query interpretation for which the last term in the query sequence can match keyword prefixes. They have already devised an algorithmic solution for a query model in which relevance is judged by fixed (pre-determined) criteria, which is the topic of an ongoing paper submission. They also worked on the foundations for a more generic search problem, in which relevance must be inferred (learned) on the fly, during searches, in an adaptive process based on multi-armed bandits.
  • Searching over rich social data with semantics and structure: the University of Paris-Sud (Ioana Manolescu, Bogdan Cautis) collaborated with University of Rennes 1 on a data model and a preliminary approach for answering queries over structured, social and semantically rich content, taking into account all dimensions of the data in order to return the most meaningful results. This resulted in a publication at SDSW 2014.
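
To give a feel for the as-you-type query model mentioned above, here is a minimal Python sketch in which every fully typed term must match a keyword exactly, while the last (partially typed) term may match any keyword prefix. It is only an illustration and not the algorithm devised by the researchers: the function names, the toy documents, and the simple in-memory index are all assumptions, and no relevance ranking is attempted.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase keyword to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def as_you_type_search(index, query):
    """Return ids of documents matching all complete terms exactly,
    with the last term of the query treated as a keyword prefix."""
    terms = query.lower().split()
    if not terms:
        return set()
    *complete, prefix = terms

    # Documents containing any keyword that starts with the last term.
    result = set()
    for word, ids in index.items():
        if word.startswith(prefix):
            result |= ids

    # Every fully typed term must match a keyword exactly.
    for term in complete:
        result &= index.get(term, set())
    return result


# Hypothetical toy collection, for illustration only.
docs = {
    1: "keyword search over linked data",
    2: "social search and semantic annotation",
    3: "linked open data profiling",
}
index = build_index(docs)
matches = as_you_type_search(index, "linked da")
```

In this toy collection, the query "linked da" matches documents 1 and 3, because both contain "linked" and a keyword beginning with "da". A real system would replace the linear scan over keywords with a prefix structure such as a trie, and would rank the answers.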

What countries are in the KEYSTONE COST Action?

So far, KEYSTONE involves teams of researchers from 28 countries: Austria, Belgium, Bulgaria, Croatia, Cyprus, Estonia, Finland, France, FYR Macedonia, Germany, Greece, Ireland, Israel, Italy, Malta, Netherlands, Norway, Poland, Portugal, Romania, Serbia, Slovakia, Slovenia, Spain, Sweden, Switzerland, Turkey, and the United Kingdom.

One of the goals of KEYSTONE is to create a large network of researchers working on semantic keyword-based search on structured data sources. Teams, researchers, and practitioners interested in themes related to keyword search can join the Action and benefit from the opportunities made available by the COST framework by simply filling in the form available here. People from other countries not included in the list can contact their national COST office (see www.cost.eu for more details) to find out about the procedure to participate in the network. We are open to other teams and countries, and we encourage those interested to join us! You can find out more about why you should join here.

In 2014, our WG1 and WG2 members actively participated in a number of WG meetings including the “Semantic Keyword Search in Big Data” event in March 2014 in Leiden, and “Querying the Semantic Web” in October 2014 in Riva del Garda, Italy (both co-located with MC meetings). In addition, in order to foster discussions on specific WG1 and WG2 topics within the KEYSTONE network and to identify important challenges and research directions in this context, WG2 and WG1 co-organised a dedicated meeting in May 2014 in Hersonissos, Greece, where members of both working groups discussed open research challenges and exchanged ideas for possible collaborations.

 

To increase awareness of WG1- and WG2-related topics within the scientific community, the members of WG1 and WG2 initiated and co-organised the First International Workshop on Dataset PROFIling and fEderated Search for Linked Data (PROFILES ’14) during the 11th European Semantic Web Conference (ESWC 2014) in Greece.

As a continuation of this effort, members of WG1 and WG2 collaborated on a follow-up proposal for “PROFILES ’15” that has been accepted for ESWC 2015 in Slovenia. WG1 and WG2 members also initiated a special journal issue on Dataset Profiling and Federated Search for Linked Data that has been accepted for the International Journal on Semantic Web and Information Systems (IJSWIS). Activities related to the organisation of PROFILES ’15 and the IJSWIS special issue will continue into 2015.

WG4 activities

WG4 of the KEYSTONE COST action, concerned with research integration, showcases, benchmarks and evaluations, was involved in the following research activities:

  • The first edition of the SDSW workshop (Surfacing the Deep and the Social Web) was accepted by the renowned International Semantic Web Conference for inclusion in its 2014 edition. This initiative was a joint effort between WG4 and WG3 (concerned with user interaction and keyword query interpretation), to promote cross-pollination of ideas between the two groups, but it was open to all researchers, regardless of their participation in the COST action. This workshop consisted of a section for paper presentations and a section where the authors, audience, and chairs reflected on research challenges and future research avenues. All submitted papers were peer reviewed. In addition, extended and enhanced versions of the two best papers were fast-tracked for publication in the Journal of Data Semantics.
  • A special issue of the TCCI journal (LNCS Transactions on Computational Collective Intelligence) on keyword search and big data is also being prepared, and the call for papers for this issue has recently been published.
  • Members of WG4 have also been involved in the organisation of the international RAW Open Data event. As per an EU Directive and US Presidential Executive Order, public administration bodies will be mandated to open their data to businesses and society at large for consultation and reuse. This means that the relevance of research of the KEYSTONE COST Action will be strengthened, as thousands of new datasets will be made available on the Web shortly. All details, videos of the talks, and other resources of this event can be found online.
  • Given the positive response to the organisation of the first SDSW workshop in 2014, the same joint WG4-WG3 team decided to organise SDSW 2015. This time the proposal was submitted to the Extended Semantic Web Conference (ESWC) and has also been accepted. Work is under way.

The KEYSTONE COST Action’s management teams

Management Committee (MC). The MC is in charge of supervising and coordinating the Action and is composed of two representatives of each participating country. The list of KEYSTONE MC members is online.

Executive Scientific Board (ESB). The ESB has the responsibility for planning, managing and executing the activities. It is composed of the following positions:

  • Chair: Francesco Guerra (Italy)
  • Vice-Chair: Jorge Cardoso (Portugal)
  • Scientific Coordinator: Yannis Velegrakis (Italy)
  • Dissemination Coordinator: John Breslin (Ireland), supported by Catarina Ferreira (France)
  • WG1 Co-Leaders: Raquel Trillo Lado (Spain), Stefan Dietze (Germany)
  • WG2 Co-Leaders: Elena Demidova (Germany), Julian Szymanski (Poland)
  • WG3 Co-Leaders: Omar Boucelma (France), Bogdan Cautis (France)
  • WG4 Co-Leaders: Paulo Rupino (Portugal), Ngoc Thanh Nguyen (Poland)
  • Training Coordinator: Charlie Abela (Malta)
  • STSM Coordinator: Abdulhussain Mahdi (Ireland)
  • COST Officer: Giuseppe Lugano

Editorial Board. This is a subset of the MC and is in charge of planning the publication strategy, reviewing the Action publications, deliverables, and the website.

Work and budget plan for 2015

€164,000 is the budget allocated by the COST Association to the KEYSTONE COST Action.

The Work and Budget Plan for the second grant period (from 01/12/2014 to 31/12/2015) has been approved by the Management Committee of the KEYSTONE COST Action and the COST Office. All of the activities (i.e. the research topics explored during our meetings, the other networking and coordination tools promoted by the Action, and the research activities jointly addressed by our WG members) will be built around four main “specifications” of the general theme related to “keyword search in large amounts of data”:

  1. Development of techniques and tools for search and analytics;
  2. Study of scalable techniques;
  3. Application of semantics;
  4. Use of open data.

The work in this field will allow KEYSTONE to continue the research activities begun in 2014 by elaborating on topics related to keyword search in big data. There are also three main events scheduled for 2015:

  • WG Meeting in Košice, Slovakia on 11-12 May 2015;
  • Training School in Malta on 20-24 July 2015;
  • Open Conference in Coimbra, Portugal during the second week of September.

Collaboration between our WG members is useful (and much needed!) for the realisation of fruitful events. For this reason, a number of calls for organisation and calls for volunteers will be issued shortly.

Encouraging high-school students to use keyword-based search tools

KEYSTONE has collaborated in the organisation of the Anita Borg Week in Spain, in particular in a contest called WikinformáticA! in Aragón which was created to achieve the following main goals:

  • To foster the use of image search tools and tools to create wikis among high-school students;
  • To encourage female students to undertake degrees related to Information and Communications Technology (ICT); and
  • To promote the history of prominent women in the field of ICT in order to serve as role models for a new generation of students.

In the contest, teams of high-school students were provided with photographs of prominent women in the history of ICT, and were tasked with finding out who the people in those photographs were. After identifying them, the students had to write a biography of each person in a wiki format. In this first edition of WikinformáticA in Aragón, 95 high-school students took part, divided into eight teams. Most participants enjoyed the experience very much, and learned how to search for information and create a wiki. The content of the wiki pages they curated will be included in the Spanish Wikipedia over the next few months. Moreover, annotated data will also be incorporated into the Spanish DBpedia.

White Paper on keyword search in big data

The main goal of this first White Paper from the KEYSTONE COST Action was to present a coherent view of the contributions of participants in a “brain-writing session” that took place at the Action's second meeting in Leiden on 24 March 2014. We performed an analysis of six key aspects of keyword search: challenges in keyword-based search; practical scenarios that could benefit from keyword search research; methods for supporting a user in keyword queries and results analysis; methods for obtaining optimal results from keyword queries; benchmarking environments and evaluation of keyword search; and KEYSTONE's application fields.

In order to achieve rapid results while ensuring the active involvement of all participants (and collecting contributions from each participant's area of expertise), insights on the key aspects of keyword search were gathered through a directed brain-writing session. Brain-writing is a method that takes advantage of group priming effects through writing and reading interaction, and that reduces the production blocking that face-to-face inhibitions cause in traditional brainstorming. In this method, a participant writes his or her ideas down on a piece of paper and passes it on to a second participant, who reads and develops the ideas further by adding his or her own ideas and comments, and who then passes the paper on to a third participant. The ideas are passed forward, and thus developed and screened by three different participants, without returning to the original source. The answers to each question were briefly summarised and discussed in a subsequent session of the meeting, to collect general ideas and contributions from all participants. All contributions and comments were further analysed and expanded upon by five appointed members, and the results are presented in this final White Paper.
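
The way a sheet of paper travels between participants in a brain-writing session can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a record of how the Leiden session was run: it assumes that participants sit in a fixed circular order and that each sheet moves one seat along per round, and the function name and participant names are hypothetical.

```python
def brain_writing_routes(participants, rounds=3):
    """For each author, list the people who handle that author's sheet, in order.

    Assumption: participants sit in a circle and each sheet moves one seat
    along per round, so a sheet never returns to its author as long as there
    are at least `rounds` participants.
    """
    n = len(participants)
    if n < rounds:
        raise ValueError("need at least as many participants as rounds")
    routes = {}
    for seat, author in enumerate(participants):
        # Step 0 is the author writing; later steps are readers who add ideas.
        routes[author] = [participants[(seat + step) % n] for step in range(rounds)]
    return routes


# Hypothetical participants, for illustration only.
routes = brain_writing_routes(["Ana", "Ben", "Cleo", "Dov"])
```

With four participants, Ana's sheet is handled by Ana, Ben and Cleo in turn, while Dov's sheet wraps around the circle to Ana and then Ben. Every sheet is developed by three different people and none comes back to its author.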

 

The White Paper is here: http://www.keystone-cost.eu/keystone/wp-content/uploads/2015/01/WhitePaper_first.pdf

The 11 short-term scientific missions (STSMs) funded by KEYSTONE COST in 2014

Keyword-Based Search Foundations

Prof. Jorge Cardoso (1 week)

From: University of Coimbra, Portugal. To: University of Modena and Reggio Emilia, Italy

Between August and September 2014, Jorge Cardoso from the University of Coimbra and Francesco Guerra from the University of Modena and Reggio Emilia were involved in an STSM. The mission took place in Italy. Over the course of one week, the two researchers delineated the strategy for the KEYSTONE COST Action, discussed how recent findings could contribute to the development of new keyword-based search mechanisms, and identified which approaches should be explored.


Exploratory- and Keyword- Based Design of Data-Intensive Scientific Workflows

Dr Khalid Belhajjame (10 days)

From: Paris-Dauphine University, France. To: Delft University of Technology, The Netherlands

The STSM granted to Khalid Belhajjame from Paris-Dauphine University enabled him to develop, together with his host Alessandro Bozzon from the Delft University of Technology, a new model and architecture for designing keyword-based exploratory data science processes. The visit was successful and fruitful, both in terms of the research results obtained and in terms of the collaborations that have been built between the visitor and host institutions.


Semantic Classification of Liver Radiological Reports

Dr María del Mar Roldán-García (1 week)

From: University of Malaga, Spain. To: Boğaziçi University Istanbul, Turkey

For a week in June 2014, Dr Roldán visited Boğaziçi University to work with researchers from VAVLab. It was a fruitful visit and the collaborators are still working together. As a result of this STSM, ONLIRA (Ontology of the Liver for Radiology) was classified using a semantic reasoner and was populated with RDF data from the CaReRa project. Furthermore, ONLIRA inconsistencies and redundancies were eliminated. Finally, ONLIRA was extended to include information about liver patients.


Extraction and Representation of Placenames for the Reconstruction of Itineraries from Texts

Mr Ludovic Moncla (1 month)

From: Université de Pau et des Pays de l'Adour, France. To: University of Zaragoza, Spain

Spatial-based queries are one of the most common kinds of queries for searching on structured data sources. This project dealt with the extraction and representation of placenames for the reconstruction of itineraries from textual sources, converting these sources into structured data sources with respect to spatial information. During this STSM, the collaborators proposed a map-based algorithm for toponym disambiguation based on clustering techniques. The technique is also able to infer the location of those toponyms not found in a geographic database, thanks to the previously disambiguated toponyms. A paper, "Geocoding for Texts with Fine-Grain Toponyms: An Experiment on a Geoparsed Hiking Descriptions Corpus", was written and accepted as a full paper for the ACM SIGSPATIAL 2014 conference.


Keyword Search on Graph Data

Dr Paolo Missier (10 days)

From: Newcastle University, UK. To: University of Modena and Reggio Emilia, Italy

This STSM explored the combination of formal and practical notions of data provenance, with known approaches to the problem of keyword-based search over relational databases (or, more generally, semi-structured data models such as RDF).


Keyword Search on Relational Data in the Deep Web

Dr Andrea Calì (1 week)

From: Birkbeck, University of London, UK. To: Roma Tre University, Italy

The topic of the visit was keyword searches on Deep Web (aka Hidden Web) data sources. The Deep Web consists of data that is accessible through Web pages but is not indexable by search engines, because it is returned in dynamic pages. The researchers developed the notion of keyword search in the context of web data sources accessible through HTML forms, and proposed a preliminary framework for it.


From Recommender Systems to Personalised Academic Search Engines

Mr Stefan Langer (1 month)

From: Otto-von-Guericke University Magdeburg, Germany. To: University of Cyprus, Cyprus

A search engine for academic papers was planned and implemented, using the open source literature management software Docear. Large parts of Docear's existing academic paper recommender system were reused in order to recommend search terms to users. Docear’s academic paper recommender system has been developed over some years, and in a recent evaluation 240,948 recommendations delivered to 4,153 users between April 2013 and June 2014 were analysed.


Context Information Extraction from Social Media Structured Sources

Dr Georgia Kapitsaki (20 days)

From: University of Cyprus, Cyprus. To: Delft University of Technology, Netherlands

Dr Kapitsaki’s activities during the STSM at TU Delft were in relation to the design and implementation of a tool for Web API usage from JavaScript in web pages. The STSM provided the opportunity to start a collaboration with the Web Information Systems research group and it resulted in a first prototype of the webanalyser tool.


Improvement of Automatic Formalization Processes for Thesauri

Dr Javier Lacasta (1 month)

From: University of Zaragoza, Spain. To: University of Geneva, Switzerland

The STSM has enabled a sustained collaboration between the IAAA group of the University of Zaragoza and the ICLE at the University of Geneva for the automatic formalisation of thesauri and their conversion into ontologies. It has allowed the participants to evaluate the problem complexity of automatically generating an ontology that can be used to improve information retrieval in keyword/relation-based queries on documents and collections, and the associated classification processes. Based on this, the researchers have defined a system architecture to perform such a task using information extracted from knowledge organisation systems, metadata, text documents and databases. Currently, they are developing parts of such a system that, when finished, can provide high-quality formal models from simple thesauri.


Semantic and Crowdsourced Keyword-Based Search

Dr Marco Brambilla (1 week)

From: Polytechnic University of Milan, Italy. To: Paris-Dauphine University, France

The objective of the visit was to combine semantic modeling and crowdsourcing for the purpose of improving keyword-based search. More concretely, the visit aimed to begin a collaboration on keyword-based search over complex, structured data sources, by combining the expertise of Paris-Dauphine on semantic modeling and querying of structured big data, with the expertise of Dr Brambilla’s own group in Milan on crowdsourcing, media analysis, and structured Web content.

 


Improving Users' Search Behaviour

Dr Claudia Hauff (1 week)

From: Delft University of Technology, Netherlands. To: University of Lugano, Switzerland

In October 2014, Dr Hauff joined the Information Retrieval research group at the University of Lugano for a one-week research visit. They worked together on ideas for how to explicitly train users to use search systems in a better, more effective manner. The main focus of their research was how to implement such a training phase, how to evaluate the effects the training has, and how long-lasting these effects are.

Report on our 2nd WG meeting “Querying the Semantic Web”

17-18 October 2014

Our 2nd WG Meeting was held in Riva del Garda, Italy, with the University of Trento as local organiser.

Forty-nine researchers (24% female) participated in the meeting.

They came from 22 countries:

  • 8 from Italy, 5 from France, 4 from Spain, 3 from Croatia, 3 from Portugal, 3 from Serbia, 2 from Germany, 2 from Romania, 2 from Poland, 2 from Slovenia, 2 from Greece, 2 from the Netherlands, 2 from the UK, 1 from Bulgaria, 1 from Slovakia, 1 from Switzerland, 1 from Ireland, 1 from Cyprus, 1 from Estonia, 1 from FYR Macedonia, 1 from Sweden, 1 from Finland.

The meeting focused on activities for improving networking among the participants, and promoted joint research in the field of querying the Semantic Web. A particular focus was given to the needs of enterprise.

The first part of the meeting was focused on an analysis of the meeting’s participants. We created a tag cloud of participants’ skills by asking WG members to provide two keywords defining their research activity.

Some brainstorming sessions were organised to identify the main open research issues in the area. The sessions started with six questions about datasets, source access, query methods, evaluation, benchmarks, and competitions. These questions were put to the participants, who were then asked to answer some of them in writing. The written answers constituted the starting point of the discussions that followed.

In some brainstorming sessions afterwards, several participants (Georgia Kapitsaki, Paolo Missier, Andrea Calì, Andre Freitas, Jorge Cardoso, Nicola Ferro and Yannis Velegrakis) were appointed to lead the analysis of the questions and answers. The rest of the participants were divided into small groups. Each group participated in a short discussion (20 minutes) on all of the questions. The results of the activity were presented in a plenary session, and a White Paper around the outcomes is currently under development.

An invited talk was also delivered by Bruno Crispo, Università di Trento, on encrypted search.

Finally, a roundtable was organised to discuss “The Semantic Web, Smart Cities and Enterprise: Opportunities for Research”. We invited representatives from enterprise for this roundtable: Marco Combetto (Informatica Trentina SPA), Paolo Bouquet (Università di Trento and Okkam SRL), Vladimir Alexiev (Ontotext), and Henrik Strindberg (Xcerion). They were asked to answer two main questions:

  1. Are the currently-available Semantic Web technologies mature enough to be applied to real and complex scenarios?
  2. What are the main challenging (real) scenarios that you have encountered?

Involvement in “III Jornadas esDBpedia”

DBpedia is a key semantic resource in the Linked Open Data cloud (LOD cloud). DBpedia data primarily comes from Wikipedia (the popular online encyclopedia that grows daily thanks to the contributions of many users), in particular from the structured data embedded within the content of Wikipedia articles. It has also recently begun to include structured data from other sources such as Wikidata or Wiktionary. Until 2011, only data from the English language Wikipedia was extracted, but now DBpedias have been created for different language Wikipedias. For the Spanish language version, esDBpedia was created.

In January 2015, some members of WG1 participated in the “III Jornadas esDBpedia” event to improve the amount and quality of data in esDBpedia, and to find out how semantic data and storage technologies are being used to create applications in different contexts, for example in the Biblioteca Nacional de España.

Next KEYSTONE meeting!

The next meeting will be held in the University Library at the Technical University of Košice, Slovakia, from 11-12 May 2015.

Calls promoted by KEYSTONE

Springer LNCS Transactions on Computational Collective Intelligence (TCCI): Special Issue on Keyword Search and Big Data


2nd International Workshop on Surfacing the Deep and the Social Web (SDSW 2015), co-located with ESWC 2015 (31 May-4 June 2015)



2nd International Workshop on dataset PROFIling and fEderated Search for linked data (PROFILES ’15), co-located with ESWC 2015 (31 May-4 June 2015)


IGI Global International Journal on Semantic Web and Information Systems (IJSWIS): Special Issue on Dataset Profiling and Federated Search for Linked Data 

Call for industry participation in the KEYSTONE COST Action

KEYSTONE is a COST Action that brings together people from 28 countries and aims at enabling research activity and technology transfer in the area of keyword-based search over structured data sources.

KEYSTONE achieves its goal by organising meetings and training schools, by funding the participation of researchers and practitioners in these events, and by supporting short-term scientific missions.

Why should you consider participating in KEYSTONE?

It is a “place” where you can:

  • find research groups for discussing, sharing, improving, and jointly developing research ideas, and solving, applying, and developing solutions to keyword search challenges in particular domains.
  • provide real scenarios and datasets for which keyword search techniques are required.
  • propose a team or a company for hosting internships.
  • request a grant to (partially) support a visiting activity.
  • participate in research meetings with scientific teams coming from several European countries to prepare research proposals for European and national calls.
  • share, disseminate, and stay informed about the main developments in the field of keyword search over structured data sources.
  • provide and reuse scientific outcomes, including code libraries and datasets.
  • supply knowledge and technology transfer needs.
  • participate as an expert or student in KEYSTONE training schools.
  • organise and participate in workshops, conferences, special issues in journals and industrial events.

What are the requirements for participating in KEYSTONE?

The only requirement is to participate actively in our meetings. In particular for 2015, we would like to organise sessions during our annual meetings oriented towards industrial experts. The goal is to establish a “bidirectional exchange” of knowledge between industry and academia, to define specific industry needs to address, and to design use cases and scenarios for evaluating the techniques developed by KEYSTONE.

Although the main topic of the Action is keyword search in large-scale data, the expertise of the participants spans all major areas of large-scale data management.

If you are interested in joining KEYSTONE or if you need further details, please send an email to the Chair of the Action: francesco.guerra@unimore.it

Next newsletter

Please send all items for the Spring/Summer 2015 newsletter to the Dissemination Coordinator: john.breslin@nuigalway.ie @johnbreslin