Spring/Summer 2014 Newsletter

www.keystone-cost.eu

@keystone_cost

What is KEYSTONE?

A new European COST Action that aims to enable more internet users to easily search for information currently stored in structured data sources


A new European “COST” project, entitled "KEYSTONE - semantic keyword-based search on structured data sources", aims to make it more straightforward to search through structured data sources such as databases using the keyword-based search familiar to many internet users.

The project is financed under an intergovernmental framework for European cooperation, known as COST (European Cooperation in Science and Technology), and is being led by the University of Modena and Reggio Emilia in Italy.

The scientific objective of KEYSTONE is to analyse, design, develop and evaluate techniques to enable keyword-based search over large amounts of structured data.

“This is a problem which will have a major impact at both the scientific and socio-economic level”, says Prof. Francesco Guerra, Professor of Computer Science at the University of Modena and Reggio Emilia, who is the chair of the KEYSTONE project.

 

“The main obstacle is the fact that so far, search within databases is primarily carried out through queries in a format that has been designed for computers, which must comply with a certain formal syntax that is not designed for most computer users.

“Having to know the format that these queries need to take limits searches on structured data to those who know this syntax, and takes it out of the hands of most internet users.”

According to Dr. John Breslin, Dissemination Chair for KEYSTONE: “We wish to empower users so that they will be able to easily search for information from more of the structured data sources that are out there, by using the keyword-based input mechanisms they are used to from popular search engines on the Web.”
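As a rough illustration of the gap described above, the following minimal Python sketch contrasts a formal SQL query with a naive keyword search over the same data. It is not a KEYSTONE technique: the table, its rows, and the `keyword_search` helper are all hypothetical, and the matching is a simple substring scan over every column of every table in an in-memory SQLite database.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE paper (id INTEGER PRIMARY KEY, title TEXT, city TEXT)"
    )
    conn.executemany(
        "INSERT INTO paper (title, city) VALUES (?, ?)",
        [
            ("Keyword search over relational data", "Leiden"),
            ("Profiling linked datasets", "Crete"),
            ("Federated search engines", "Leiden"),
        ],
    )
    return conn


def keyword_search(conn, keywords):
    """Return (table, row) pairs where every keyword occurs somewhere in the row."""
    terms = [k.lower() for k in keywords.split()]
    tables = [
        r[0]
        for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    ]
    hits = []
    for table in tables:
        for row in conn.execute(f'SELECT * FROM "{table}"'):
            # Flatten the row to lower-case text and test each keyword as a substring.
            text = " ".join(str(v).lower() for v in row if v is not None)
            if all(t in text for t in terms):
                hits.append((table, row))
    return hits


conn = build_demo_db()

# Formal query: the user must know SQL plus the table and column names.
sql_rows = conn.execute(
    "SELECT id, title, city FROM paper "
    "WHERE title LIKE '%search%' AND city = 'Leiden'"
).fetchall()

# Keyword query: the user only types words, with no knowledge of the schema.
kw_rows = [row for _, row in keyword_search(conn, "search leiden")]

print(sorted(sql_rows) == sorted(kw_rows))
```

Both queries return the same two rows here, but only the second can be written without knowing the schema. A full scan like this would not scale to large or heterogeneous sources and takes no account of semantics, which is the kind of limitation the Action sets out to study.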

The project kicked off on 15 October 2013 and has a duration of four years.


Recent countries joining the KEYSTONE COST Action

So far, KEYSTONE involves teams of researchers from 27 countries (Belgium, Bulgaria, Croatia, Cyprus, Estonia, Finland, France, FYR Macedonia, Germany, Greece, Ireland, Israel, Italy, Malta, Netherlands, Norway, Poland, Portugal, Romania, Serbia, Slovakia, Slovenia, Spain, Sweden, Switzerland, Turkey, United Kingdom), and more countries may join later.

One of the goals of KEYSTONE is to create a large network of researchers working on semantic keyword-based search on structured data sources. Teams, researchers, and practitioners interested in themes related to keyword search can join the Action and benefit from the opportunities made available by the COST framework by simply filling in the form available at this address:

www.keystone-cost.eu/keystone/how-to/join-keystone

People from other countries not included in the list can contact their national COST office (see www.cost.eu for more details) to find out about the procedure to participate in the network. We are open to other teams and countries, and we encourage those interested to join us! You can find out more about why you should join here:

http://bit.ly/whykeystone

Short term scientific missions have launched!

Eight STSMs have been approved, some started in June 2014

Eight new short term scientific missions (STSMs) were recently approved by the KEYSTONE Action.

STSMs allow researchers to spend from 5 to 90 days at a host institution, working on a research topic of relevance to KEYSTONE.

ESRs (Early Stage Researchers with PhDs and up to eight years of research experience) can stay for between 91 and 180 days at the host institution.

There are ten countries participating in this round of STSMs, either as the origin of participants (six countries) or as hosts (seven countries). Some of the projects started in June 2014 and most will have been completed by October.

A second round of STSMs will be announced shortly.

Call for papers: Surfacing the Deep and Social Web

Workshop at ISWC 2014; deadline July 7

The simplicity with which users can publish content at present has made the Web the world’s largest database. Keyword-based search has become the de facto standard for information discovery in this ocean of data, mainly due to its simplicity, which makes it attractive to novice users of the Web. To answer keyword queries, existing search engines rely on effective indexes of the content that allow them to return the documents that best match the user’s search criteria. This generally leaves out the structure of the data, its semantic dimension, as well as the social aspects to which it may relate.
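The index-based matching described above can be sketched with a toy inverted index in Python. This is only an illustrative assumption about how such an index might look, not a description of any particular search engine: the documents, the whitespace tokenisation, and the term-count scoring are all hypothetical simplifications.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lower-cased term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Rank documents by how many distinct query terms they contain."""
    scores = defaultdict(int)
    for term in set(query.lower().split()):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties are broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical documents used only for illustration.
docs = {
    1: "keyword search on the web",
    2: "structured data and databases",
    3: "keyword search over structured data",
}

index = build_index(docs)
ranked = search(index, "keyword structured data")
print(ranked)
```

The index treats every document as a flat bag of words, so any structure, semantics or social context attached to the content plays no part in the ranking.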

We believe that, in order to exploit the full potential of the Web, structured and rich data will have to receive the same search and retrieval capabilities as the text data from Web documents. However, due to their highly structured nature, their rich semantics, and the data structures by which they are typically managed, a great many issues need to be studied. As the problem is multifaceted, it requires synergies between many different disciplines. Broad areas of interest for the SDSW (Surfacing the Deep and Social Web) workshop include, but are not limited to:

  • Semantic Web, semantic similarity, semantic data management, semantic disambiguation, semantic indexing;
  • Social Web and social media;
  • Interface design, user interaction, natural language processing;
  • Data cleansing, data fusion, data quality, data integration, probabilistic data matching;
  • Benchmarking, ranking;
  • Artificial intelligence, machine learning;
  • Metadata management, schema matching;
  • Deep Web, provenance, information retrieval, database summarisation;
  • Context-aware web applications;
  • Knowledge / ontology self-management and evolution;
  • Semantic Web and emotional intelligence;
  • Enabling everything-as-a-user.

Best papers will be nominated for publication in the Journal of Data Semantics.

More details are available at sdsw-at-iswc2014.ipn.pt

PROFILES14 workshop at ESWC 2014

The 1st International Workshop on Dataset PROFIling & fEderated Search for Linked Data (PROFILES14) was co-organised by the KEYSTONE COST Action during the 11th European Semantic Web Conference (ESWC 2014) held in Crete, Greece on 26 May 2014. The PROFILES14 workshop aimed to gather innovative search approaches for large-scale, distributed and heterogeneous linked datasets along with dedicated approaches to analyse, describe and discover endpoints, as an inherent task of query distribution. PROFILES14 considered both novel scientific methods and techniques for querying, assessment, profiling, and curation of distributed datasets as well as the application perspective, such as the innovative use of tools and methods for providing structured knowledge about distributed datasets, their evolution, and fundamentally, means to search and query the Web of Data.

PROFILES14 received a strong set of very relevant and original submissions, six of which were accepted as full papers and three as poster papers. These papers covered a range of topics related to data source contextualisation for search and exploration in Linked Data, profiling of linked datasets, as well as measuring and modelling dynamics and the evolution of Linked Data. The paper “Entity-Based Data Source Contextualisation for Searching the Web of Data” by Andreas Wagner, Peter Haase, Achim Rettinger and Holger Lamm was selected for the "Best of Workshops" session in the main program track of ESWC 2014. The Best Paper Award was given to the paper “LODOP – Multi-Query Optimisation for Linked Data Profiling Queries” by Benedikt Forchhammer, Anja Jentzsch and Felix Naumann.

Our sincere appreciation of his time and expertise goes to keynote speaker Dr Thanassis Tiropanis for his talk entitled “Linked Data Affordances and Challenges for Web Observatories”. The proceedings of PROFILES14 have been published through the CEUR workshop proceedings service, and are available at: http://ceur-ws.org/Vol-1151

WG1 and WG2 meeting in Crete

Working Group 1 looks at the representation of structured data sources, and Working Group 2 focuses on keyword search

A meeting of Working Group 1 and Working Group 2 was held in Hersonissos, Crete on 25 May 2014. The organisers of the meeting received and accepted 11 applications. The meeting included 16 participants from eight countries (France, Germany, Greece, Italy, Poland, Serbia, Spain and Brazil).

The meeting included two excellent keynote talks: the first talk on "Web Observatory Architectures" was delivered by Dr Thanassis Tiropanis from The Web and Internet Science Group, University of Southampton, UK. In this talk, Dr Tiropanis introduced Web Observatories as global distributed resources that can engage communities with analytics of big datasets including those of the linked data cloud, social media, online archives and media archives. The speaker highlighted scalability and standardisation requirements in this area.


Dr Thanassis Tiropanis (L) and Dr Stefan Dietze (R)

The second talk, entitled “From Data to Knowledge – Profiling and Interlinking Datasets on the Web”, was given by Dr Stefan Dietze from the L3S Research Centre, Leibniz University of Hannover, Germany. In this talk, Dr Dietze emphasised the importance of scalable and efficient dataset profiling techniques on the Web of Data to facilitate finding, adopting and reusing data across the Web, and he also gave an overview of ongoing research in this area.

During the meeting, the participants discussed research topics related to semantic keyword search in structured data and scenarios of interactions for the improvement of keyword-based retrieval as well as further collaboration opportunities in these areas.

Report from the first WG meeting in Leiden

http://bit.ly/keystoneleiden

KEYSTONE's activities began in earnest with our spring working group (WG) meeting, held in Leiden on 24-25 March 2014. This was the first meeting involving WG members and was effectively the kickoff event for the research, networking and coordination activities promoted by the Action. A total of 67 researchers (16% female) from 24 countries participated in the event. Spain was the country with the largest number of participants (9), followed by the Netherlands (8), Portugal (6), Germany (5) and France (4). There were also attendees from Belgium, Bulgaria, Croatia, Cyprus, Estonia, Finland, FYR Macedonia, Ireland, Israel, Italy, Malta, Norway, Poland, Romania, Serbia, Sweden, Switzerland, Turkey, and the United Kingdom. The meeting was organised thanks to local support provided by the Naturalis Biodiversity Centre.

Research task. The topic "Semantic Keyword Search in Big Data" concerns a theme close to KEYSTONE: large-scale data sources usually comprise a very large schema and billions of instances. Keyword search over such datasets can suffer from query ambiguity and scalability challenges. The discovery of suitable (i.e. semantically related) data sources is another critical issue, hindered by the lack of sufficient information on available datasets and endpoints. Browsing and searching data on such a scale is not an easy task for users. Semantic search leverages semantics to improve the accuracy and recall of search mechanisms. Whereas state-of-the-art keyword search techniques work well for small or medium-sized databases in a particular domain, many of them fail to scale on heterogeneous databases that are composed of several thousand tables. In this scenario, the goal addressed during the meeting was the definition of open challenges in keyword search over structured databases, with particular reference to large and heterogeneous sources. This goal was achieved by means of three activities: keynote presentations, a panel, and a brain-writing session. Three keynote speakers were invited:

  • Edgar Meij (Yahoo! Research, ES) gave a talk about “Web-scale semantic search at Yahoo”
  • Pedro Furtado (University of Coimbra, PT) delivered a talk on “Scalability and Real-Time for Big Data”
  • Djoerd Hiemstra (University of Twente, NL) spoke about “Federated Search for Real: Combining 150 Search Engines, and Counting”


Attendees at the KEYSTONE meeting in Leiden in March 2014

A panel about the evaluation of keyword search systems was chaired by Nicola Ferro (University of Padua, IT). The panelists were Maarten de Rijke (University of Amsterdam, NL), Claudia Hauff (TU Delft, NL), Martin Theobald (University of Antwerp, BE), and Arjen de Vries (CWI Amsterdam, NL).

Finally, during the brain-writing session, all attendees actively participated in the definition of the most important open research issues for semantic keyword search in big data. This session was divided into two parts. During the first, each participant was given a question randomly selected from five previously prepared by the members of the Executive Scientific Board (ESB). After 15 minutes, the attendees passed their answer to another participant, who had five minutes to revise or augment it. This process was repeated twice. After this first part, we had collected more than 10 different answers for each question, with each answer revised twice. In the second part, the answers were analysed and discussed amongst all the participants. The final goal of this activity is to create a white paper (Raquel Amaro, PT, was appointed as editor of the paper) to be published online and presented at the next meeting.

Networking task. KEYSTONE is a young Action and most of the participants do not know each other. A session in the meeting was devoted to the presentation of research groups and their main research topics.

Coordination task. One of the goals of the Action is to promote new research activities and new cross-country research groups. We believe that the development of applications for H2020 calls can promote new research activities even if a proposal is not approved by the Commission, because preparing an application encourages people to work together towards a common goal. For this reason, one session was devoted to the H2020 calls relevant to KEYSTONE's research topics. Bert van Werkhoven, NL Agency contact for the ICT program in H2020, and Kimmo Rossi from DG CONNECT in the EC provided attendees with some suggestions about H2020 opportunities.

Next KEYSTONE meeting


  • 17-18 October (to be confirmed)
  • Riva del Garda, Trento, Italy
  • Autumn MC and WG meetings: "Querying the Semantic Web"

In the last decade, the Semantic Web, and in particular the Linked Data Web, has been evolving. There is currently a large number of annotated datasets that are available for use and reuse. As structured data on the Web is generated automatically from different sources, datasets vary widely with respect to quantity, quality, currentness and completeness.

State-of-the-art keyword search techniques for structured data typically address single-source search scenarios, so they require neither query routing techniques nor knowledge of source quality.

During this meeting, invited speakers will present the current state of the art in the area. A brainstorming session will aim at identifying synergies and collaboration potential among KEYSTONE partners. It is envisaged that joint research papers and proposals will also be drafted at this event.


Organisation of the KEYSTONE COST Action

The management and the organisational strategy of KEYSTONE are defined by the following frameworks:

Management Committee (MC). The MC is in charge of supervising and coordinating the Action and is composed of two representatives of each participating country. The list of KEYSTONE MC members is published at www.keystone-cost.eu/keystone/mc-members

Executive Scientific Board (ESB). The ESB has the responsibility for planning, managing and executing the activities. It is composed of the following positions:

  • Chair: Francesco Guerra (IT)
  • Vice-Chair: Jorge Cardoso (PT)
  • Scientific Coordinator: Yannis Velegrakis (IT)
  • Dissemination Coordinator: John Breslin (IE) (supported by Catarina Ferreira (FR) and Maciej Dabrowski (IE))
  • WG1 Co-Leaders: Raquel Trillo (ES), Stefan Dietze (DE)
  • WG2 Co-Leaders: Elena Demidova (DE), Julian Szymanski (PL)
  • WG3 Co-Leaders: Omar Boucelma (FR), Bogdan Cautis (FR)
  • WG4 Co-Leaders: Paulo Rupino (PT), Ngoc Thanh Nguyen (PL)
  • Training Coordinator: Charlie Abela (MT)
  • STSM Coordinator: Abdulhussain Mahdi (IE)
  • COST Officer: Giuseppe Lugano

Editorial Board. It is composed of a subset of the MC members and is in charge of planning the publication strategy, reviewing the Action publications and deliverables, and defining the website contents.

Next newsletter

Please send all items for the Autumn/Winter 2014 newsletter to the Dissemination Coordinator: john.breslin@nuigalway.ie @johnbreslin