KEYSTONE COST Action IC1302 Newsletter #5 (2017)
The KEYSTONE COST Action finished on the 14th December 2017 after four years of activity. The principal outcome targeted by the Action was the coordination of collaboration amongst the fields of semantic data management, the Semantic Web, information retrieval, artificial intelligence, machine learning and natural language processing, to enable research activity and technology transfer in the areas of keyword-based search over structured data sources.
Alongside this main objective, the Action also aimed:
The Scientific Programme of the Action has been divided into three vertical thematic areas, each one covered by a respective working group (these groups are: (WG1) representation of structured data sources; (WG2) keyword search; and (WG3) user interaction and keyword query interpretation), and a horizontal activity across the thematic areas that is covered by a fourth working group (research integration, showcases, benchmarks and evaluations (WG4)).
In the end, the Action had involved 238 Working Group Members (177 male, 76.1% – 56 female, 23.9%). These members were distributed across the working groups as follows. Note that members could choose to participate in more than one group, according to their interests.
The geographical distribution of Working Group Members is not balanced across countries, as illustrated in the graph on the left.
To obtain a clear picture of the topics on which Action participants actually work, attendees at the last five meetings were asked to provide three keywords that they believed best characterised their work. The collected keywords (688 in total) were used to produce the word cloud illustrated in the following figure.
KEYSTONE organised 11 meetings, with an overall participation of around 500 attendees, of whom around 350 were financially supported by COST. Around 47% of the COST-supported participants were from ITCs (Inclusiveness Target Countries).
In these meetings, there were brainstorming sessions where emergent techniques in keyword search research were collected, analysed, and revised.
The sessions were of three types:
Details of the meetings have been reported on the KEYSTONE website:
During the four years of the Action, KEYSTONE was able to fund 64 missions through 11 calls. The KEYSTONE website has published all of the associated reports describing the activities carried out and results achieved:
KEYSTONE also organised three training schools, for which around €70k of support funding was made available to different institutions. Details and slides from the training school lectures are publicly available on the website:
KEYSTONE organised a number of dissemination events. In particular, it organised three International KEYSTONE Conferences (IKC2015, IKC2016, and IKC2017).
Furthermore, the Action proposed the organisation of workshops at other international conferences. The PROFILES workshop series (2014-2017) was initiated within the KEYSTONE Action.
Finally, two editions of the SDSW workshop (Surfacing the Deep and the Social Web) were also organised in 2014 and 2015.
The Action has also promoted two special issues, one on Keyword Search and Big Data published in the Springer LNCS Transactions on Computational Collective Intelligence (TCCI) Journal, and the second as a special issue on Dataset Profiling and Federated Search for Linked Data published within the International Journal on Semantic Web and Information Systems (IJSWIS), Volume 12, Issue 3, 2016.
The 3rd KEYSTONE Training School on Keyword Search in Big Linked Data was a research-training event for graduates and postgraduates in the first steps of their academic career. It gave participants in-depth exposure to exciting and fast-developing areas related to Keyword Search in Big Linked Data.
The school was held over five days, between the 21st and 25th August 2017, and consisted of keynote talks, lectures, and hands-on sessions delivered by renowned academics and experts in the fields of Big Data, Linked Data, NLP, Semantic Web, IR, and other related areas. During the sessions the speakers explored a large spectrum of current exciting research, development and innovation related to various research areas and society itself.
The school was organised by IFS (the Information and Software Engineering Group, TU Wien) in collaboration with the KEYSTONE COST Action IC1302.
The third edition of the Keystone Training School on Keyword Search in Big Linked Data consisted of 15 lectures (between 1 hour and 4.5 hours), four of which were industry lectures, presented by two multinational corporations (Siemens and T-Mobile) and two Austrian mid-size enterprises (the Semantic Web Company and Siemens). As such, the training school offered its students a complementary perspective of the field, covering both academic and industry research.
In terms of academic lectures, the school started with an introductory lecture by Dr. Elmar Kiesling (TU Wien). Also on the first day, Dr. Sherif Sakr (KSAU-HS) gave an overview of large-scale computing for semantic and Linked Data.
The keynote talk of the training school was given by Prof. Heiko Paulheim, the Interim Chair for Data Science at the University of Mannheim and the Program Director of the Mannheim Master in Data Science. His talk covered Knowledge Graphs on the Web and KG Applications, Data Quality and Data Cleaning on Knowledge Graphs, Machine Learning and Data Mining on Linked Data (the RapidMiner LOD Extension), and Anomaly Detection. The topic of quality assessment was dealt with in more detail by Dr. Jeremy Debattista in his lecture on “Scalable Linked Big Data Quality Assessment” immediately afterwards on the same day. Then, Dr. Elena Demidova instructed students on the state of the art in information extraction, thus enabling them to generate their own large linked datasets on which to test quality assessment tools.
On the third day of the school, Prof. Antonio Farina, Dr. Javier D. Fernandez, and Dr. Miguel A. Martinez-Prieto together presented the longest lecture of the training school, detailing how to manage compressed Linked Data, in order to handle large amounts of data while preserving storage resources and increasing efficiency.
In addition to topics covering general Linked Data, the school also had two lectures on domain-specific Linked Data: one on medical data (presented by Dr. Mauro Dragoni) and one on geographical data (by Dr. José R.R. Viqueira).
The fourth day of the school was primarily focused on implicit semantics. Dr. Vagan Terziyan presented an overview of deep learning for cognitive computing, while Mr. Navid Rekabsaz went into specific details on text embedding.
The last day of the school was dedicated to a hackathon, organised by Dr. Dragoni and locally managed by Mihai Lupu.
The school hosted a number of international and local attendees, as shown below:
Of these, 21 international attendees received a grant from the KEYSTONE COST Action. All were ESRs (early stage researchers), 11 male and 10 female. For the grants, 38 applications were received. The table below shows the distribution per gender and country.
The Training School organised a social event on the first day of the school, offering, in cooperation with the Romanian Cultural Institute in Vienna, a piano concert, followed by a wine reception in their venue in the Argentinierstrasse, in the fourth district of Vienna.
The school also offered an innovative online training tool, provided by KnowledgeFox, and $100 Microsoft Azure accounts for all students and lecturers.
The school was supported by the TU Wien Faculty of Informatics, Microsoft, KnowledgeFox and COST.
The third edition of the International KEYSTONE Conference (IKC2017) was chaired by Yannis Velegrakis (University of Trento, IT) and Julian Szymanski (Gdansk University of Technology, PL), and was supported by local organiser Tomasz Boinski at the Gdansk University of Technology.
18 papers were accepted, of which 13 were full length papers, and 5 were short papers. The conference website (http://www.keystone-cost.eu/ikc2017/) lists the titles, authors, abstracts and slides from all accepted and presented papers. The papers all went through a review process with three independent reviewers. The names of the reviewers were known to the program chairs but were kept from the authors.
The conference featured two keynote talks. The first was research-oriented and on the topic of “Data Cleaning in the Big Data Era”. It was delivered by Paolo Papotti, an Assistant Professor at Eurecom’s Data Science Department, France. The second was industry-oriented and on “The Challenges of a Data Entrepreneur in Commercial Machine Learning Projects”. It was delivered by Jacek Kawalec from Voicelab, Poland. The conference concluded with a brainstorming panel session discussing ideas for how a COST Action can be further improved.
KEYSTONE supported the participation of 36 people in the meeting (from 21 different countries, with 36% female). Spain and Serbia had the largest delegations with 4 people each, followed by Croatia and Malta with 3 people each.
This meeting was organised with the aim of improving the dissemination of KEYSTONE Action results to industry. The event took place on the 6th November 2017. The organiser of the meeting was Dr. John Breslin, from NUI Galway. With over 17,000 students and more than 2,400 staff, NUI Galway is the largest and oldest university in the West of Ireland. Over the past 170 years, it has built a distinguished reputation for teaching and research excellence in the fields of arts, social science, and Celtic studies; business, public policy and law; engineering and informatics; medicine, nursing and health sciences; and science.
The meeting was divided into three parts. In the morning, a small number of KEYSTONE members (selected through a call issued via the Action newsletter) presented the results of their research.
These presentations were followed by a visit to the PorterShed (www.portershed.com). The PorterShed is a space that hosts companies and startups and provides all of the technological infrastructure needed by high-tech companies. Its goal is to create an innovation ecosystem: a synergistic relationship between people, companies, and place that facilitates idea generation, open learning and collaboration, and accelerates commercialisation.
In the second part of the day, three local companies (Orreco, Derilinx, Micro Focus) presented their research activities. The meeting concluded with an Industry-Academia Panel highlighting possible connections with research carried out in KEYSTONE.
From: Faculty of Mining and Geology, University of Belgrade (RS)
To: L3S Research Center, Leibniz Universität Hannover (DE)
Started on 08/01/2017
Finished on 22/01/2017
“This STSM was focused on the use of LRMI terms on the Web by assessing LRMI-based statements to improve keyword-based search of OERs and increase their visibility. A metadata annotation for the West Balkan OER portal BAKTEL was later implemented as an API based on the schema.org vocabulary and published in the eLearning 2017 Conference. During this STSM, data extracted from embedded annotations, utilising the Web Data Commons as the largest crawl of embedded markup, was investigated for the level of adoption of terms and types, the shape and characteristics of entity descriptions, and the distribution of data across the Web, specifically for the West Balkan country domains .rs, .hr, .ba. I learned about other interesting developments at the L3S Research Center from the Leibniz Universität, such as new tools and techniques for large content repositories and digital libraries.”
From: Epoka University, Tirane (AL)
To: University of Zaragoza, Zaragoza (ES)
Started on 27/02/2017
Finished on 20/03/2017
“The scope of this STSM was to improve (in terms of performance) the core algorithm of Keymantic, a keyword-based search engine over relational data. During the two weeks’ activity, the feasibility of the use of symmetric groups from group theory was analysed. In particular, different variants of symmetric group representation were studied that might better fit the generation of permutations (using a generator function) that represent solutions in monotonic descending order. Such permutations would represent the top-k ranked results list of the search engine.”
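The idea of producing candidate solutions as permutations in descending score order can be illustrated with a small sketch. This is not Keymantic's algorithm, nor the symmetric-group formulation studied in the STSM; it is a generic best-first enumeration over a square score matrix, and the names `ranked_permutations` and `weights` are hypothetical. It assumes that `weights[i][j]` scores the mapping of keyword `i` to database term `j`, and that the score of a permutation is the sum of its entries.

```python
import heapq
from itertools import count


def ranked_permutations(weights):
    """Yield (score, permutation) pairs in non-increasing score order.

    weights is assumed to be a square matrix: weights[i][j] is an
    illustrative score for mapping keyword i to database term j.
    """
    n = len(weights)
    # Optimistic bound for rows i..n-1: the best entry of each row,
    # ignoring the constraint that columns must be distinct.
    suffix = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + max(weights[i])

    tie = count()  # unique tie-breaker, so tuples are never compared further
    heap = [(-suffix[0], next(tie), 0, ())]
    while heap:
        _, _, score, assigned = heapq.heappop(heap)
        i = len(assigned)
        if i == n:
            # For a complete permutation the bound equals its exact score,
            # so nothing left in the heap can score higher.
            yield score, assigned
            continue
        for j in range(n):
            if j not in assigned:
                new_score = score + weights[i][j]
                bound = new_score + suffix[i + 1]
                heapq.heappush(heap, (-bound, next(tie), new_score, assigned + (j,)))
```

Because the generator is lazy, a top-k list can be taken with `itertools.islice(ranked_permutations(weights), k)` without enumerating every permutation up front, although the worst case remains exponential in the number of keywords.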
From: University of Trento, Trento (IT)
To: University of Paris Sud, Orsay (FR)
Started on 07/03/2017
Finished on 28/03/2017
“In this specific STSM two different kinds of work were initiated. The first one deals with the problem of retrieving balanced results in keyword querying. In particular, the data is assumed to be user generated content from social media that characterises the users that have produced it. A keyword query is treated as a topic description. Given a query, the goal is the identification of a finite set of users in such a way that the topic of all the items that these users have produced is not biased with respect to the topic the query describes. Furthermore, the opinion of these users as described by the items they have produced is as diverse as possible between them. The second work is related to a novel approach for entity linkage. Each entity is not seen as a set of attributes but as a set of thematic units, each containing a set of attributes. Matching between the entities is performed within each thematic unit, and then combined to calculate the overall similarity. This approach to entity linkage achieves better results when used for entity identification in highly heterogeneous environments.”
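The second line of work, matching entities unit by unit and then combining the results, can be sketched as follows. The report does not say how the per-unit matching or the combination is done, so everything here is an assumption made for illustration: entities are dictionaries mapping a thematic unit name to a dictionary of attributes, units are compared by Jaccard overlap of their value tokens, and the overall similarity is a weighted mean over units. The function names are hypothetical.

```python
def tokens(unit):
    """Lower-cased word tokens of all attribute values in one thematic unit."""
    return {tok.lower() for value in unit.values() for tok in str(value).split()}


def unit_similarity(unit_a, unit_b):
    """Jaccard overlap of the tokens of two thematic units (assumed measure)."""
    a, b = tokens(unit_a), tokens(unit_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def entity_similarity(entity_a, entity_b, weights=None):
    """Weighted mean of per-unit similarities (assumed combination rule).

    A unit missing from one entity is treated as empty, so it contributes
    a similarity of 0 for that unit.
    """
    weights = weights or {}
    units = sorted(set(entity_a) | set(entity_b))
    total = sum(weights.get(u, 1.0) for u in units)
    if total == 0:
        return 0.0
    combined = sum(
        weights.get(u, 1.0) * unit_similarity(entity_a.get(u, {}), entity_b.get(u, {}))
        for u in units
    )
    return combined / total
```

Keeping the comparison inside each unit means that a strong match on, say, identifying attributes is not diluted by unrelated attributes elsewhere, and the per-unit weights give one place to express which units matter most in a given dataset.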
From: L3S Research Center, Hannover (DE)
To: ISST Laboratory, ITMO University, St. Petersburg (RU)
Started on 20/03/2017
Finished on 24/03/2017
“The goal of this STSM was to explore the options for joint research projects between the L3S Research Center (Germany) and the ISST Laboratory (Russia). During the visit, several research directions in the area of multilingual information extraction for the German, English and Russian languages were discussed. Application areas of interest for both institutions in this context include web archives, the financial domain, and interlingual question answering.”
From: Institute for Bulgarian Language, Bulgarian Academy of Science, Sofia (BG)
To: Faculty of Mining and Geology, University of Belgrade, Belgrade (RS)
Started on 26/03/2017
Finished on 02/04/2017
“The purpose of the STSM was to establish a collaboration between the University of Belgrade and the Bulgarian Academy of Science on the topic of ‘Natural Language Processing Keyword Search for Related Languages’. It aimed to test the hypothesis that related languages get similar results when searching massive amounts of lexical data to extract semantic relations using language-specific keyword search. The approach used a combination of IR, semantics, language-specific tags and expanded search, so as to deal with more complex semantic query representations. The procedure was to test the approach by performing keyword search over structured data, namely multilingual electronic text corpora in the domain of mathematics. For that purpose, during the time of the STSM, the first comparable Bulgarian–Serbian/Serbian–Bulgarian electronic text corpora (MathWikiBG and MathWikiSR) were created, incorporating mathematical texts in Bulgarian and Serbian. The Sketch Engine software system was used to run the search experiments. The aim was to extract semantically related words and/or their internal semantic (hidden) relations using statistically based techniques to search and retrieve similar words, assuming that semantic similarity can be evaluated by statistical measurement. The keyword search results obtained supported our hypothesis that, using different types of expanded search, it is possible to capture a keyword's constraints (lexical, grammatical, syntactic or semantic) which govern word combination selections in related semantic contexts for related languages. The results were disseminated through the publication of two articles and via participation in the KEYSTONE COST IC1302 Industry-Academia Expert Workshop at NUI Galway (IE).”
From: University of Cyprus (Department of Computer Science), Aglantzia, Nicosia (CY)
To: University of Novi Sad, Faculty of Technical Sciences, Novi Sad (RS)
Started on 01/04/2017
Finished on 14/04/2017
“Context-aware services are relevant in diverse domains. The purpose of my STSM was to focus on scientific documents, e.g., dissertations, in order to improve the recommendation of relevant dissertations enriching the user data with context information. During the STSM we worked on the representation of user search results with word clouds and adaptation to fit the preferences of each user. It was a great experience for discussing how we can bring keyword-based search and context awareness together on different levels.”
From: University Paris-Dauphine, Paris (FR)
To: Manchester Metropolitan University, Manchester (UK)
Started on 06/04/2017
Finished on 13/04/2017
“While the curation of scientific workflows has received attention in the past decade or so, scientists still find it difficult to understand, run and ultimately reuse existing workflows in their analyses. During this STSM we examined means for leveraging external sources of information, notably textual documentation, to describe scientific workflows and improve their discoverability. We conducted feasibility studies for our ideas using cases from the genome-wide association studies domain.”
From: Université Libre de Bruxelles, Brussels (BE)
To: Birkbeck University of London, London (UK)
Started on 10/04/2017
Finished on 20/04/2017
“This STSM provided the means to start a very fruitful collaboration. I have visited Prof. Calí twice since the STSM, we have published two articles together, and we are currently working on a third one. Thanks to the support of the Action we were able to dedicate a complete week to discussing points of collaboration in our research. This would not have been possible under other circumstances; the conditions of the STSM were excellent and the required paperwork was minimal. Overall the experience of participating in an STSM was very good and the results have been even better than expected.”
From: Aix Marseille Université, Marseille (FR)
To: University of Modena and Reggio Emilia, Modena (IT)
Started on 05/04/2017
Finished on 23/04/2017
“STSM is an interesting support tool that helps researchers to focus on a specific topic with, in most cases, an easily measurable deliverable such as a publication. During the KEYSTONE COST Action, I had the opportunity to visit UNIMORE. Besides interactions with the STSM host, the work carried out during this STSM allowed us to involve two postdoc students (one from Italy and one from France). The ideas discussed in previous MC and WG meetings have been quickly implemented, resulting in a publication.”
From: St. Kliment Ohridski University, Bitola (MK)
To: National and Kapodistrian University of Athens, Athens (EL)
Started on 30/06/2017
Finished on 18/07/2017
“The purpose of the STSM was to enable me (Mr. Nikolche Spasevski) to visit Professor Ioannis Doxiadis at the National and Kapodistrian University of Athens in order to discuss experiences and to extend knowledge in the area of keyword search, more specifically in the area of keyword search in big graphs and team formation in social networks. During the STSM, I spent three weeks at the Computer Systems Department of the National and Kapodistrian University of Athens, under the supervision of Professor Doxiadis. The work carried out during my visit has been mainly focused on keyword search over graph-like databases, which offers an alternative way to access and use semi-structured data that neither requires mastery of a query language, nor deep knowledge of the database’s potentially quite complex schema. We consider this STSM to be very successful, and we estimate that the project will take several months to be completed. Since I have opened the door for future collaborations with the group of Professor Doxiadis, we plan to collaborate with the host institution in the near future, in order to achieve the future objectives of our work.”
From: Università della Svizzera italiana, Lugano (CH)
To: Toulouse Institute of Computer Science Research (IRIT), Toulouse (FR)
Started on 07/07/2017
Finished on 13/07/2017
“The purpose of this STSM was to kick-start a research collaboration between two research groups in Lugano and Toulouse focusing on improving users’ experience with keyword-based search systems in mobile devices. During this STSM, the two groups managed to come up with a roadmap for conducting relevant user studies while considering various types of tasks in different contexts. The two groups hope that, in the future, they can build upon the outcome of this research visit and investigate various aspects of keyword-based search in mobile devices.”
From: TU Wien, Vienna (AT)
To: ETH Zürich - Data Analytics Lab, Swiss Federal Institute of Technology, Zurich (CH)
Started on 03/09/2017
Finished on 16/09/2017
“Through this STSM, I earned the opportunity to visit the data analytics lab of the ETH university, to get to know the lab’s researchers, and to work closely with them, especially with Dr. Carsten Eickhoff. The focus of our work was to provide novel word embedding through learning which contributes some of the important factors for information retrieval in terms of the representation, namely document relevance and document-context similarity. During the mission, we managed to discuss the idea in detail, study related work, design a set of experiments, and finally achieve some preliminary results. Indeed, the visit provided the possibility to effectively communicate and work, as well as fostering relations between the labs and researchers.”
From: LIRMM, Montpellier (FR)
To: L3S Research Center, Hannover (DE)
Started on 04/09/2017
Finished on 08/09/2017
“My STSM visit to L3S in Hannover in the framework of the COST Action IC1302 follows the line of a long-term collaboration with the L3S group led by Stefan Dietze that has resulted in a series of common publications in top ranked Semantic Web conferences and a journal. The goal of the one-week visit was to prepare a proposal for a joint call of the French National Research Agency (ANR) and the German Research Foundation (DFG). I was very well received by my hosts and we managed to successfully accomplish the objectives of the visit; two months later, we are ready with the first draft of our joint French-German proposal on the topic of bias and controversiality detection on the Web. The STSM has, therefore, provided the necessary framework for the continuation of our long-term collaboration.”
From: School of Computing Science, University of Glasgow, Glasgow (UK)
To: Faculty of Engineering, Mathematics and Computer Science, Delft University of Technology, Delft (NL)
Started on 24/09/2017
Finished on 30/09/2017
“As part of the KEYSTONE project, Dr. Azzopardi from the University of Strathclyde undertook a short term visit to work with Prof. Claudia Hauff from the Technical University of Delft and Prof. Djoerd Hiemstra from the University of Twente. The visit focused on keyword search in the context of how learners, e.g. university students, search for material when learning. They developed a series of online experiments, which explored how different interventions designed to encourage learners to enter longer and more descriptive queries influenced search behaviour and performance. Their collaboration is ongoing and has resulted in two publications (SIGIR 2017 and DBIR 2017, on Leading People to Longer Queries in Site Search) thanks to KEYSTONE.”
The KEYSTONE COST Action was completed in December 2017. The Management Committee Members were proud of and satisfied with the results achieved during the Action. From a networking perspective, the Action organised a large number of events (meetings, training schools, and calls for short-term scientific missions), and a significant number of people were able to participate in these activities. Moreover, new research teams, new connections and new projects were started thanks to the Action.
From a scientific perspective, the Action was able to organise a large number of dissemination events, and Members were able to publish in important venues. What is interesting to note is that many networks that were created during the Action have become long-term collaborations beyond the end of the Action, which is a strong indication of its success.
We would like to thank COST for their support throughout.