SIKS-workshop "Data Science from Different Angles" at Eindhoven University of Technology June 13 (Please register at the bottom of the page)
Link to the map --

Please register below so that we know how many participants to expect.

Program of the day:

Morning Session (1):
Chair: Paul De Bra
9.30 -- 10.00 Wil van der Aalst on
"You can only find what you are looking for: On the representational bias in process mining"        
                                Location: Black Box, Zwarte Doos
Processes are everywhere. Organizations have business processes to manufacture products, provide services, purchase goods, handle applications, etc. Also in our daily lives we are involved in a variety of processes, for example when we use our car or when we book a trip via the Internet. Although such operational processes are omnipresent, they are at the same time intangible. Unlike a product or a piece of data, processes are less concrete because of their dynamic nature. It is not easy to select a suitable process representation for process discovery (one of the challenges in the larger process mining discipline). Discovered processes may have deadlocks or be unable to handle multiple instances. This talk will focus on two particular representations: process trees and object-centric behavioral constraint models.
10.00 -- 10.30 Alexander Tuzhilin on
"Recommending Remedial Learning Materials to the Students by Filling their Knowledge Gaps"
                                Location: Black Box, Zwarte Doos
A new content-based method of recommending remedial learning materials to students will be presented. This method identifies gaps in students' knowledge of the subject matter in the online courses that they take and recommends relevant, targeted educational materials from a library of assembled learning materials in order to close the "gaps" in what the students have learned in the course. The proposed recommendation method is empirically validated using a randomized controlled experiment with students from an online university. It is shown that the students not only liked the recommendations provided to them by the proposed method, but that these recommendations led to better performance results on the final exams for certain segments of the student body.

10.30 -- 11.00 Coffee Break

Morning Session (2):
Chair: Dong Nguyen
11.00 -- 11.30 Djoerd Hiemstra on
"Federated Search: From research to practice"
                                Location: Black Box, Zwarte Doos
Federated search has the potential to improve web search: the user becomes less dependent on a single search provider and parts of the so-called deep web become available through a unified interface, leading to a wider variety in the retrieved search results. I will present the lessons that we learned from running the Federated Web Search track of the Text Retrieval Conference (TREC), cherry-picking results from the best systems of the participating research groups. I will conclude the talk by discussing our steps to take this work to full practice by running the University of Twente's search engine as a federation of more than 30 smaller search engines, including local databases with courses, publications, and telephone numbers, and results from social media like Twitter and YouTube. The search engine is available at:

11.30 -- 12.00 Arjen P. De Vries on
"Reproducibility versus Representativeness in Evaluation"
                                Location: Black Box, Zwarte Doos
The talk presents an analysis of issues with respect to reproducibility and representativeness, taking the TREC Contextual Suggestion test collections as a case study. One observation from the results of systems participating in the TREC Contextual Suggestion track is that systems relying on services that return results from the open Web, using special-purpose APIs and/or commercial web search engines, consistently achieve better recommendations than those that opted to use the static ClueWeb 2012 crawl. Only the latter results ensure reproducibility of research, as their evaluation results can be easily regenerated under similar, or the same, conditions. Given the observed performance difference, however, are these results representative of what we would expect from actual web search services created for this purpose? Considering representativeness as the extent to which the techniques developed by teams participating in the TREC Contextual Suggestion track would hold in actual web search services, can we somehow increase the representativeness of reproducible results?

12.00 -- 12.30 Maarten de Rijke on
"Mixed initiative search"
                                Location: Black Box, Zwarte Doos
As interactions with search engines morph into conversations, search becomes a mixed initiative scenario, where the search engine does not just answer questions but explores options and even generates questions to help a searcher improve their effectiveness. In the talk I will discuss recent advances in exploratory behavior of search engines, with examples in news search, web search and people search.

12.30 -- 14.00 Lunch (on your own)

14.00 -- 15.30 Julia's defense: "Using Contextual Information to Understand Searching and Browsing Behavior"
                               Location: Collegezaal 4, Auditorium
Modern search still relies on the query-response paradigm, which is characterized by a sharp contrast between the richness of data in the index and the relative poverty of information in the query, usually expressed in a few keywords to capture a complex need. This is particularly true in online search services, where the same query may be observed from many users, with considerable variations in their search intents. Contextual information is the obvious route to try to restore the balance, and behavioral data related to users' searching and browsing activities provides new opportunities to model contextual aspects of user needs. The importance of contextual information in search applications has been recognized by researchers and practitioners in many disciplines, including recommendation systems, information retrieval, ubiquitous and mobile computing, and marketing. Context-aware systems adapt to users’ operations and thus aim at improving usability and effectiveness by taking context into account. In this thesis, we consider two types of behavior: searching, when users are issuing queries and we are trying to improve the search engine results page by taking the context of sessions into account, and browsing, when users are surfing a website and we are predicting their movements using context. Finding ways to better leverage contextual information and make search context-aware holds the promise of dramatically improving the search experience of users. We conducted a series of studies to discover, model and use contextual information in order to understand and improve users' searching and browsing behavior on the web.

15.30 -- 16.30 Reception
                                Location: MetaForum 4th floor

Afternoon Session:
Chair: Jaap Kamps
16.30 -- 17.30 Charles L.A. Clarke on
"Classifier Cascades for Optimizing Effectiveness-Efficiency Tradeoffs in Multi-Stage Ranking"
                                Location: MetaForum 6.202 (also known as MF 14), 6th floor
I will examine effectiveness/efficiency tradeoffs in modern multi-stage ranking architectures comprising a candidate generation stage followed by one or more reranking stages. In such an architecture, the quality of the final ranked list may not be particularly sensitive to the quality of initial input documents, especially in terms of early precision. This provides an opportunity to increase retrieval efficiency without significantly sacrificing effectiveness. Previous work exploring this tradeoff has mostly focused on global parameter settings that apply to all queries, even though optimal settings vary across queries. In contrast, I will present a technique for optimizing query evaluation efficiency within an effectiveness envelope on a "per query" basis, using only static pre-retrieval features. The query-specific tradeoff point between effectiveness and efficiency is decided by a classifier cascade that weighs possible efficiency gains against effectiveness losses at a range of possible parameter settings to arrive at an optimal decision. I propose an approach to training these classifier cascades that does not require any editorial relevance judgments. Our general framework is applied to two specific retrieval techniques: the setting of k in top-k retrieval using the document-at-a-time WAND algorithm, and the setting of a quality threshold in a recently developed score-at-a-time approximate query evaluation algorithm. Experimental results show improvements of 20% or more in average efficiency without any significant loss in effectiveness.
Your Full Name *
Your Email Address *
I'll attend the following: *