Monday 21.10. at 10.30–11.30. Room: Tekla Hultin (F3003). Chair: Lili Aunimo

| # | Presenter | Affiliation (presenter) | Title | Authors | Abstract |
|---|-----------|--------------------------|-------|---------|----------|
| 38 | Aleksei Tiulpin | University of Oulu | Image-Level Regression For Uncertainty-Aware Retinal Image Segmentation | Trung Dang, Huy Hoang Nguyen and Aleksei Tiulpin | 1. Outer context: The retina offers valuable diagnostic insights into various clinical conditions non-invasively. Quantitative assessment of retinal vasculature is crucial for diagnosing retinal diseases and identifying systemic conditions like hypertension, diabetes, and cardiovascular diseases. Numerous studies aim to automate retinal blood vessel segmentation (BVS) using Deep Learning (DL) approaches, typically through semantic segmentation. 2. Research problem / question: Training DL models for BVS typically requires annotation masks. We aim to address the uncertainty in the annotation process, particularly in regions close to retinal vessel boundaries. In addition, we question the necessity of utilizing high-resolution (HR) retinal images. 3. Key motivation: Conventionally, the BVS problem is formulated as pixel-wise classification, which heavily relies on annotated masks. However, these masks often contain high levels of uncertainty, especially in the area surrounding retinal vessels. 4. Prior attempts at addressing it: Prior attempts include label smoothing techniques; some studies also incorporate multiple annotations. 5. Why are prior attempts not enough? Label smoothing-based approaches cannot address the intra-class uncertainty of pixels around objects of interest, and collecting multi-annotation data is costly. 6. Method: To address the uncertainty of annotation masks, we formulate retinal vessel segmentation as image-level regression (Figure 1). First, we introduce the Segmentation Annotation Uncertainty-Aware (SAUNA) transform, converting binary masks into soft labels that capture the uncertainty around retinal vessels (see the soft-label sketch after this table). Additionally, we adapt the Jaccard metric loss (JML) [1] to operate in any hypercube, enabling effective training for image-level regression. 7. Results: We applied our method to the UNet++ [3] and Swin-UNet [5] architectures and compared them to a diverse array of 15 HR and low-resolution (LR) baselines. On the FIVES dataset (Figure 2), our method with UNet++ was the only LR-based approach that substantially outperformed the best HR-based reference, MAGF-Net [8], with a difference of 1.15% IoU while being over 149 times more efficient. With the same DL architecture, applying our image-level regression method resulted in significant improvements over the pixel-wise classification approach. Our method also generalized better than other LR-based baselines on the 4 external datasets DRIVE, STARE, HRF, and CHASE_DB1 (Table 1). 8. Conclusion: This study introduces a regression-based method for retinal vessel segmentation. We employed the newly developed SAUNA transform to produce soft labels, addressing the uncertainty inherent in the annotation process. Through comprehensive experimental assessment, we established that our approach surpasses existing methods. Our findings suggest reconsidering the necessity of HR retinal images for retinal image segmentation. |
| 61 | Mikko Kurimo | Aalto University | Unlocking the Potential of Radio and Television Archives: Combination of Strengths in Advancing Speech Recognition | Mikko Kurimo, Tamás Grósz, Yaroslav Getman, Tommi Lehtonen and Mervi Leino-Niemelä | We present results of combining the latest research in automatic speech recognition (ASR) with novel European high-performance computing (HPC) and the large quantities of raw audiovisual data contained in national radio and television archives. The aim of the work was twofold: first, to advance ASR by building models on large public data collections, and second, to harness the large audiovisual media archives for large-scale qualitative and quantitative media research by automatically indexing all spoken content decoded by ASR. For most languages spoken in the world, reaching these goals requires creative solutions, because the required resources are rarely found in one place. Only the largest global companies have access at once to the latest ASR developments, huge computing resources, and huge audio collections, and their commercial interests do not treat all languages equally. In Europe, most languages are spoken in small countries which, however, have advanced radio and television archives containing millions of hours of broadcast media content. The latest publicly funded HPC initiatives have also given researchers access to unprecedented computational resources. By combining this computing power with the archives, it is now possible for researchers to develop and publish large pre-trained speech models for many languages without depending on the commercial interests of the large global companies. The large speech models can be pre-trained in a self-supervised fashion, which can also benefit from raw, untranscribed, and uncategorized audio collections. Once openly published, these models make it quick and easy to develop speech technology applications for these languages, such as accurate recognizers of speech content and of speaker and audio characteristics, by fine-tuning the models with a feasible amount of transcribed target data (see the fine-tuning sketch after this table). In a case study for Finnish, we developed a large monolingual pre-trained speech model and a framework for media researchers to decode the audiovisual archive content using the best ASR and the large computing resources available to them at CSC, the Finnish IT Center for Science. |
| 95 | Sumita Sharma | University of Oulu | Child-centered AI: Imagining fair, inclusive, and diverse future classrooms with children | Sumita Sharma | Children interact with AI applications in a myriad of ways, including social media and recommendation systems, generative AI (such as ChatGPT and DALL-E 2), and indirectly through various algorithmic decision-making for public and private services (e.g., social services, banking). While there are several initiatives focusing on AI literacy for children, children are rarely introduced to the limitations and ethical implications of the design and use of such AI systems. To address this gap, through the Research Council of Finland funded PAIZ project (https://interact.oulu.fi/paiz), I conducted hands-on workshops on critical AI literacy with young children (10-12 years) to explore the role of AI in children's everyday lives and how to design ethical, inclusive, and fair AI futures, that is, envisioning child-centered AI applications and futures. In the workshops, participants generated AI art and text and discussed who owns the content and where and how it can be used. They explored image recognition (Teachable Machine) and what it means for an AI to "see" (see Figure 1), and contemplated the ethical implications of self-driving cars. Children then oriented to the future to imagine fair, inclusive, and diverse applications for future classrooms (Figure 1). Workshops were conducted with children in Finland (45 participants), India (45), Japan (102), and the USA (27). Preliminary data analysis shows how children understood critical concepts such as fairness in AI and in tech access and use, and their visions of fair, inclusive future techno-social societies. Participants critiqued the ethical use of AI art, highlighting that "we can use AI-generated art for inspiration but not submit it to art competitions as our own work" (Oulu) and that "DALL-E 2 can draw things quickly, creatively, making our work easier, but it is only fair if everyone [has] access to it. If an art competition is for people, it is a competition of the human mind, not machines" (New Delhi). Participants in the US remarked that AI-generated artwork "doesn't feel as authentic…" and that "…maybe artists who were making art before might feel useless" (see Figure 2 for artwork examples). The project contributes to work on Children and AI by UNICEF and the STN Generation AI project by showcasing multicultural perspectives on fairness and inclusion and young children's ethical sense-making abilities. It adds to Child-Computer Interaction research on designing child-centered AI through its imaginings of future classrooms, building on children's everyday experiences. |
| 111 | Erjon Skenderi | University of Helsinki | Group Conversational AI: Introducing Effervesce | Erjon Skenderi, Salla-Maria Laaksonen, Kaisa Lindholm, Mia Leppälä and Jukka Huhtamäki | Communication tools enhanced with Large Language Models (LLMs) are important for facilitating effective group conversations in digital workspaces, and it is crucial to develop these models to facilitate many-to-many conversations as well. Recent conversational AI applications are designed for one-to-one interactions in the form of chat [1,2]. Our study investigates the challenges and opportunities of fine-tuning and deploying an AI-driven group conversational bot, Effervesce, within a multi-member Slack environment. Exploring the applications of large language models for group conversation is crucial due to the increased use of digital communication tools in organizations. Dynamic group conversational bots can be more helpful than conventional one-to-one bots and can help increase human interaction [3]. One-to-one chat applications, however, are not trained to follow group conversations by default. In our initial experimentation with the Effervesce Slack bot, we employed various open-source LLMs, which showed limited capabilities in handling complex, multi-actor conversations. To tackle this issue, we employed the open-source Mistral 7B model [4], fine-tuned using the QLoRA framework [5] on a dataset of 1.6k Slack messages extracted from a group conversation (see the QLoRA sketch after this table). According to our preliminary results, the fine-tuned model shows an improved understanding of conversation structure and engages better in group discussions. We evaluated the fine-tuned Mistral model quantitatively on another, similar group-discussion dataset, showing that the fine-tuned version performs better than the original. Additionally, the evaluation of Effervesce through five workshops involving 50 individual participants showed positive impacts on organizational communication dynamics, and we received feedback for further improvements. The fine-tuned LLM-powered Effervesce bot shows positive results in facilitating multi-actor conversations within Slack, demonstrating the potential to enhance group communication dynamics in organizations. Additionally, our work analyses the advantages and limitations of current LLMs when applied in a group communication setting. Future work will focus on addressing suggestions from user feedback to further improve the bot's conversational abilities and extend its functionalities. |
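
The SAUNA transform in #38 is defined in the paper itself, but the general idea it names, converting a binary vessel mask into soft labels that encode uncertainty near boundaries, can be illustrated with a signed-distance construction. A minimal sketch, assuming a distance-transform-based soft labeling; the `margin` parameter and the linear ramp are illustrative choices, not the paper's definition:

```python
# Hypothetical soft-labeling sketch: pixels deep inside a vessel stay near 1,
# pixels far outside stay near 0, and pixels within `margin` pixels of the
# boundary get intermediate values reflecting annotation uncertainty.
import numpy as np
from scipy.ndimage import distance_transform_edt

def soft_labels(mask: np.ndarray, margin: float = 3.0) -> np.ndarray:
    """Map a binary mask in {0, 1} to soft regression targets in [0, 1]."""
    dist_in = distance_transform_edt(mask)       # distance to nearest background pixel
    dist_out = distance_transform_edt(1 - mask)  # distance to nearest vessel pixel
    signed = dist_in - dist_out                  # signed distance to the boundary
    return np.clip(0.5 + signed / (2 * margin), 0.0, 1.0)

mask = np.zeros((8, 8), dtype=np.uint8)
mask[3:5, :] = 1                                 # a toy "vessel"
print(soft_labels(mask).round(2))
```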
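
For #61, the pre-train-then-fine-tune workflow described in the abstract follows a standard pattern, shown here with the Hugging Face `transformers` API. A hedged sketch of the fine-tuning step only; the checkpoint name is a placeholder English model, not the authors' Finnish pre-trained model:

```python
# Load a self-supervised pre-trained speech model and attach a CTC head for
# ASR; the pre-trained feature encoder stays frozen while the rest is
# fine-tuned on a feasible amount of transcribed target-language data.
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

checkpoint = "facebook/wav2vec2-large-960h"   # placeholder checkpoint
processor = Wav2Vec2Processor.from_pretrained(checkpoint)
model = Wav2Vec2ForCTC.from_pretrained(checkpoint)
model.freeze_feature_encoder()  # keep the self-supervised features fixed
# ... fine-tune with a Trainer on (audio, transcript) pairs, then decode
# archive audio in batches on the HPC system.
```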
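
For #111, fine-tuning Mistral 7B with QLoRA typically follows the standard `transformers` + `peft` + `bitsandbytes` recipe. A minimal sketch; the LoRA rank, target modules, and quantization settings are illustrative guesses, not the authors' configuration:

```python
# QLoRA in outline: load the base model in 4-bit precision, then train only
# small low-rank adapter matrices on top of the frozen quantized weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "mistralai/Mistral-7B-v0.1"
bnb = BitsAndBytesConfig(load_in_4bit=True,
                         bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb)
model = prepare_model_for_kbit_training(model)

lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)   # only the adapters receive gradients
# ... train on the tokenized Slack-message dataset with a standard Trainer.
```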

Monday 21.10. at 13.00–14.00. Room: Tekla Hultin (F3003). Chair: Tapio Pahikkala

| # | Presenter | Affiliation (presenter) | Title | Authors | Abstract |
|---|-----------|--------------------------|-------|---------|----------|
| 8 | Aidan Scannell | Aalto University | Sample-Efficient Reinforcement Learning with Implicitly Quantized Representations | Aidan Scannell, Kalle Kujanpaa, Yi Zhao, Arno Solin and Joni Pajarinen | Learning representations for reinforcement learning (RL) has shown much promise for continuous control. We propose an efficient representation learning method using only a self-supervised latent-state consistency loss (see the consistency-loss sketch after this table). Our approach employs an encoder and a dynamics model to map observations to latent states and predict future latent states, respectively. We achieve high performance and prevent representation collapse by normalizing and bounding the latent representation such that the rank of the representation is empirically preserved. Our method is straightforward, compatible with any model-free RL algorithm, and demonstrates state-of-the-art performance on continuous-control locomotion benchmarks from the DeepMind Control Suite and manipulation benchmarks from MetaWorld. |
| 27 | Abdullah Tokmak | Aalto University | PACSBO: Probably approximately correct safe Bayesian optimization | Abdullah Tokmak, Thomas B. Schön and Dominik Baumann | In recent years, reinforcement learning (RL) has achieved remarkable success in controlling high-dimensional systems without requiring a dynamics model. However, most of the impressive results have been obtained in simulation environments. Applying RL algorithms to real-world robotic systems is challenging: (i) when interacting with real-world environments, it is crucial to guarantee safety, which popular RL algorithms fail to provide; (ii) each sample corresponds to a potentially expensive experiment, so sample efficiency is essential, whereas RL is inherently sample-inefficient. Combining Gaussian process (GP) regression with Bayesian optimization (BO) provides a sample-efficient alternative to RL. Based on GP regression and BO, several algorithms have been proposed that can, in addition, provide probabilistic safety guarantees [1] (see the safe-set sketch after this table). These safe learning algorithms aim at optimizing an unknown reward function while satisfying constraints. In exchange for the safety guarantees, they require smoothness assumptions. In particular, they assume that reward and constraint functions have a known upper bound on their norm in a reproducing kernel Hilbert space (RKHS). The RKHS norm is a norm in a potentially infinite-dimensional space, and assuming knowledge of that norm in unknown environments is highly unrealistic. Notably, an overly loose upper bound on the RKHS norm leads to conservative algorithms, whereas an underestimation might lead to constraint violations. An alternative to assuming a known RKHS norm upper bound is estimating it from data. References [2,3] discuss ways to underestimate the unknown RKHS norm, which is overly optimistic and hence can cause constraint violations. We draw inspiration from these ideas and propose an approach to estimating an upper bound on the RKHS norm from data, eliminating the need to guess the correct RKHS norm. Moreover, we investigate the theoretical properties of this over-estimate by proving that it is probably approximately correct. Furthermore, we treat the RKHS norm as a local object and thus improve exploration, which together yields PACSBO. PACSBO successfully estimates the RKHS norm, outperforms [1] in a toy example, and controls a hardware system. Unlike prior works, we drop the assumption of a known tight upper bound on the RKHS norm of reward and constraint functions; instead, we estimate the upper bound from data and theoretically analyze the estimate. In addition, we treat the RKHS norm as a local object over the region of interest, which reduces conservatism compared to assuming one global upper bound over the entire parameter space. We successfully evaluate PACSBO in numerical and hardware experiments. [1] Y. Sui, A. Gotovos, J. Burdick, and A. Krause. "Safe Exploration for Optimization with Gaussian Processes". In: International Conference on Machine Learning. 2015, pp. 997–1005. [2] P. Scharnhorst, E. T. Maddalena, Y. Jiang, and C. N. Jones. "Robust Uncertainty Bounds in Reproducing Kernel Hilbert Spaces: A Convex Optimization Approach". In: IEEE Transactions on Automatic Control 68.5 (2023), pp. 2848–2861. [3] K. Hashimoto, A. Saoud, M. Kishida, T. Ushio, and D. V. Dimarogonas. "Learning-based symbolic abstractions for nonlinear control systems". In: Automatica 146 (2022), p. 110646 (extended version at arXiv:1612.05327v3). |
| 70 | Sahar Salimpour | University of Turku | Sim-to-Real Transfer for Autonomous Lidar-based Navigation with Reinforcement Learning: from NVIDIA Isaac Sim to Gazebo and Real ROS 2 Robots | Sahar Salimpour, Jorge Peña Queralta, Jukka Heikkonen and Tomi Westerlund | Unprecedented agility and dexterous manipulation have been demonstrated with controllers based on deep reinforcement learning (RL), with a significant impact on legged and humanoid robots. Modern tooling and simulation platforms, such as NVIDIA Isaac Sim, have been powering such advances. This paper focuses on demonstrating the applications of Isaac Sim to local planning and obstacle avoidance, one of the most fundamental ways in which a mobile robot interacts with its environment. The literature includes extensive reproducible research in areas where the RL policy state space is largely sensed through proprioception. We argue in this paper that approaches to interaction with the environment in end-to-end learning with exteroception are less standardized and reproducible. At the same time, the article aims to provide a base tutorial for end-to-end local navigation policies and for how a custom robot can be trained in such a simulation environment (a minimal policy sketch follows this table). We benchmark end-to-end policies against the state-of-the-art Nav2 package in ROS 2. We also cover the sim-to-real transfer process by demonstrating the zero-shot transferability of policies trained in the Isaac simulator to real-world robots. This is further evidenced by tests with different simulated robots, showcasing the generalization of the learned policy. Finally, the benchmarks demonstrate performance comparable to Nav2, opening the door to quick deployment of state-of-the-art end-to-end local planners for custom robot platforms and, importantly, furthering the possibilities by expanding the state and action spaces or task definitions for more complex missions. Overall, with this paper we introduce the most important steps, and aspects to consider, in deploying RL policies for local path planning and obstacle avoidance, with Isaac Sim for training, Gazebo for testing, and ROS 2 for real-time inference on real robots. |
| 77 | Denys Iablonskyi | University of Helsinki | Towards Industry 4.0: Physics-informed AI for Ultrasound Structural Health Monitoring | Denys Iablonskyi, Burla Korkmaz, Moontasir Soumik, Shayan Gharib, Julius Korsimaa, Petteri Salminen, Martin Weber, Edward Hæggström, Ari Salmi and Arto Klami | Industrial structures such as pipelines can become fouled for several reasons when unwanted substances accumulate on their inner surfaces. Detection and characterization of the fouled area are thus of critical importance for sustainable industrial operations, localized cleaning, and predictive maintenance. Ultrasonic waves are sensitive to such deposits or defects and are typically generated and recorded using a network of ultrasonic transducers. Typical tomographic methods rely on solving full-wave equations or on iterative methods that are time-consuming. In this work, we present a method for accurate reconstruction of fouling maps using pre-trained neural networks, which opens a way to real-time structural health monitoring. The synthetic training dataset is obtained using a physics-informed forward model that incorporates the effect of fouling on the propagating ultrasonic signals. Thus, the AI models can be trained offline and applied to experimental signals on the fly to generate defect maps (see the offline/online sketch after this table). The variational autoencoder variant of the network also provides an error estimate of the reconstruction quality. The proposed method is verified experimentally on a pipe structure using a sensor-efficient configuration with only four transducers, benefiting from the high-order helical trajectories along which guided waves can propagate. Moreover, the method can be easily extended to more complex structures, e.g., storage tanks or pipeline connections, as it only requires the guided-wave trajectories and the corresponding signal attenuation information. Modern non-invasive cleaning methods employ high-power ultrasound and operate in a brute-force manner according to a schedule. Identifying fouled areas and guiding the cleaning process locally will lead to energy efficiency, increased production, and sustainability, enabling Industry 4.0. |
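
For #8, the self-supervised latent-state consistency objective can be sketched as follows: an encoder maps observations to bounded latent states, a dynamics model predicts the next latent, and the prediction is regressed onto the encoded next observation. The network sizes, the unit-norm bounding, and the stop-gradient target are assumptions for illustration, not the paper's exact design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim, latent_dim = 24, 6, 64
encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ELU(), nn.Linear(256, latent_dim))
dynamics = nn.Sequential(nn.Linear(latent_dim + act_dim, 256), nn.ELU(),
                         nn.Linear(256, latent_dim))

def encode(obs):
    # Bound the latent (here: project onto the unit sphere) so its scale
    # cannot collapse to zero.
    return F.normalize(encoder(obs), dim=-1)

def consistency_loss(obs, act, next_obs):
    z_pred = F.normalize(dynamics(torch.cat([encode(obs), act], dim=-1)), dim=-1)
    with torch.no_grad():  # stop-gradient target to avoid trivial solutions
        z_target = encode(next_obs)
    return F.mse_loss(z_pred, z_target)

batch = (torch.randn(32, obs_dim), torch.randn(32, act_dim), torch.randn(32, obs_dim))
print(consistency_loss(*batch).item())
```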
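
For #27, the safe-set construction that methods like [1] build on keeps only those points whose GP lower confidence bound clears a safety threshold; PACSBO's contribution is estimating, from data, the RKHS norm that calibrates this bound. A minimal sketch with a placeholder constant `beta` standing in for the paper's estimated scaling:

```python
# A GP models the unknown constraint function; a candidate point counts as
# safe if its lower confidence bound stays above the threshold (here: 0).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x) + 0.5            # stand-in constraint function
X_train = rng.uniform(0, 2, size=(8, 1))
y_train = f(X_train).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3), alpha=1e-4)
gp.fit(X_train, y_train)

X = np.linspace(0, 2, 200).reshape(-1, 1)
mu, sigma = gp.predict(X, return_std=True)
beta = 2.0                                    # would come from the RKHS norm estimate
safe = (mu - beta * sigma) > 0.0              # lower confidence bound above threshold
print(f"{safe.sum()} of {len(X)} candidate points deemed safe under this model")
```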
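
For #70, an end-to-end local-navigation policy of the kind benchmarked against Nav2 can be pictured as a small network mapping exteroceptive input to velocity commands. The beam count, goal encoding, and architecture below are assumptions for illustration:

```python
import torch
import torch.nn as nn

class LidarPolicy(nn.Module):
    """Toy end-to-end policy: lidar ranges + relative goal -> velocity command."""

    def __init__(self, n_beams: int = 180):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_beams + 2, 256), nn.Tanh(),   # + (goal distance, heading)
            nn.Linear(256, 128), nn.Tanh(),
            nn.Linear(128, 2),                        # (linear vel, angular vel)
        )

    def forward(self, scan, goal):
        return self.net(torch.cat([scan, goal], dim=-1))

policy = LidarPolicy()
cmd = policy(torch.rand(1, 180), torch.tensor([[1.5, 0.3]]))
print(cmd)  # at deployment this pair would be published as a ROS 2 Twist message
```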
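
For #77, the offline-training / online-inference split can be sketched as below. The linear stand-in for the physics-informed forward model and all shapes are placeholders; the real forward model propagates guided waves along helical trajectories, and the paper's VAE variant additionally returns an error estimate:

```python
# Offline: train a network on synthetic (signal -> fouling map) pairs produced
# by a forward model. Online: invert measured signals in real time.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_paths, map_size = 32, 16 * 16          # e.g., 4 transducers, 16x16 fouling map
weights = torch.rand(map_size, n_paths)  # fixed stand-in for attenuation physics

def forward_model(fouling_map):
    return fouling_map @ weights         # per-path attenuation features

net = nn.Sequential(nn.Linear(n_paths, 256), nn.ReLU(), nn.Linear(256, map_size))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(200):                     # offline training on synthetic data
    maps = torch.rand(64, map_size)
    loss = F.mse_loss(net(forward_model(maps)), maps)
    opt.zero_grad()
    loss.backward()
    opt.step()

measured = forward_model(torch.rand(1, map_size))    # a "measured" signal
fouling_estimate = net(measured).reshape(16, 16)     # on-the-fly reconstruction
```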