ID | Title | Abstract | Event name | Speaker names | Author names | Subjects | Tag | Package | Awards
ID: 146033
Title: Presidential Panel on the Future of AI Research
Abstract: Over the past few years, Artificial Intelligence has bounded into the mainstream of society. Remarkable technical achievements in the use of Deep Learning and Large Language Models have given rise to expectations and hype regarding the possibility of achieving artificial general intelligence, as well as general concerns over the potential deleterious consequences of emerging AI technologies and how to ensure their responsible use. In this panel we engage four Past AAAI Presidents to discuss their views on questions relating to the current state and future of AI research, including such topics as important emerging application areas, current technical challenges, the eventual prospects for achieving artificial general intelligence, and potential AI risks and solutions.
Event name: AAAI 2026 Main Conference
Speaker names: Stephen Smith, Raj Reddy, Eric Horvitz, Manuela Veloso, Bart Selman
Tag: panel
Package: free
ID: 146031
Title: The Essence of Intelligence is Appropriate Action (not thinking, reasoning, learning or language) and other things every student of AI should know
Abstract: An agent acts in its world to achieve its objectives. Intelligence allows the agent to make decisions and act. In natural domains, sensing is limited, so acting is gambling. It’s a myth that passive learning and more data are all we need. An agent cannot learn from observations alone. It needs a real body to carry out experiments in its world, testing hypotheses, to determine causation, refining its model of the world’s dynamics. The agent is acting as a scientist: refining its model through experiments and acting appropriately to achieve its objectives. Its objectives depend on its preferences and values, and those of other agents its actions impact. Determining which values to use, and how preferences can be acquired fairly, is a major non-technical challenge. We address three primary questions: What should an agent believe? What should an agent do, given its beliefs, preferences, and abilities? What should the preferences of an agent be? Integrating these issues motivates the design of our latest AI textbook, Artificial Intelligence: Foundations of Computational Agents (3rd Ed. 2023).

Alan Mackworth is a Professor Emeritus of Computer Science at the University of British Columbia. He works on artificial intelligence with applications in constraint satisfaction, cognitive robotics, assistive technology, hybrid systems and constraint-based agents. He invented the world’s first soccer-playing robots. He has co-authored two books: Computational Intelligence: A Logical Approach (1998) and Artificial Intelligence: Foundations of Computational Agents (2023, 3rd Ed.). Alan co-founded the UBC Cognitive Systems Program, the Centre for AI, Decision-making and Action (CAIDA) and the AI network of BC (AInBC). He has served as President of AAAI, IJCAI and CAIAC. He is a Fellow of AAAI, CAIAC, AGE-WELL, CIFAR and the Royal Society of Canada.

David Poole is a Professor Emeritus of Computer Science at the University of British Columbia. He is known for his work on combining logic and probability, probabilistic inference, relational probabilistic models, statistical relational AI and semantic science. He is a co-author of two AI textbooks (Cambridge University Press, 3rd edition 2023, and Oxford University Press, 1998), and co-author of "Statistical Relational Artificial Intelligence: Logic, Probability, and Computation". He is a former chair of the Association for Uncertainty in Artificial Intelligence, the winner of the Canadian AI Association (CAIAC) 2013 Lifetime Achievement Award, and is a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI) and CAIAC.
Event name: AAAI 2026 Main Conference
Speaker names: David Poole
Subjects: AAAI 2026 Invited Talks
Tag: invited talk
Package: free
Awards: Patrick Henry Winston Outstanding Educator Award
ID: 146029
Title: Small Data: A New Paradigm for the Next Generation of AI
Event name: AAAI 2026 Main Conference
Speaker names: Derek Haoyang Li
Subjects: AAAI 2026 Invited Talks
Tag: invited talk
Package: free
ID: 146028
Title: AI for Reskilling, Upskilling, and Workforce Development
Abstract: As AI becomes increasingly powerful and ubiquitous, it is disrupting skills and displacing workers. NSF’s National AI Institute for Adult Learning and Online Education (AI-ALOE) posits that AI can be part of the solution to the growing problem if we can use AI for reskilling, upskilling, and workforce development at scale. The long-term vision of AI-ALOE is to develop and use AI technologies to enhance the proficiency of online education for all adult learners, using in-person education as a benchmark. The day-to-day mission of AI-ALOE is to conduct responsible research into AI that is grounded in theories of human cognition and learning and derived from the scientific process of learning engineering. I will describe ongoing research at AI-ALOE.
Event name: AAAI 2026 Main Conference
Speaker names: Ashok Goel
Subjects: AAAI 2026 Invited Talks
Tag: invited talk
Package: free
Awards: Robert S. Engelmore Memorial Lecture Award
ID: 146026
Title: Professor Edward Feigenbaum: a Tribute to and Lecture by a Pioneer of AI on his 90th Birthday
Event name: AAAI 2026 Main Conference
Speaker names: Raj Reddy, Eric Horvitz, Bart Selman, Edward Feigenbaum, Yolanda Gil, Peter Friedland
Subjects: AAAI 2026 Invited Talks
Tag: invited talk
Package: free
ID: 146024
Title: Navigating the AI Horizon: Promises, Perils, and the Power of Collaboration
Abstract: We stand at the dawn of the AI era, a technological revolution poised to be the most consequential of our generation, presenting unprecedented opportunities. But this promise is shadowed by significant challenges. To build the future we want, we must move beyond the hype and the headlines to confront the most pressing open problems—technical, sociotechnical, and multidisciplinary. This talk will review the rapid progress, dissect the challenges ahead, and argue that our greatest task isn’t simply building smarter machines, but fostering the human wisdom to guide them towards a future that is not only intelligent but also equitable, safe, and profoundly human.
Event name: AAAI 2026 Main Conference
Speaker names: Ece Kamar
Subjects: AAAI 2026 Invited Talks
Tag: invited talk
Package: free
ID: 146023
Title: AI and Program Reviewing Panel
Event name: AAAI 2026 Main Conference
Speaker names: Odest Chadwicke Jenkins, Kevin Leyton-Brown, Joydeep Biswas, Matthew E. Taylor
Tag: panel
Package: free
ID: 146019
Title: Towards Embodied Agents that See, Simulate, and Reason
Abstract: Large language models have revolutionized textual reasoning, yet their ability to act meaningfully in multimodal, real-world environments remains limited. They struggle to ground their decisions in visual context, adapt to changing goals, and plan actions over time—shortcomings that stem from a lack of structured, goal-driven reasoning and insufficient representations of the physical world.

In this talk, I present a unified framework for building embodied agents that can see, simulate, and reason. I begin by introducing methods for learning world simulators from data, arguing that visual reasoning—like textual reasoning—benefits from step-by-step processing. Inverting a physics simulator becomes key: agents must infer structured 3D neural representations of objects, parts, motions, and scenes directly from raw video. I describe methods for extracting such representations using generative priors, injecting them into vision-language models (VLMs), and scaling up their supervision via generative 3D simulation and fast, modular physics engines. These simulators enable agents to anchor their predictions in grounded physical reality, reducing hallucinations and improving control.

Complementing this simulation capability, I explore techniques that enable agents to reason over time and adapt their behavior. By integrating structured memory systems, agents learn to retain and retrieve relevant experiences to inform long-horizon plans. Language-based reflective feedback allows them to refine their strategies beyond what sparse rewards offer, forming abstractions that generalize across tasks. When trained to ground their reasoning directly in the visual environment, agents gain the ability to set subgoals, explore effectively, and verify their own hypotheses.

Together, these advances point toward autonomous systems that simulate before they act, reflect after they fail, and maintain an ongoing awareness of goals, constraints, and context. I will illustrate these capabilities across web automation, robotics, and interactive assistance, showing how agents that see, simulate, and reason offer a promising path toward general-purpose embodied intelligence.
Event name: AAAI 2026 Main Conference
Speaker names: Katerina Fragkiadaki
Subjects: AAAI 2026 Invited Talks
Tag: invited talk
Package: free
ID: 146018
Title: From Workflows to Water Coolers: AI That Can Navigate Human Nature
Event name: AAAI 2026 Main Conference
Speaker names: Yolanda Gil
Subjects: AAAI 2026 Invited Talks
Tag: invited talk
Package: free
ID: 146016
Title: Fundamental physics and science communication
Abstract: Physicists aim to explain the Universe in terms of a compact, interpretable set of principles. Deducing those principles from experiments poses many challenging problems which are ripe for the application of AI and present opportunities to develop new AI techniques. I will describe how AI has changed the way particle physicists work and speculate about the role of AI in the future of fundamental physics. Finally, I will describe my experience in science communication, as an author, podcaster and television producer.
Event name: AAAI 2026 Main Conference
Speaker names: Daniel Whiteson
Subjects: AAAI 2026 Invited Talks
Tag: invited talk
Package: free
ID: 146014
Title: Quest of AI towards Specializable Generalist: From Reasoning to Scientific Discovery
Abstract: The pursuit of high-efficiency Artificial General Intelligence (AGI) requires more than brute-force scaling of model size and data. While scaling remains a key driver of capability, equally important are scalable architectures and principles—designs that continue to work, improve, and remain controllable as we vary model scale, domains, and modalities. Central to our approach is the concept of the “Specialized Generalist” – a pathway that achieves deep expertise across multiple domains without sacrificing broad generalization capabilities. In this talk, we introduce the “Specialized Generalist” paradigm and our implementation of it, SAGE (Synergistic Architecture for Generalized Expertise), a three-layer architecture designed to balance specialization and generalization in a systematic way. We will describe how SAGE’s Base Model, Synergy Fusion, and Exploration-Evolution layers interact in practice, focusing on concrete mechanisms for coordinating domain-specific expertise with broad general reasoning. We will share empirical results and recent advances in large reasoning models, embodied AI, and scientific applications to further illustrate the approach. A central motivation is to support “AGI for Science” by building a stable plateau of capabilities that can reliably assist with complex scientific workflows rather than isolated demos. Finally, we will outline the safety and governance questions that arise when deploying Specialized Generalist systems in high-impact settings, and discuss what we have learned so far about monitoring, alignment, and operational safeguards.
Event name: AAAI 2026 Main Conference
Speaker names: Bowen Zhou
Subjects: AAAI 2026 Invited Talks
Tag: invited talk
Package: free
ID: 146013
Title: From How to learn to What to learn in Multiagent Systems and Robotics
Abstract: There has been a lot of exciting recent progress on new and powerful machine learning algorithms and architectures: how to learn. But for autonomous agents acting in the dynamic, uncertain world, it is at least as important to be able to identify which concepts and subproblems to focus on: what to learn. This talk presents methods for identifying what to learn within the framework of reinforcement learning, focusing especially on applications in multiagent systems and robotics.
Event name: AAAI 2026 Main Conference
Speaker names: Peter Stone
Subjects: AAAI 2026 Invited Talks
Tag: invited talk
Package: free
ID: 146002
Title: Exact Combinatorial Multi-Class Graph Cuts for Semi-Supervised Learning
Abstract: Semi-supervised learning (SSL) on graphs is critical in applications where labeled data are scarce and costly, yet existing graph-based methods often degrade under extreme label sparsity or class imbalance, yielding trivial or unstable solutions. We introduce CombCut, the first exact combinatorial optimization framework for multi-class graph-based semi-supervised learning that operates directly on binary one-hot assignments, without any convex relaxation or heuristic volume constraints. By employing a minorization–maximization (MM) scheme, CombCut transforms each step into a structured linear assignment problem solved efficiently via network-flow algorithms. Total unimodularity guarantees integral iterates, and our theoretical analysis establishes both monotonic ascent of the true discrete objective and convergence of every limit point to a Karush–Kuhn–Tucker (KKT) stationary solution of the original combinatorial problem. Our approach requires no hyperparameter tuning and scales near-linearly in the number of vertices. Empirical evaluation on MNIST, Fashion-MNIST, and CIFAR-10 with as few as 1–5 labels per class significantly outperforming state-of-the-art graph-SSL baselines and yielding more stable and accurate label propagation under severe supervision constraints. CombCut excels in worst-case labeling scenarios.
Event name: AAAI 2026 Main Conference
Subjects: Artificial Intelligence
Tag: poster
Package: free
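The MM scheme in the abstract above minorizes the discrete objective and, per the authors, solves each surrogate exactly as a flow-based assignment. As a loose toy sketch only (our own illustration, not CombCut's algorithm: we replace the flow-based assignment step with a per-vertex maximization of the linearized objective, and every name and detail below is invented):

```python
# Toy minorize-maximize (MM) loop for multi-class graph label assignment.
# Each MM step maximizes a linearized surrogate of the smoothness objective
# row by row (a weighted majority vote over neighbours), so every iterate
# stays a hard one-hot assignment, mimicking the "integral iterates" idea.

def mm_label_assignment(W, labels, n_classes, iters=20):
    """W: symmetric weight matrix (zero diagonal) as list of lists;
    labels: dict node -> class for the few labelled nodes.
    Returns a class index for every node."""
    n = len(W)
    y = [labels.get(i, 0) for i in range(n)]       # init unlabelled to class 0
    for _ in range(iters):
        changed = False
        for i in range(n):
            if i in labels:                        # supervision stays fixed
                continue
            # linearized objective: weighted votes per class from neighbours
            scores = [0.0] * n_classes
            for j in range(n):
                scores[y[j]] += W[i][j]
            best = max(range(n_classes), key=lambda c: scores[c])
            if best != y[i]:
                y[i] = best
                changed = True
        if not changed:                            # fixed point reached
            break
    return y
```

On a toy graph of two weakly connected triangles with one or two seed labels per triangle, the loop propagates labels cluster by cluster while keeping the assignment integral at every step.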
ID: 145957
Title: Engaging with Bias in Computer Vision: A Group Assignment for Remote Learning
Event name: AAAI 2026 Main Conference
Subjects: Artificial Intelligence
Tag: technical paper
Package: free
ID: 145955
Title: Discover Combinatorial Structures using Deep Cross-Entropy Method
Event name: AAAI 2026 Main Conference
Subjects: Artificial Intelligence
Tag: technical paper
Package: free
ID: 145953
Title: Dimensionality Reduction Adventures with Animal Faces
Event name: AAAI 2026 Main Conference
Subjects: Artificial Intelligence
Tag: technical paper
Package: free
ID: 145584
Title: Solving Connections: Thinking Like Wyna
Event name: AAAI 2026 Main Conference
Subjects: Artificial Intelligence
Tag: technical paper
Package: free
ID: 145582
Title: ArguBot Arena: Prompt Engineering a Debate on Responsible AI
Event name: AAAI 2026 Main Conference
Subjects: Artificial Intelligence
Tag: technical paper
Package: free
ID: 143560
Title: Towards Generalist Robot Learning from Internet Video: A Survey
Abstract: Scaling deep learning to massive and diverse internet data has driven remarkable breakthroughs in domains such as video generation and natural language processing. Robot learning, however, has thus far failed to replicate this success and remains constrained by a scarcity of available data. Learning from Videos (LfV) methods aim to address this data bottleneck by augmenting traditional robot data with large-scale internet video. This video data provides foundational information regarding physical dynamics, behaviours, and tasks, and can be highly informative for general-purpose robots.

This survey systematically examines the emerging field of LfV. We first outline essential concepts, including detailing fundamental LfV challenges such as distribution shift and missing action labels in video data. Next, we comprehensively review current methods for extracting knowledge from large-scale internet video, overcoming LfV challenges, and improving robot learning through video-informed training. The survey concludes with a critical discussion of future opportunities. Here, we emphasize the need for scalable foundation model approaches that can leverage the full range of available internet video and enhance the learning of robot policies and dynamics models. Overall, the survey aims to inform and catalyse future LfV research, driving progress towards general-purpose robots.
Event name: AAAI 2026 Main Conference
Author names: Robert McCarthy
Subjects: Artificial Intelligence
Tag: technical paper
Package: free
ID: 143559
Title: Towards an Ontology-Driven Approach to Document Bias
Abstract: Machine learning (ML)-powered systems are capable of reproducing and often amplifying undesired biases embedded in society, emphasizing the importance of operating under practices that enable the study and understanding of the intrinsic characteristics of ML pipelines. This supports the emergence of documentation frameworks with the idea that “any remedy for bias starts with awareness of its existence.” However, a resource that can formally describe ML pipelines in terms of detected biases is still missing. To address this gap, we present the Doc-BiasO ontology, a resource that sets out to create an integrated vocabulary of biases defined in the Trustworthy AI literature and their measures, as well as to incorporate relevant domain terminology and relationships between them. Following ontology engineering best practices, we reuse existing vocabularies on machine learning and AI to foster knowledge sharing and interoperability between the actors concerned with its research, development, regulation, and others. In addition, we demonstrate the potential of Doc-BiasO with an experiment on an existing benchmark and as part of a neuro-symbolic system. Overall, our main objective is to contribute towards clarifying existing terminology on bias research as it rapidly expands to all areas of AI and to improve the interpretation of bias in data and downstream impact through its documentation.
Event name: AAAI 2026 Main Conference
Author names: Mayra Russo
Subjects: Artificial Intelligence
Tag: technical paper
Package: free
ID: 143558
Title: The Search for Stability: Learning Dynamics of Strategic Publishers with Initial Documents
Abstract: We study a game-theoretic information retrieval model in which strategic publishers aim to maximize their chances of being ranked first by the search engine while maintaining the integrity of their original documents. We show that the commonly used Probability Ranking Principle (PRP) ranking scheme results in an unstable environment where games often fail to reach pure Nash equilibrium. We propose two families of ranking functions that do not adhere to the PRP. We provide both theoretical and empirical evidence that these methods lead to a stable search ecosystem, by providing positive results on the learning dynamics convergence. We also define the publishers’ and users’ welfare, demonstrate a possible publisher-user trade-off, and provide means for a search system designer to control it. Finally, we show how instability harms long-term users’ welfare.
Event name: AAAI 2026 Main Conference
Author names: Moshe Tennenholtz, Omer Madmon, Itamar Reinman, Idan Pipano
Subjects: Artificial Intelligence
Tag: technical paper
Package: free
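The instability claim above is the classic failure mode of best-response dynamics in games without a pure Nash equilibrium. As a generic stand-in (our own toy, not the paper's ranking game: matching-pennies payoffs substitute for the PRP-induced publisher game, and all names are invented), the cycling behaviour can be demonstrated in a few lines:

```python
# Best-response dynamics on a two-player, two-action game. When the game has
# no pure Nash equilibrium (matching pennies), the dynamics cycle forever
# instead of converging - the kind of instability attributed to PRP ranking.

def best_response_dynamics(payoffs, start=(0, 0), steps=12):
    """payoffs[p][(a0, a1)] = payoff to player p for joint action (a0, a1).
    Players alternate best-responding; returns the visited joint actions."""
    state = start
    seq = [state]
    for t in range(steps):
        p = t % 2                                 # player to move this round
        a_other = state[1 - p]
        best = max((0, 1), key=lambda a: payoffs[p][(a, a_other)] if p == 0
                   else payoffs[p][(a_other, a)])
        state = (best, state[1]) if p == 0 else (state[0], best)
        seq.append(state)
    return seq
```

Running this on matching pennies visits all four joint actions in a cycle of length four and never stabilizes, illustrating why a ranking scheme that induces such a game fails to reach pure Nash equilibrium.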
ID: 143557
Title: Symbolic Task Inference in Deep Reinforcement Learning
Abstract: This paper proposes DeepSynth, a method for effective training of deep reinforcement learning agents when the reward is sparse or non-Markovian, but at the same time progress towards the reward requires achieving an unknown sequence of high-level objectives. Our method employs a novel algorithm for synthesis of compact finite state automata to uncover this sequential structure automatically. We synthesise a human-interpretable automaton from trace data collected by exploring the environment. The state space of the environment is then enriched with the synthesised automaton, so that the generation of a control policy by deep reinforcement learning is guided by the discovered structure encoded in the automaton. The proposed approach is able to cope with both high-dimensional, low-level features and unknown sparse or non-Markovian rewards. We have evaluated DeepSynth’s performance in a set of experiments that includes the Atari game Montezuma’s Revenge, known to be challenging. Compared to approaches that rely solely on deep reinforcement learning, we obtain a reduction of two orders of magnitude in the iterations required for policy synthesis, and a significant improvement in scalability.
Event name: AAAI 2026 Main Conference
Author names: Hosein Hasanbeig
Subjects: Artificial Intelligence
Tag: technical paper
Package: free
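The step of "enriching the state space with the synthesised automaton" can be pictured as a product construction. The sketch below is our own minimal toy (not DeepSynth's implementation; the two-objective "key then door" task and all names are invented): a hand-written automaton tracks progress through the objective sequence, making the augmented reward Markovian.

```python
# Product of an environment state with a finite automaton that tracks
# progress through a sequence of high-level objectives ("key", then "door").
# Reward fires only when the automaton reaches its accepting state, so the
# augmented state (env_state, q) carries the memory a sparse reward lacks.

DFA = {("q0", "key"): "q1", ("q1", "door"): "q2"}   # transition table
ACCEPT = "q2"

def step_automaton(q, event):
    """Advance the automaton; unmatched events leave the state unchanged."""
    return DFA.get((q, event), q)

def product_step(env_state, q, event):
    """Return the augmented state and a shaped reward (+1 on acceptance)."""
    q_next = step_automaton(q, event)
    reward = 1.0 if (q_next == ACCEPT and q != ACCEPT) else 0.0
    return (env_state, q_next), reward
```

A policy trained over the augmented state (env_state, q) can distinguish "door before key" from "door after key", which the raw environment state cannot.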
ID: 143555
Title: Scalable Synthesis of Formally Verified Neural Value Function for Hamilton-Jacobi Reachability Analysis
Abstract: Hamilton-Jacobi (HJ) reachability analysis provides a formal method for guaranteeing safety in constrained control problems. It synthesizes a value function to represent a long-term safe set called the feasible region. Early synthesis methods based on state space discretization cannot scale to high-dimensional problems, while recent methods that use neural networks to approximate value functions result in unverifiable feasible regions. To achieve both scalability and verifiability, we propose a framework for synthesizing verified neural value functions for HJ reachability analysis. Our framework consists of three stages: pre-training, adversarial training, and verification-guided training. We design three techniques to address three challenges to improve scalability respectively: boundary-guided backtracking (BGB) to improve counterexample search efficiency, entering state regularization (ESR) to enlarge the feasible region, and activation pattern alignment (APA) to accelerate neural network verification. We also provide a neural safety certificate synthesis and verification benchmark called Cersyve-9, which includes nine commonly used safe control tasks and supplements existing neural network verification benchmarks. Our framework successfully synthesizes verified neural value functions on all tasks, and our proposed three techniques exhibit superior scalability and efficiency compared with existing methods.
Event name: AAAI 2026 Main Conference
Author names: Yujie Yang
Subjects: Artificial Intelligence
Tag: technical paper
Package: free
ID: 143554
Title: On Generating Monolithic and Model Reconciling Explanations in Probabilistic Scenarios
Abstract: Explanation generation frameworks aim to make AI systems’ decisions transparent and understandable to human users. However, generating explanations in uncertain environments characterized by incomplete information and probabilistic models remains a significant challenge. In this paper, we propose a novel framework for generating probabilistic monolithic explanations and model reconciling explanations. Monolithic explanations provide self-contained reasons for an explanandum without considering the agent receiving the explanation, while model reconciling explanations account for the knowledge of the agent receiving the explanation. For monolithic explanations, our approach integrates uncertainty by utilizing probabilistic logic to increase the probability of the explanandum. For model reconciling explanations, we propose a framework that extends the logic-based variant of the model reconciliation problem to account for probabilistic human models, where the goal is to find explanations that increase the probability of the explanandum while minimizing conflicts between the explanation and the probabilistic human model. We introduce explanatory gain and explanatory power as quantitative metrics to assess the quality of these explanations. Further, we present algorithms that exploit the duality between minimal correction sets and minimal unsatisfiable sets to efficiently compute both types of explanations in probabilistic contexts. Extensive experimental evaluations on various benchmarks demonstrate the effectiveness and scalability of our approach in generating explanations under uncertainty.
Event name: AAAI 2026 Main Conference
Author names: Stylianos Vasileiou
Subjects: Artificial Intelligence
Tag: technical paper
Package: free
ID: 143553
Title: Interpreting Capsule Networks for Image Classification by Routing Path Visualization
Abstract: Artificial neural networks are popular for computer vision as they often give state-of-the-art performance, but are difficult to interpret because of their complexity. This black box modeling is especially troubling when the application concerns human well-being such as in medical image analysis or autonomous driving. In this work, we propose a technique called routing path visualization for capsule networks, which reveals how much of each region in an image is routed to each capsule. In turn, this technique can be used to interpret the entity that a given capsule detects, and speculate how the network makes a prediction. We demonstrate our new visualization technique on several real world datasets. Experimental results suggest that routing path visualization can precisely localize the predicted class from an image, even though the capsule networks are trained using just images and their respective class labels, without additional information defining the location of the class in the image.
Event name: AAAI 2026 Main Conference
Author names: Amanjot Bhullar
Subjects: Artificial Intelligence
Tag: technical paper
Package: free
ID: 143551
Title: Choosing Abstraction Levels for Model-Based Software Debugging: A Theoretical and Empirical Analysis for Spreadsheet Programs
Abstract: Model-based diagnosis is a generally applicable, principled approach to the systematic debugging of a wide range of system types such as circuits, knowledge bases, physical devices, or software. Based on a formal description of the system, it enables precise and deterministic reasoning about potential faults responsible for observed misbehavior. In software, such a formal system description can often even be extracted from the buggy program fully automatically. As logical reasoning is central to diagnosis, the performance of model-based debuggers is largely influenced by reasoning efficiency, which in turn depends on the complexity and expressivity of the system description. Since highly detailed models capturing exact semantics often exceed the capabilities of current reasoning tools, researchers have proposed more abstract representations.

In this work, we thoroughly analyze system modeling techniques with a focus on fault localization in spreadsheets—one of the most widely used end-user programming paradigms. Specifically, we present three constraint model types characterizing spreadsheets at different abstraction levels, show how to extract them automatically from faulty spreadsheets, and provide theoretical and empirical investigations of the impact of abstraction on both diagnostic output and computational performance. Our main conclusions are that (i) for the model types, there is a trade-off between the conciseness of generated fault candidates and computation time, (ii) the exact model is often impractical, and (iii) a new model based on qualitative reasoning yields the same solutions as the exact one in up to more than half the cases while being orders of magnitude faster.

Due to their ability to restrict the solution space in a sound way, the explored model-based techniques, rather than being used as standalone approaches, are expected to realize their full potential in combination with iterative sequential diagnosis or indeterministic but more performant statistical debugging methods.
Event name: AAAI 2026 Main Conference
Author names: Patrick Rodler
Subjects: Artificial Intelligence
Tag: technical paper
Package: free
ID: 143549
Title: A Simple Proof-Theoretic Characterization of Stable Models: Reduction to Difference Logic and Experiments
Abstract: Stable models of logic programs have been studied and characterized in relation with other formalisms by many researchers. As already argued in previous papers, such characterizations are interesting for diverse reasons, including theoretical investigations and the possibility of leading to new algorithms for computing stable models of logic programs. At the theoretical level, complexity and expressiveness comparisons have brought about fundamental insights. Beyond that, practical implementations of the developed reductions enable the use of existing solvers for other logical formalisms to compute stable models. In this paper, we first provide a simple characterization of stable models that can be viewed as a proof-theoretic counterpart of the standard model-theoretic definition. We further show how it can be naturally encoded in difference logic. Such an encoding, compared to the existing reductions to classical logics, does not require Boolean variables. Then, we implement our novel translation to a Satisfiability Modulo Theories (SMT) formula. We finally compare our approach, employing the SMT solver yices, to the translation-based ASP solver lp2diff and to clingo on domains from the “Basic Decision” track of the 2017 Answer Set Programming competition. The results show that our approach is competitive with and often better than lp2diff, and that it can also be faster than clingo on non-tight domains.
Event name: AAAI 2026 Main Conference
Author names: Marco Maratea
Subjects: Artificial Intelligence
Tag: technical paper
Package: free
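For readers unfamiliar with the "standard model-theoretic definition" that the paper's characterization is a counterpart of: a stable model is a set of atoms that equals the least model of its Gelfond–Lifschitz reduct. A minimal background sketch (ours, for ground normal programs only; rule representation is our own, not the paper's encoding):

```python
# Stable models via the Gelfond-Lifschitz reduct. A ground rule is a triple
# (head, positive_body, negative_body) of atom names.

def reduct(program, candidate):
    """Delete rules whose negative body intersects the candidate set, then
    drop the negative literals from the remaining rules."""
    return [(h, pos) for (h, pos, neg) in program
            if not (set(neg) & candidate)]

def minimal_model(positive_program):
    """Least model of a negation-free program by fixpoint iteration."""
    model = set()
    changed = True
    while changed:
        changed = False
        for h, pos in positive_program:
            if set(pos) <= model and h not in model:
                model.add(h)
                changed = True
    return model

def is_stable(program, candidate):
    """candidate is a stable model iff it is exactly the least model
    of its own reduct."""
    return minimal_model(reduct(program, candidate)) == set(candidate)
```

For the classic program {p :- not q. q :- not p.} this check confirms that {p} and {q} are stable models while {} and {p, q} are not.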
ID: 143547
Title: A Fortiori Case-Based Reasoning: From Theory to Data
Abstract: The widespread application of uninterpretable machine learning systems for sensitive purposes has spurred research into elucidating the decision-making process of these systems. These efforts have their background in many different disciplines, one of which is the field of AI & law. In particular, recent works have observed that machine learning training data can be interpreted as legal cases. Under this interpretation, the formalism developed to study case law, called the theory of precedential constraint, can be used to analyze the way in which machine learning systems draw on training data—or should draw on them—to make decisions. In the present work, we advance the theory underlying these explanation methods, by relating it to order theory and logic. This allows us to write a software implementation of the theory that can be used to compute with the definitions and give automatic proofs of the properties of the model. We use this implementation to evaluate the model on a series of datasets. Through this analysis, we characterize the types of datasets that are more, or less, suitable to be described by the theory.
Event name: AAAI 2026 Main Conference
Author names: Wijnand van Woerkom
Subjects: Artificial Intelligence
Tag: technical paper
Package: free
ID: 143546
Title: The Cost and Complexity of Minimizing Envy in House Allocations
Abstract: We study almost envy-freeness in house allocation, where m houses are to be allocated among n agents so that every agent receives exactly one house. An envy-free allocation need not exist, and therefore we may have to settle for relaxations. We study different aggregate measures of envy as markers of fairness. In particular, we define the amount of envy experienced by an agent a w.r.t. an allocation to be the number of agents that agent a envies under that allocation. [KMS2021] studied the problem of minimizing the number of envious agents and showed that it is NP-complete to find allocations that minimize the number of envious agents, even for binary utilities, and this quantity is hard to approximate for general utilities. In this paper, we explore envy minimization in house allocation from a broader perspective and prove algorithmic results not only for minimizing the number of envious agents but for two other measures of envy as well—minimizing the amount of maximum envy experienced by any agent and minimizing the amount of total envy experienced by all the agents put together. We prove a host of algorithmic and hardness results. We also suggest practical approaches for these problems via integer linear program (ILP) formulations and report the findings of our experimental evaluation of ILPs. Finally, we study the price of fairness, which quantifies the loss of welfare we must suffer due to the fairness requirements, and present tight bounds as well as algorithms that simultaneously optimize both welfare and fairness.
Event name: AAAI 2026 Main Conference
Author names: Jayakrishnan Madathil
Subjects: Artificial Intelligence
Tag: technical paper
Package: free
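The total-envy objective described above is easy to make concrete. The sketch below is our own brute-force toy for tiny instances (not the paper's ILP or algorithms; utilities and helper names are invented), enumerating all ways to give each of n agents one of m houses:

```python
# Brute-force minimizer of total envy in house allocation. Agent a's envy is
# the number of agents b whose house a strictly prefers (by a's own utility)
# to a's own house; total envy sums this over all agents.

from itertools import permutations

def total_envy(assign, utils):
    """assign[a] = house given to agent a; utils[a][h] = a's utility for h."""
    n = len(assign)
    return sum(
        sum(1 for b in range(n) if utils[a][assign[b]] > utils[a][assign[a]])
        for a in range(n)
    )

def min_total_envy(utils, m):
    """Enumerate all injective allocations of n agents to m houses and
    return (minimum total envy, one optimal allocation)."""
    n = len(utils)
    best = None
    for houses in permutations(range(m), n):
        e = total_envy(list(houses), utils)
        if best is None or e < best[0]:
            best = (e, list(houses))
    return best
```

With opposed preferences an envy-free allocation exists (total envy 0); when both agents rank the houses identically, some envy is unavoidable, matching the observation that envy-free allocations need not exist.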
ID: 143545
Title: TactGen: Tactile Sensory Data Generation via Zero-Shot Sim-to-Real Transfer
Abstract: Recent advances in machine learning have driven a step-change in robot perception with modalities such as vision, where large amounts of training data are readily available or cheap to collect. However, in tactile perception, the relatively high cost of data collection still largely impedes the adoption of such data-driven learning solutions. In this article, we introduce TactGen, a novel, cross-modal framework to tackle this challenge. In particular, using a two-step data generation pipeline, we leverage easily accessible vision data to synthesise artificial tactile data for downstream classifier training. Specifically, we use readily collected video data of objects of interest to efficiently learn neural radiance field (NeRF) representations. The NeRF models are then used to render red–green–blue-depth (RGBD) images from any desired vantage points. In the second stage, the RGBD images are translated into corresponding tactile images typically generated by camera-based tactile sensors using a conditional generative adversarial network (cGAN). The cGAN model is itself trained with a large set of visuo-tactile images collected in simulation, and can be transferred into the real world without fine-tuning or additional data collection. We extensively validate this approach in the context of tactile object classification, showing that it effectively reduces data collection time by a factor of 20 while achieving similar performance to training a classifier on manually collected real data.
Event name: AAAI 2026 Main Conference
Author names: Shaohong Zhong
Subjects: Artificial Intelligence
Tag: technical paper
Package: free
32
143544MAGIC-VFM - Meta-Learning Adaptation for Ground Interaction
Control With Visual Foundation Models
Control of off-road vehicles is challenging due to the
complex dynamic interactions with the terrain. Accurate
modeling of these interactions is important to optimize
driving performance, but the relevant physical phenomena,
such as slip, are too complex to model from first
principles. Therefore, we present an offline meta-learning
algorithm to construct a rapidly-tunable model of residual
dynamics and disturbances. Our model processes terrain
images into features using a visual foundation model (VFM),
then maps these features and the vehicle state to an
estimate of the current actuation matrix using a deep
neural network (DNN). We then combine this model with
composite adaptive control to modify the last layer of the
DNN in real time, accounting for the remaining terrain
interactions not captured during offline training. We
provide mathematical guarantees of stability and robustness
for our controller, and demonstrate the effectiveness of
our method through simulations and hardware experiments
with a tracked vehicle and a car-like robot. We evaluate
our method outdoors on different slopes with varying
slippage and actuator degradation disturbances, and compare
against an adaptive controller that does not use the VFM
terrain features. We show significant improvement over the
baseline in both hardware experimentation and simulation.
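The idea of adapting only the DNN's last layer online can be sketched in a toy scalar setting (a minimal sketch under strong simplifying assumptions: fixed features, a hand-picked gain, no stability machinery; not the paper's composite adaptive law, and the function name is hypothetical):

```python
def adapt_last_layer(w, phi, y, gamma=0.1):
    """One real-time update of the last-layer weights w given a
    frozen-feature vector phi and a measured residual target y.
    Scalar-output toy: move w along phi to shrink the error."""
    pred = sum(wi * pi for wi, pi in zip(w, phi))
    e = y - pred                       # residual the adaptation acts on
    return [wi + gamma * e * pi for wi, pi in zip(w, phi)]

# Adapt online to a constant unmodeled terrain disturbance of 2.0,
# with fixed illustrative features phi = [1.0, 0.5].
w = [0.0, 0.0]
for _ in range(200):
    w = adapt_last_layer(w, [1.0, 0.5], 2.0)
residual = 2.0 - sum(wi * pi for wi, pi in zip(w, [1.0, 0.5]))
```

The geometric decay of the residual here is what the paper's Lyapunov analysis establishes rigorously for the full controller.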
AAAI 2026 Main ConferenceElena-Sorina LupuArtificial Intelligencetechnical paperfree
33
143539Towards automated self-supervised learning for truly
unsupervised graph anomaly detection
Self-supervised learning (SSL) is an emerging paradigm that
exploits supervisory signals generated from the data
itself, and many recent studies have leveraged SSL to
conduct graph anomaly detection. However, we empirically
found that three important factors can substantially impact
detection performance across datasets: (1) the specific SSL
strategy employed; (2) the tuning of the strategy’s
hyperparameters; and (3) the allocation of combination
weights when using multiple strategies. Most SSL-based
graph anomaly detection methods circumvent these issues by
arbitrarily or selectively (i.e., guided by label
information) choosing SSL strategies, hyperparameter
settings, and combination weights. While an arbitrary
choice may lead to subpar performance, using label
information in an unsupervised setting constitutes label
leakage and leads to severe overestimation of a method’s
performance. Leakage has been criticized as “one of the top
ten data mining mistakes”, yet many recent studies on
SSL-based graph anomaly detection have used label
information to select hyperparameters. To mitigate this
issue, we propose to use an internal evaluation strategy
(with theoretical analysis) to select hyperparameters in
SSL for unsupervised anomaly detection. We perform
extensive experiments using 10 recent SSL-based graph
anomaly detection algorithms on various benchmark datasets,
demonstrating both the prior issues with hyperparameter
selection and the effectiveness of our proposed strategy.
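A label-free selection loop of the kind advocated above might look as follows (the internal criterion here is an illustrative separation score, not the paper's strategy; all names are hypothetical):

```python
def internal_score(anomaly_scores, k=5):
    """A label-free internal criterion (illustrative): how well the
    k highest anomaly scores separate from the rest, normalised by
    the spread of the rest. No label information is touched."""
    s = sorted(anomaly_scores, reverse=True)
    top, rest = s[:k], s[k:]
    mu_r = sum(rest) / len(rest)
    spread = (sum((x - mu_r) ** 2 for x in rest) / len(rest)) ** 0.5
    return (sum(top) / k - mu_r) / (spread + 1e-12)

def select_hyperparams(candidates, score_fn):
    """Pick the hyperparameter setting whose detector output
    maximises the internal score."""
    return max(candidates, key=lambda c: internal_score(score_fn(c)))
```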
AAAI 2026 Main ConferenceZhong LiArtificial Intelligencetechnical paperfree
34
143538IEEE Transactions on Robotics (T-RO) Publication: "SICNav:
Safe and Interactive Crowd Navigation Using Model
Predictive Control and Bilevel Optimization"
Abstract of T-RO publication: "Robots need to predict and
react to human motions to navigate through a crowd without
collisions. Many existing methods decouple prediction from
planning, which does not account for the interaction
between robot and human motions and can lead to the robot
getting stuck. We propose SICNav, a Model Predictive
Control (MPC) method that jointly solves for robot motion
and predicted crowd motion in closed-loop. We model each
human in the crowd to be following an Optimal Reciprocal
Collision Avoidance (ORCA) scheme and embed that model as a
constraint in the robot's local planner, resulting in a
bilevel nonlinear MPC optimization problem. We use a
KKT-reformulation to cast the bilevel problem as a single
level and use a nonlinear solver to optimize. Our MPC
method can influence pedestrian motion while explicitly
satisfying safety constraints in a single-robot multi-human
environment. We analyze the performance of SICNav in two
simulation environments and indoor experiments with a real
robot to demonstrate safe robot motion that can influence
the surrounding humans. We also validate the trajectory
forecasting performance of ORCA on a human trajectory
dataset."
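The KKT trick of collapsing a bilevel problem to a single level can be illustrated on a 1-D toy stand-in (a closed-form lower level in place of ORCA, a grid search in place of nonlinear MPC; the cost terms and names are hypothetical):

```python
def lower_level_kkt(v_des, cap):
    """Lower level: the human minimises (v - v_des)**2 subject to
    v <= cap, where cap is induced by the robot's plan (a toy
    stand-in for an ORCA half-plane constraint). The KKT
    conditions give the closed form v* = min(v_des, cap), so the
    bilevel problem collapses to a single level."""
    return min(v_des, cap)

def solve_robot(goal_speed, v_des, weight=1.0, grid=101):
    """Upper level: brute-force the robot's speed u over a grid,
    trading progress toward goal_speed against how much the
    induced constraint slows the human (illustrative cost)."""
    best_u, best_cost = None, float("inf")
    for k in range(grid):
        u = 2.0 * k / (grid - 1)            # u in [0, 2]
        v = lower_level_kkt(v_des, cap=u)   # embedded KKT solution
        cost = (u - goal_speed) ** 2 + weight * (v_des - v) ** 2
        if cost < best_cost:
            best_u, best_cost = u, cost
    return best_u
```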
AAAI 2026 Main ConferenceSepehr SamaviArtificial Intelligencetechnical paperfree
35
143535HI-SLAM2: Geometry-Aware Gaussian SLAM for Fast Monocular
Scene Reconstruction
We present HI-SLAM2, a geometry-aware Gaussian SLAM system
that achieves fast and accurate monocular scene
reconstruction using only RGB input. Whereas existing
Neural SLAM and 3DGS-based SLAM methods often trade off
rendering quality against geometry accuracy, our research
demonstrates that both can be achieved simultaneously with
RGB input alone. The key idea of our approach is to
strengthen geometry estimation by combining easy-to-obtain
monocular priors with learning-based dense
SLAM, and then using 3D Gaussian splatting as our core map
representation to efficiently model the scene. Upon loop
closure, our method ensures on-the-fly global consistency
through efficient pose graph bundle adjustment and instant
map updates by explicitly deforming the 3D Gaussian units
based on anchored keyframe updates. Furthermore, we
introduce a grid-based scale alignment strategy to maintain
improved scale consistency in prior depths for finer depth
details. Through extensive experiments on Replica, ScanNet,
Waymo Open, ETH3D SLAM and ScanNet++ datasets, we
demonstrate significant improvements over existing Neural
SLAM methods and even surpass RGB-D-based methods in both
reconstruction and rendering quality. The project page and
source code are available at https://hi-slam2.github.io/.
AAAI 2026 Main ConferenceWei ZhangArtificial Intelligencetechnical paperfree
36
143534PRIMP: PRobabilistically-Informed Motion Primitives for
Efficient Affordance Learning From Demonstration
This article proposes a Learning-from-Demonstration (LfD)
method using probability densities on the workspaces of
robot manipulators. The method, named
PRobabilistically-Informed Motion Primitives (PRIMP),
learns the probability distribution of the end effector
trajectories in the 6-D workspace that includes both
positions and orientations. It is able to adapt to new
situations such as novel via points with uncertainty and a
change of viewing frame. The method itself is
robot-agnostic, in that the learned distribution can be
transferred to another robot with adaptation to its
workspace density. Workspace-STOMP, a new version of the
existing STOMP motion planner, is also introduced, which
can be used as a postprocess to improve the performance of
PRIMP and any other reachability-based LfD method. The
combination of PRIMP and Workspace-STOMP can further help
the robot avoid novel obstacles that are not present during
the demonstration process. The proposed methods are
evaluated with several sets of benchmark experiments. PRIMP
runs more than five times faster than existing
state-of-the-art methods while generating trajectories more
than twice as close to both the demonstrations and novel
desired poses. The methods are then combined with our lab's
robot imagination method that learns object affordances,
illustrating the applicability to learn tool use through
physical experiments.
AAAI 2026 Main ConferenceSipu RuanArtificial Intelligencetechnical paperfree
37
143533Motion Planning Diffusion: Learning and Adapting Robot
Motion Planning With Diffusion Models
The performance of optimization-based robot motion planning
algorithms is highly dependent on the initial solutions,
commonly found by running a sampling-based planner to
obtain a collision-free path. However, these methods can be
slow in high-dimensional and complex scenes and produce
nonsmooth solutions. Given previously solved path-planning
problems, it is highly desirable to learn their
distribution and use it as a prior for new similar
problems. Several works propose utilizing this prior to
bootstrap the motion planning problem, either by sampling
initial solutions from it, or using its distribution in a
maximum-a-posteriori formulation for trajectory
optimization. In this work, we introduce motion planning
diffusion (MPD), an algorithm that learns trajectory
distribution priors with diffusion models. These generative
models have shown increasing success in encoding multimodal
data and have desirable properties for gradient-based
motion planning, such as cost guidance. Given a motion
planning problem, we construct a cost function and sample
from the posterior distribution using the learned prior
combined with the cost function gradients during the
denoising process. Instead of learning the prior on all
trajectory waypoints, we propose learning a lower
dimensional representation of a trajectory using linear
motion primitives, particularly B-spline curves. This
parametrization guarantees that the generated trajectory is
smooth, can be interpolated at higher frequencies, and
needs fewer parameters than a dense waypoint
representation. We demonstrate the results of our method
ranging from simple 2-D to more complex tasks using a 7-DOF
robot arm manipulator. In addition to learning from
simulated data, we also use human demonstrations on a
real-world pick-and-place task. The experiment results show
that diffusion models are strong priors for encoding
multimodal trajectory distributions for optimization-based
motion planning.
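The B-spline parametrization mentioned above can be sketched with a uniform cubic evaluator (the standard textbook basis, shown for scalar control points; the function name is hypothetical):

```python
def cubic_bspline_point(ctrl, t):
    """Evaluate a uniform cubic B-spline at t in [0, 1] given a
    list of at least four scalar control points: a low-dimensional
    trajectory parametrisation. The uniform cubic basis makes the
    curve C2-smooth by construction."""
    n = len(ctrl) - 3                 # number of polynomial spans
    u = t * n
    i = min(int(u), n - 1)            # active span index
    s = u - i                         # local parameter in [0, 1]
    b0 = (1 - s) ** 3 / 6.0
    b1 = (3 * s**3 - 6 * s**2 + 4) / 6.0
    b2 = (-3 * s**3 + 3 * s**2 + 3 * s + 1) / 6.0
    b3 = s**3 / 6.0
    return sum(b * c for b, c in zip((b0, b1, b2, b3), ctrl[i:i + 4]))
```

Because the basis functions sum to one, a handful of control points fixes a smooth curve that can be sampled at any frequency, which is the parameter saving over dense waypoints.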
AAAI 2026 Main ConferenceJoao CarvalhoArtificial Intelligencetechnical paperfree
38
143532Generative Graphical Inverse Kinematics
Quickly and reliably finding accurate inverse kinematics
(IK) solutions is a challenging problem for many robot
manipulators. Existing numerical solvers are widely
applicable but typically only produce a single solution and
rely on local search techniques to minimize nonconvex
objectives. Recent learning-based approaches that
approximate the entire feasible set of solutions have shown
promise in generating multiple fast and accurate IK results
in parallel. However, existing learning-based techniques
have a significant drawback: each robot of interest
requires a specialized model that must be trained from
scratch. To address this key shortcoming, we propose a
novel distance-geometric robot representation coupled with
a graph structure that allows us to leverage the
generalizability of graph neural networks (GNNs). Our
approach, which we call generative graphical IK (GGIK), is
the first learned IK solver that is able to efficiently
yield a large number of diverse solutions in parallel while
also displaying the ability to generalize---a single
learned model can be used to produce IK solutions for a
variety of different robots. When compared to several other
learned IK methods, GGIK provides more accurate solutions
with the same amount of training data. GGIK can also
generalize reasonably well to robot manipulators unseen
during training. In addition, GGIK is able to learn a
constrained distribution that encodes joint limits and
scales well with the number of robot joints and sampled
solutions. Finally, GGIK can be used to complement local IK
solvers by providing a reliable initialization for the
local optimization process.
AAAI 2026 Main ConferenceOliver LimoyoArtificial Intelligencetechnical paperfree
39
143531Path-Constrained Haptic Motion Guidance via Adaptive
Phase-Based Admittance Control
Robots have surpassed humans in terms of strength and
precision, yet humans retain an unparalleled ability for
decision-making in the face of unpredictable disturbances.
This article aims to combine the strengths of both entities
within a singular task: human motion guidance under strict
geometric constraints, particularly adhering to
predetermined paths. To tackle this challenge, a modular
haptic guidance law is proposed that takes the
human-applied wrench as an input. Using an auxiliary
variable called the phase, the proposed law guarantees that
the generated desired motion consistently adheres to the
constraint path.
It is demonstrated how the guidance policy can be
generalized into physically interpretable terms, adjustable
either prior to initiating the task or dynamically while
the task is in progress. Additionally, an illustrative
guidance adaptation policy is showcased that takes into
account the human’s manipulability. Leveraging passivity
analysis, potential sources of instability are pinpointed,
and subsequently, overall system stability is ensured by
incorporating an augmented virtual energy tank. Lastly, a
comprehensive set of experiments, including a
20-participant user study, explores various aspects of the
approach in practice, encompassing both technical and
usability considerations.
AAAI 2026 Main ConferenceErfan ShahriariArtificial Intelligencetechnical paperfree
40
143529Multimodal Super-Resolution: Discovering Hidden Physics and
Its Application to Fusion Plasmas
Understanding complex physical systems often requires
integrating data from multiple diagnostics, each with
limited resolution or coverage. We present a machine
learning framework that reconstructs synthetic
high-temporal-resolution data for a target diagnostic using
information from other diagnostics, without direct target
measurements during inference. This multimodal
super-resolution technique improves diagnostic robustness
and enables monitoring even in case of measurement failures
or degradation. Applied to fusion plasmas, our method
targets edge-localized modes (ELMs), which can damage
plasma-facing materials. By reconstructing super-resolution
Thomson Scattering data from complementary diagnostics, we
uncover fine-scale plasma dynamics and validate the role of
resonant magnetic perturbations (RMPs) in ELM suppression
through magnetic island formation. The approach provides
new observations supporting plasma profile flattening
due to these islands. Our results demonstrate the
framework’s ability to generate high-fidelity synthetic
diagnostics, offering a powerful tool for ELM control
development in future reactors like ITER. The approach is
broadly transferable to other domains facing sparse,
incomplete, or degraded diagnostic data, opening new
avenues for discovery.
AAAI 2026 Main ConferenceAzarakhsh JalalvandArtificial Intelligencetechnical paperfree
41
143527CODEI: Resource-Efficient Task-Driven Co-Design of
Perception and Decision Making for Mobile Robots Applied to
Autonomous Vehicles
This paper discusses integration challenges and design
strategies for mobile robots, focusing on the task-driven,
optimal selection of hardware and software to
balance safety, efficiency, and minimal usage of resources
such as costs, energy, computational requirements, and
weight. We emphasize the interplay between perception and
motion planning in decision-making by introducing the
concept of occupancy queries to quantify the perception
requirements for sampling-based motion planners. Sensor and
algorithm performance are evaluated using False Negative
Rate (FNR) and False Positive Rate (FPR) across various
factors such as geometric relationships, object properties,
sensor resolution, and environmental conditions. By
integrating perception requirements with perception
performance, an Integer Linear Programming (ILP) approach
is proposed for efficient sensor and algorithm selection
and placement. This forms the basis for a co-design
optimization that includes the robot body, motion planner,
perception pipeline, and computing unit. We refer to this
framework for solving the co-design problem of mobile
robots as CODEI, short for Co-design of Embodied
Intelligence. A case study on developing an Autonomous
Vehicle (AV) for urban scenarios provides actionable
information for designers, and shows that complex tasks
escalate resource demands, with task performance affecting
choices of the autonomy stack. The study demonstrates that
resource prioritization influences sensor
choice: cameras are preferred for cost-effective and
lightweight designs, while lidar sensors are chosen for
better energy and computational efficiency.
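The sensor-selection step can be illustrated with an exhaustive stand-in for the ILP (toy scale only; the independence assumption on sensor misses and all names are hypothetical):

```python
from itertools import combinations

def _prod(xs):
    p = 1.0
    for x in xs:
        p *= x
    return p

def select_sensors(sensors, queries, fnr_budget):
    """Pick the cheapest sensor subset whose combined miss rate on
    every occupancy query stays within fnr_budget.
    sensors: {name: (cost, {query: fnr})}. Assumes independent
    misses, so a query's joint FNR is the product over sensors;
    an uncovered query contributes FNR 1.0."""
    best, best_cost = None, float("inf")
    names = list(sensors)
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            cost = sum(sensors[s][0] for s in subset)
            ok = all(
                _prod(sensors[s][1].get(q, 1.0) for s in subset)
                <= fnr_budget
                for q in queries)
            if ok and cost < best_cost:
                best, best_cost = subset, cost
    return best, best_cost
```

The exhaustive loop is exactly what an ILP solver avoids; the sketch only fixes the constraint structure (coverage within an FNR budget at minimum cost).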
AAAI 2026 Main ConferenceDejan MilojevicArtificial Intelligencetechnical paperfree
42
143526Analysing Satellite Imagery Classification under Spatial
Domain Shift across Geographic Regions
Deep learning models are designed based on the i.i.d.
assumption; consequently, they experience a significant
performance drop due to distribution shifts when
deployed in real environments. Domain Generalisation (DG)
aims to bridge the distribution shift between the source
and target domains by improving the generalisability of the
model to Out-Of-Distribution (OOD) data. This challenge is
prominent in satellite imagery classification due to the
scarcity of data from underrepresented regions such as
Africa and Oceania. In this paper, we address the
limitations of existing datasets in capturing distribution
shifts caused by geospatial differences between geographic
regions by constructing a new, large-scale dataset called
Domain Shift across Geographic Regions (DSGR). This dataset
aims to help researchers better understand the impact of
distribution shifts on satellite imagery classification.
Furthermore, we perform rigorous experiments on DSGR to
investigate and benchmark the robustness of existing DG
techniques under single- and multi-source domain settings
and the role of foundation models in enhancing the DG
techniques. Our evaluations reveal that recent DG
techniques achieve comparable yet weak performance on
DSGR. However, when combined with a foundation model like
CLIP, ERM (introduced in 1999) achieves highly competitive
results, surpassing even recent state-of-the-art DG
solutions in enhancing the generalisability of deep
learning models across different geographic regions. Our
dataset and code are available at
https://github.com/RWGAI/DSGR.
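ERM in this context simply pools the examples from all source domains and minimizes the average loss with one shared model; a 1-D logistic-regression sketch (an illustrative stand-in for a linear probe on frozen foundation-model features; names and constants are hypothetical):

```python
import math

def erm_train(domains, steps=500, lr=0.5):
    """Plain ERM: pool all source domains and minimise the average
    logistic loss with one shared linear model (w, b) by gradient
    descent. 1-D toy features stand in for frozen embeddings."""
    data = [xy for d in domains for xy in d]   # pool all domains
    w, b = 0.0, 0.0
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - y) * x
            gb += (p - y)
        w -= lr * gw / len(data)
        b -= lr * gb / len(data)
    return w, b

def predict(w, b, x):
    return 1 if w * x + b > 0 else 0
```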
AAAI 2026 Main ConferenceSara Al-EmadiArtificial Intelligencetechnical paperfree
43
143525Causal Explanations for Sequential Decision Making
Stochastic sequential decision-making systems — such as
Markov decision processes and their variants — are
increasingly used in areas such as transportation,
healthcare, and communication. However, the ability to
explain these systems’ outputs to non-technical end users
has not kept pace with their widespread adoption. This
paper addresses that gap by extending prior work and
presenting a unified framework for generating causal
explanations of agent behavior in sequential
decision-making settings, grounded in the structural causal
model (SCM) paradigm. Our framework supports the generation
of multiple, semantically distinct explanations for agent
actions — capabilities that were previously unattainable.
In addition to introducing a novel taxonomy of explanations
for MDPs to guide empirical investigation, we develop both
exact and approximate causal inference methods within the
SCM framework. We analyze their applicability and derive
run-time bounds for each. This leads to the proposed
algorithm, MeanRESP, which operates flexibly across a
spectrum of approximations tailored to external
constraints. We further analyze the sample complexity and
error rates of approximate MeanRESP, and provide a detailed
comparison of its outputs — under varying definitions of
responsibility — with popular Shapley-value-based methods.
Empirically, we performed a series of experiments to
evaluate the practicality and effectiveness of the proposed
system, focusing on real-world computational demands and
the validity and reliability of metrics for comparing
approximate and exact causal methods. Finally, we present
two user studies that reveal user preferences for certain
types of explanations and demonstrate a strong preference
for explanations generated by our framework compared to
those from other state-of-the-art systems.
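The Shapley-value baseline that the framework is compared against can be computed exactly at toy scale (the standard permutation definition; names are hypothetical):

```python
from itertools import permutations

def shapley_values(players, value_fn):
    """Exact Shapley values: average each player's marginal
    contribution over all orderings. O(n!), toy scale only."""
    phi = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = set()
        for p in order:
            before = value_fn(frozenset(coalition))
            coalition.add(p)
            phi[p] += value_fn(frozenset(coalition)) - before
    for p in phi:
        phi[p] /= len(orders)
    return phi
```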
AAAI 2026 Main ConferenceSamer B. NashedArtificial Intelligencetechnical paperfree
44
143524Feature Hallucination for Self-supervised Action Recognition
Understanding human actions in videos requires robust
integration of multimodal cues beyond raw pixels. This work
introduces a deep self-supervised action recognition
framework that jointly predicts action concepts and
auxiliary features from RGB video, then hallucinates
missing modalities at test time to improve recognition
without added runtime cost. Two new domain-specific
descriptors, Object Detection Features (ODF) and Saliency
Detection Features (SDF), are proposed to capture spatial
context and motion saliency, and are integrated with other
modalities such as optical flow, skeleton, audio, and
improved dense trajectories. The framework incorporates
aleatoric uncertainty modeling to handle noisy or
unreliable features, along with a robust loss for stable
multimodal fusion. Compatible with popular architectures
including I3D, AssembleNet, Video Transformer Network,
VideoMAE V2, and InternVideo2, the approach achieves
state-of-the-art results on Kinetics-400, Kinetics-600, and
Something-Something V2.
AAAI 2026 Main ConferenceLei WangArtificial Intelligencetechnical paperfree
45
143523Super Level Sets and Exponential Decay: A Synergistic
Approach to Stable Neural Network Training
This paper presents a theoretically grounded optimization
framework for neural network training that integrates an
Exponentially Decaying Learning Rate with Lyapunov-based
stability analysis. We develop a dynamic learning rate
algorithm and prove that it induces connected and stable
descent paths through the loss landscape by maintaining the
connectivity of super-level sets Sλ = {θ ∈ ℝⁿ : ℒ(θ) ≥ λ}.
Under the condition that the Lyapunov function V(θ) = ℒ(θ)
satisfies ΔV(θ) ⋅ Δℒ(θ) ≥ 0, we establish that these
super-level sets are not only connected but also
equiconnected across epochs, providing uniform topological
stability. We further derive convergence guarantees using
a second-order Taylor expansion and demonstrate that our
exponentially scheduled learning rate with gradient-based
modulation leads to a monotonic decrease in loss. The
proposed algorithm incorporates this schedule into a
stability-aware update mechanism that adapts step sizes
based on both curvature and energy-level geometry. This
work formalizes the role of topological structure in
convergence dynamics and introduces a provably stable
optimization algorithm for high-dimensional, non-convex
neural networks.
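The exponentially decaying schedule and the claimed monotone loss decrease can be checked on a convex quadratic (a minimal sketch; the constants are illustrative, not from the paper):

```python
import math

def train_quadratic(theta0=5.0, eta0=0.8, decay=0.01, steps=100):
    """Gradient descent on the convex loss L(theta) = theta**2 with
    the exponentially decaying rate eta_t = eta0 * exp(-decay * t).
    Records the loss at each step so monotone decrease can be
    checked."""
    theta, losses = theta0, []
    for t in range(steps):
        eta = eta0 * math.exp(-decay * t)
        theta -= eta * 2.0 * theta    # gradient of theta**2 is 2*theta
        losses.append(theta * theta)
    return losses
```

On this loss each step scales theta by (1 - 2 * eta_t); since eta_t stays below 1 for these constants, the loss strictly decreases, the quadratic analogue of the paper's schedule-driven monotonicity result.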
AAAI 2026 Main ConferenceJatin ChaudharyArtificial Intelligencetechnical paperfree
46
143521GAICo: A Deployed and Extensible Framework for Evaluating Diverse and Multimodal Generative AI Outputs
The rapid proliferation of Generative AI (GenAI) into diverse, high-stakes domains necessitates robust and reproducible evaluation methods. However, practitioners often resort to ad-hoc, non-standardized scripts, as common metrics are often unsuitable for specialized, structured outputs (e.g., automated plans, time-series) or holistic comparison across modalities (e.g., text, audio, and image). This fragmentation hinders comparability and slows AI system development. To address this challenge, we present GAICo (Generative AI Comparator): a deployed, open-source Python library that streamlines and standardizes GenAI output comparison. GAICo provides a unified, extensible framework supporting a comprehensive suite of reference-based metrics for unstructured text, specialized structured data formats, and multimedia (images, audio). Its architecture features a high-level API for rapid, end-to-end analysis, from multi-model comparison to visualization and reporting, alongside direct metric access for granular control. We demonstrate GAICo's utility through a detailed case study evaluating and debugging complex, multi-modal AI Travel Assistant pipelines. GAICo empowers AI researchers and developers to efficiently assess system performance, make evaluation reproducible, improve development velocity, and ultimately build more trustworthy AI systems, aligning with the goal of moving faster and safer in AI deployment. **Since its release on PyPI in Jun 2025, the tool has been downloaded over 13K times, across versions, by Aug 2025, demonstrating growing community interest.**
AAAI 2026 Main ConferenceKausik Lakkaraju, Biplav Srivastava, Nitin Gupta, Pallav KoppisettiArtificial Intelligencetechnical paperfree
47
143520Deploying Atmospheric and Oceanic AI Models on Chinese Hardware and Framework: Migration Strategies, Performance Optimization and Analysis
With the growing role of artificial intelligence in climate and weather research, efficient model training and inference are in high demand. Current models like FourCastNet and AI-GOMS depend heavily on GPUs, limiting hardware independence, especially for Chinese domestic hardware and frameworks. To address this issue, we present a framework for migrating large-scale atmospheric and oceanic models from PyTorch to MindSpore, optimizing them for Chinese chips, and evaluating their performance against GPUs. The framework focuses on software-hardware adaptation, memory optimization, and parallelism. Furthermore, the models' performance is evaluated across multiple metrics, including training speed, inference speed, model accuracy, and energy efficiency, with comparisons against GPU-based implementations. Experimental results demonstrate that the migration and optimization process preserves the models' original accuracy while significantly reducing system dependencies and improving operational efficiency by leveraging Chinese chips as a viable alternative for scientific computing. This work provides valuable insights and practical guidance for leveraging Chinese domestic chips and frameworks in atmospheric and oceanic AI model development, offering a pathway toward greater technological independence.
AAAI 2026 Main ConferenceXiaomeng Huang, Li Jiahao, Jiancheng Pan, Yanfei Xiang, Yuze Sun, Luo Wentao, Quan ZhangArtificial Intelligencetechnical paperfree
48
143519From Benchmarks to Business Impact: Deploying IBM Generalist Agent in Enterprise Production
Agents are rapidly advancing in automating digital work, but enterprises face a harder challenge: moving beyond prototypes to deployed systems that deliver measurable business value. This path is complicated by fragmented frameworks, slow development, and the absence of standardized evaluation practices. Generalist agents have emerged as a promising direction, excelling on academic benchmarks and offering flexibility across task types, applications, and modalities. Yet, evidence of their deployment in production enterprise settings is still limited. This paper reports IBM’s experience developing and deploying the Computer Using Generalist Agent (CUGA). CUGA adopts a hierarchical planner–executor architecture with strong analytical foundations, achieving state-of-the-art performance on AppWorld and WebArena. Beyond benchmarks, it was deployed in the Business-Process-Outsourcing talent acquisition domain, meeting enterprise requirements for scalability, auditability, safety, and governance. To support evaluation, we introduce BPO-TA, a 26-task benchmark spanning 13 analytics endpoints. In deployment, CUGA matched the accuracy of specialized agents while reducing development time by 91.9% and cost by 52.3%. Our contribution is twofold: demonstrating generalist agents operating at enterprise scale, and distilling technical and organizational lessons from this deployment. We outline requirements and next steps for advancing research-grade architectures like CUGA into robust, enterprise-ready systems.
AAAI 2026 Main ConferenceSegev Shlomov, Lukasz Strak, Eilam Shapira, Alon Oved, Nir Mashkif, Sami Marreed, Ido Levy, Offer Akrabi, Avi Yaeli, Elizabeth Koumpan, Yinon Goldshtein, Asaf AdiArtificial Intelligencetechnical paperfree
49
143518A Metacognitive Architecture for Correcting LLM Errors in AI Agents
The ability to self-revise is critical for AI agents. To maintain trust and foster positive perceptions, AI systems must correct their mistakes and adapt to users’ changing needs. We present a metacognitive architecture for self-revision in SAMI, an AI social agent deployed in Georgia Tech’s OMSCS program. Over the past ten semesters, SAMI has facilitated social connections for more than 11,000 students. Real-world deployments revealed frequent requests from students to revise the knowledge database, either to correct errors or to update their information. To address this need, we present a self-revision architecture that integrates Knowledge-Based AI (KBAI) and Generative AI (GenAI). The architecture (1) localizes the task requiring revision by introspecting on its self-model, (2) updates the knowledge database, and (3) communicates the revision process back to the user. We evaluate the framework using feedback cases derived from real student data and observed revision needs. This work introduces a novel metacognitive approach to improving explainability through the integration of KBAI and GenAI, with a clear path toward real-world deployment.
AAAI 2026 Main ConferenceAshok Goel, Jisu Kim, Mahimul IslamArtificial Intelligencetechnical paperfree
50
143517Interpretable Machine Learning for In-Home Mild Cognitive Impairment Detection
This paper introduces a novel system for in-home cognitive health assessment using ambient sensors and machine learning technology that can robustly detect mild cognitive impairment (MCI) despite noisy and sparsely available data. The learned model can transparently explain which aspects of individuals' daily lives led to the prediction while reliably predicting MCI, providing healthcare workers with more insights for further clinical interventions. We developed a robust, transparent machine learning model based on the fusion adaptive resonance theory (Fusion ART) neural network to learn individuals' daily patterns of activity from continuous sensor data in terms of a suite of digital biomarkers reflecting four key domains: physical activity, daily routines, cognitive engagement, and sleep patterns.
In a five-year longitudinal study of over one hundred participants, whose homes were equipped with non-intrusive sensors while they underwent parallel clinical evaluation, our model successfully identified individuals with MCI, achieving high predictive accuracy despite the noisy and sparse availability of data. As a transparent neural network, the learned model can also serve as classification rules to distinguish MCI from normal cognition (NC) cases based on the digital biomarkers. These results demonstrate that passively collected, sensor-derived digital biomarkers can be leveraged to indicate cognitive status and potentially provide clinically meaningful insights into impairment conditions. We also discuss the practical challenges and lessons learned from this real-world deployment to inform future large-scale implementations of such AI-driven health monitoring systems.
AAAI 2026 Main ConferenceBudhitama Subagdja, Ah-Hwee Tan, Shanthoshigaa D, Iris RawtaerArtificial Intelligencetechnical paperfree
51
143516InfrastructureSentinel: Policy Enforced Guardrails for Secure MCP-driven Infrastructure AgentsThe proliferation of Model Context Protocol (MCP) servers in enterprise infrastructure management has revolutionized AI-driven automation while introducing critical multi-layered security vulnerabilities that traditional cybersecurity frameworks cannot adequately address. This paper presents a comprehensive intelligent guardrail system that addresses the unique security challenges of MCP-driven infrastructure management through a novel four-layer defense architecture. Our solution employs a dedicated guardian LLM that interprets natural language policies and applies contextual reasoning to complex infrastructure scenarios, providing dynamic policy enforcement that adapts to user roles, operational timing, and system context. Unlike existing rule-based security systems, our approach implements guardrails at four distinct control points: input message filtering, tool selection validation, execution-time verification, and post-action auditing. The system addresses critical gaps in existing security solutions by providing infrastructure-specific threat modeling, real-time policy adaptation, and comprehensive audit trails with explainable decision-making through confidence scores and detailed reasoning. Our evaluation demonstrates the system's effectiveness in preventing command injection, privilege escalation, and tool poisoning attacks across various enterprise infrastructure scenarios while maintaining operational agility essential for modern data center management.AAAI 2026 Main ConferenceGayathri Saranathan, Tarun Kumar, Martin Foltin, Suparna Bhattacharya, Aalap Tripathy, Scott Hinchley, Donald Bahls, David Brookshire, Larry Kaplan, Robert WisniewskiArtificial Intelligenceposterfree
52
143515AquaSentinel: Next-Generation AI System Integrating Sensor Networks for Urban Underground Water Pipeline Anomaly Detection via Collaborative MoE-LLM Agent ArchitectureUnderground pipeline leaks and infiltrations pose significant threats to water security and environmental safety. Traditional manual inspection methods provide limited coverage and delayed response, often missing critical anomalies. This paper proposes AquaSentinel, a novel physics-informed AI system for real-time anomaly detection in urban underground water pipeline networks. We introduce four key innovations: (1) strategic sparse sensor deployment at high-centrality nodes combined with physics-based state augmentation to achieve network-wide observability from minimal infrastructure; (2) the RTCA (Real-Time Cumulative Anomaly) detection algorithm, which employs dual-threshold monitoring with adaptive statistics to distinguish transient fluctuations from genuine anomalies; (3) a Mixture of Experts (MoE) ensemble of spatiotemporal graph neural networks that provides robust predictions by dynamically weighting model contributions; (4) causal flow-based leak localization that traces anomalies upstream to identify source nodes and affected pipe segments. Our system strategically deploys sensors at critical network junctions and leverages physics-based modeling to propagate measurements to unmonitored nodes, creating virtual sensors that enhance data availability across the entire network. Experimental evaluation using 110 leak scenarios demonstrates that AquaSentinel achieves 100% detection accuracy. This work advances pipeline monitoring by demonstrating that physics-informed sparse sensing can match the performance of dense deployments at a fraction of the cost, providing a practical solution for aging urban infrastructure.AAAI 2026 Main ConferenceWenlu Wang, Hua Zhang, Qiming Guo, Bishal Khatri, Wenbo Sun, Jinwen TangArtificial Intelligencetechnical paperfree
53
143512Centralized training with hybrid execution in multi-agent reinforcement learning via predictive observation imputationWe study hybrid execution in multi-agent reinforcement learning (MARL), a paradigm where agents aim to complete cooperative tasks with arbitrary communication levels at execution time by taking advantage of information-sharing among the agents. Under hybrid execution, the communication level can range from a setting in which no communication is allowed between agents (fully decentralized), to a setting featuring full communication (fully centralized), but the agents do not know beforehand which communication level they will encounter at execution time. We contribute MARO, an approach that makes use of an auto-regressive predictive model, trained in a centralized manner, to estimate missing agents' observations at execution time. We evaluate MARO on standard scenarios and extensions of previous benchmarks tailored to emphasize the impact of partial observability in MARL. Experimental results show that our method consistently outperforms relevant baselines, allowing agents to act with faulty communication while successfully exploiting shared information.AAAI 2026 Main ConferencePedro SantosArtificial Intelligencetechnical paperfree
54
143511Generative AI Against Poaching: Latent Composite Flow Matching for Poaching PredictionPoaching poses significant threats to wildlife and biodiversity. A valuable step in reducing poaching is to forecast poacher behavior, which can inform patrol planning and other conservation interventions. Existing poaching prediction methods based on linear models or decision trees lack the expressivity to capture complex, nonlinear spatiotemporal patterns. Recent advances in generative modeling, particularly flow matching, offer a more flexible alternative. However, training such models on real-world poaching data faces two central obstacles: imperfect detection of poaching events and limited data. To address imperfect detection, we integrate flow matching with an occupancy-based detection model and train the flow in latent space to infer the underlying occupancy state. To mitigate data scarcity, we adopt a composite flow initialized from a linear-model prediction rather than the random noise that is standard in diffusion models, injecting prior knowledge and improving generalization. Evaluations on datasets from two national parks in Uganda show consistent gains in predictive accuracy.AAAI 2026 Main ConferenceMilind Tambe, Lily Xu, Lingkai Kong, Haichuan Wang, Charles Emogor, Vincent Boersch-SupanArtificial Intelligencetechnical paperfree
55
143510Life, Machine Learning, and the Search for Habitability: Predicting Biosignature Fluxes for the Habitable Worlds ObservatoryFuture direct-imaging flagship missions, such as NASA's Habitable Worlds Observatory (HWO), face critical decisions in prioritizing observations due to extremely stringent time and resource constraints. In this paper, we introduce two advanced machine-learning architectures tailored for predicting biosignature gas fluxes from exoplanetary reflected-light spectra: a Bayesian Convolutional Neural Network (BCNN) and our novel model architecture, the Spectral Query Adaptive Transformer (SQuAT). The BCNN robustly quantifies both epistemic and aleatoric uncertainties, offering reliable predictions under diverse observational conditions, whereas SQuAT employs query-driven attention mechanisms to enhance interpretability by explicitly associating spectral features with specific biosignature gases. We demonstrate that both models achieve comparably high predictive accuracy on an augmented dataset spanning a wide range of exoplanetary conditions, while highlighting their distinct advantages in uncertainty quantification and spectral interpretability. These capabilities position our methods as promising tools for accelerating target triage, optimizing observation schedules, and maximizing scientific return for upcoming flagship missions such as HWO.AAAI 2026 Main ConferenceMark Moussa, Amber Young, Brianna Isola, Vasuda Trehan, Nicholas Wogan, Michael Himes, Giada ArneyArtificial Intelligencetechnical paperfree
56
143509Deploying Rapid Damage Assessments from sUAS Imagery for Disaster ResponseThis paper presents the first AI/ML system for automating building damage assessment in small uncrewed aerial systems (sUAS) imagery to be deployed operationally during federally declared disasters (Hurricanes Debby and Helene). In response to major disasters, sUAS teams are dispatched to collect imagery of the affected areas to assess damage; however, at recent disasters, teams collectively delivered between
47GB and 369GB of imagery per day, representing more imagery than can reasonably be transmitted or interpreted by subject matter experts in the disaster scene, thus delaying response efforts. To alleviate this data avalanche encountered in practice, computer vision and machine learning techniques are necessary. While prior work has been deployed to automatically assess damage in satellite imagery, there is no current state of practice for sUAS-based damage assessment systems for operational use, as all known work has been confined
to academic settings. This work establishes the state of practice via the development and deployment of models for building damage assessment with sUAS imagery. The development of the models consisted of training on the largest known dataset of post-disaster sUAS aerial imagery, which consists of 21,716 building damage labels, and the operational training of 91 disaster practitioners. The deployment of the system was during the responses to Hurricanes Debby and Helene, where it assessed a combined 415 buildings in approximately 18 minutes. This work contributes detailed documentation of the actual use of AI/ML for damage assessment during a disaster and lessons learned to the benefit of the AI/ML research and user communities.
AAAI 2026 Main ConferenceThomas Manzini, Priyankari Perali, Robin MurphyArtificial Intelligencetechnical paperfree
57
143508Clinician-in-the-Loop Smart Home System to Detect Urinary Tract Infection Flare-Ups via Uncertainty-Aware Decision SupportUrinary tract infection (UTI) flare-ups pose a significant health risk for older adults with chronic conditions. These infections often go unnoticed until they become severe, making early detection through innovative smart home technologies crucial. Traditional machine learning (ML) approaches relying on simple binary classification for UTI detection offer limited utility to nurses and practitioners as they lack insight into prediction uncertainty, hindering informed clinical decision-making. This paper presents a clinician-in-the-loop (CIL) smart home system that leverages ambient sensor data to extract meaningful behavioral markers, train robust predictive ML models, and calibrate them to enable uncertainty-aware decision support. The system incorporates a statistically valid uncertainty quantification method called Conformal-Calibrated Interval (CCI), which quantifies uncertainty and abstains from making predictions ("I don’t know") when the ML model's confidence is low. Evaluated on real-world data from eight smart homes, our method outperforms baseline methods in recall and other classification metrics while maintaining the lowest abstention proportion and interval width. A survey of 42 nurses confirms that our system's outputs are valuable for guiding clinical decision-making, underscoring their practical utility in improving informed decisions and effectively managing UTIs and other condition flare-ups in older adults.AAAI 2026 Main ConferenceJana Doppa, Chibuike Ugwu, Roschelle Fritz, Diane CookArtificial Intelligenceposterfree
58
143507Driving Engagement in Daily Fantasy Sports with a Scalable and Urgency-Aware Ranking EngineIn daily fantasy sports (DFS), match participation is highly time-sensitive.
Users must act within a narrow window before a game begins, making match recommendation a time-critical task to prevent missed engagement and revenue loss. Existing recommender systems, typically designed for static item catalogs, are ill-equipped to handle the hard temporal deadlines inherent in these live events. To address this, we designed and deployed a recommendation engine using the Deep Interest Network (DIN) architecture.
We adapt the DIN architecture by injecting temporality at two levels: first, through real-time urgency features for each candidate match (e.g., time-to-round-lock), and second, via temporal positional encodings that represent the time-gap between each historical interaction and the current recommendation request, allowing the model to dynamically weigh the recency of past actions.
This approach, combined with a listwise NeuralNDCG loss function, produces highly relevant and urgency-aware rankings. To support this at industrial scale, we developed a multi-node, multi-GPU training architecture on Ray and PyTorch. Our system, validated on a massive industrial dataset with over 650k users and over 100B interactions, achieves a +9% lift in nDCG@1 over a heavily optimized LightGBM baseline with handcrafted features. The strong offline performance of this model establishes its viability as a core component for our planned on-device (edge) recommendation system, where online A/B testing will be conducted.
AAAI 2026 Main ConferenceUnmesh PadalkarArtificial Intelligencetechnical paperfree
59
143506Automated Unified Reasoning with Vision-Language Models for Multi-modal Burn AssessmentIn emerging clinical applications such as ultrasound-based burn assessment, the lack of domain-specific data presents a significant challenge for developing robust AI systems. Vision-language models (VLMs) have shown strong performance in general computer vision tasks, yet their application to medical imaging remains limited, particularly due to insufficient reasoning capabilities and the scarcity of high-quality training data. We introduce AURA (Automated Unified Reasoning for Burn Assessment), a multi-modal approach that integrates pre-trained VLMs with symbolic first-order logic (FOL) reasoning to improve diagnostic accuracy and interpretability in this data-limited setting. For this study, we collected real-patient data over a one-year period at a U.S. burn center, performing all experiments in a real clinical setting to ensure practical relevance. The dataset includes both conventional B-Mode ultrasound and Tissue Doppler Imaging (TDI), with TDI introduced here for the first time in burn assessment, underscoring the emerging nature of this work. Beyond burn severity classification, we assess the system’s ability to produce expert-level surgical insight directly from imaging data. On the retrospective dataset, it achieves up to 93% accuracy in surgical classification and 87% in fine-grained burn depth prediction, comparable to expert-informed predictions and substantially exceeding the 70% accuracy of traditional visual inspection by human experts. These results, obtained from a novel multi-modal dataset collected in a real clinical burn center setting, highlight the potential of this approach to improve decision-making in burn care. 
To further support future deployment, we demonstrate a prototype integration with an Electronic Medical Record (EMR) system that aligns with clinical workflows and supports scalable, real-world implementation.AAAI 2026 Main ConferenceJuan Wachs, Md Masudur Rahman, Mohamed Masry, Gayle GordilloArtificial Intelligencetechnical paperfree
60
143506DEEP: A Discourse Evolution Engine for Predictions About Social MovementsNumerous social movements (SMs) around the world help support the UN's Sustainable Development Goals (SDGs). Understanding how key events shape SMs is central to achieving the SDGs. We have developed SMART (Social Media Analysis & Reasoning Tool) to track social movements related to the SDGs. SMART was designed by a multidisciplinary team of AI researchers, journalists, communications scholars, and legal experts. This paper describes SMART's transformer-based multivariate time series Discourse Evolution Engine for Predictions about Social Movements (DEEP), which predicts the volume of future articles/posts and the emotions expressed. DEEP outputs probabilistic forecasts with uncertainty estimates, providing critical support for editorial planning and strategic decision-making. We evaluate DEEP with a case study of the #MeToo movement by creating a novel longitudinal dataset (433K Reddit posts and 121K news articles) from September 2024 to June 2025 that will be publicly released for research purposes upon publication of this paper.AAAI 2026 Main ConferenceAaron Shaw, Venkatramanan Subrahmanian, Marco Postiglione, Valerio La Gatta, Jeremy Gilbert, Daniel Linna, Morgan GreenfieldArtificial Intelligencetechnical paperfree
61
143504Lightweight Additive Blend Maps for Texture-Preserving Face Retouching: A Neural Approach to Traditional Photographic TechniquesProfessional face retouching in photography requires balancing texture preservation with quality enhancement, a problem that conventional automated methods struggle to handle effectively. We present a new lightweight neural architecture that converts traditional dodge-and-burn photographic techniques into predictions of learnable additive blend maps. Instead of rebuilding whole images, our method uses a small U-Net that predicts pixel-level changes, allowing precise brightness adjustments while maintaining the original skin texture. With a 6MB model that operates effectively on common hardware, the technique produces high-quality results while preserving texture fidelity, which is crucial for professional applications. Experimental validation demonstrates competitive performance with current methods while offering significant computational advantages.AAAI 2026 Main ConferenceAbinash PeguArtificial Intelligencetechnical paperfree
62
143503LiRA: A Multi-Agent Framework for Reliable and Readable Literature Review GenerationThe rapid growth of scientific publications has made it increasingly difficult to keep literature reviews comprehensive and up-to-date. Though prior work has focused on automating retrieval and screening, the writing phase of systematic reviews remains largely under-explored, especially with regard to readability and factual accuracy. To address this, we present LiRA (Literature Review Agents), a multi-agent collaborative workflow which emulates the human literature review process. LiRA utilizes specialized agents for content outlining, subsection writing, editing, and reviewing, producing cohesive and comprehensive review articles. Evaluated on SciReviewGen and a proprietary ScienceDirect dataset, LiRA outperforms current baselines such as AutoSurvey and MASS-Survey in writing and citation quality, while maintaining competitive similarity to human-written reviews. We further evaluate LiRA in real-world scenarios using document retrieval and assess its robustness to reviewer model variation. Our findings highlight the potential of agentic LLM workflows, even without domain-specific tuning, to improve the reliability and usability of automated scientific writing.AAAI 2026 Main ConferenceAnders Søgaard, Maarten de Rijke, Seyed Amin Tabatabaei, Xinyi Chen, Gregory Hok Tjoan Go, Khang LyArtificial Intelligencetechnical paperfree
63
143502Integrating Fourier Neural Operators into High-Fidelity Helicopter Flight Simulation for Real-Time Urban Wind PredictionHigh-fidelity helicopter flight simulators are essential for preparing pilots for complex and hazardous environments, yet realistic urban wind dynamics are difficult to reproduce in real time when relying on precomputed computational fluid dynamics (CFD) data. We present the first integration of a Fourier Neural Operator (FNO) into a Level D full flight simulator for real-time, physics-based urban wind field generation. Trained on high-resolution urban flow simulations, the FNO predicts one-minute-averaged 3D wind fields that dynamically adapt to flight state and location, replacing static wind inputs in the simulator pipeline. Turbulence levels are computed from the predictions and injected directly into the simulation loop. Professional pilots evaluated the system in an urban scenario and reported that it reproduced wind effects they would expect, such as turbulence and directional changes when landing behind buildings. They highlighted its value for less experienced pilots to develop wind awareness and for realistic training in critical operations, including offshore platform landings.AAAI 2026 Main ConferenceMaximilian Dauner, Michael Kurz, Gudrun Socher, Alexander KnollArtificial Intelligencetechnical paperfree
64
143501NOVAID: Natural-language Observability Visualization Assistant for ITOps Dashboard Widget GenerationManual creation of IT monitoring dashboard widgets is slow, error-prone, and a barrier for both novice and expert users. We present NOVAID, an interactive chatbot that leverages Large Language Models (LLMs) to generate IT monitoring widgets directly from natural language queries. Unlike general natural language–to-visualization tools, NOVAID addresses IT operations–specific challenges: specialized widget types like SLO charts, dynamic API-driven data retrieval, and complex contextual filters. The system combines a domain-aware semantic parser, fuzzy entity matching, and schema completion to produce standardized widget JSON specifications. An interactive clarification loop ensures accuracy in underspecified queries. On a curated dataset of 271 realistic queries, NOVAID achieves promising accuracy (up to 94.10% in metric extraction) across multiple LLMs. A user study with IT engineers yielded a System Usability Scale score of 74.2 for NOVAID, indicating good usability. By bridging natural language intent with operational dashboards, NOVAID demonstrates clear potential and a path for deployment in enterprise ITOps monitoring platforms.AAAI 2026 Main ConferencePrateeti Mohapatra, Seema Nagar, Arthur de Magalhaes, Pratik Mishra, Caner Gözübüyük, Raya WittichArtificial Intelligencetechnical paperfree
65
143500CausalTrace: A Neurosymbolic Causal Analysis Agent for Smart ManufacturingModern manufacturing environments demand not only accurate predictions but also interpretable insights into process anomalies, root causes, and potential interventions. Existing AI systems often function as isolated black boxes, lacking the seamless integration of prediction, explanation, and causal reasoning required for a unified decision-support solution. This fragmentation limits their trustworthiness and practical utility in high-stakes industrial environments. In this work, we present CausalTrace, a neurosymbolic causal analysis module integrated into the SmartPilot industrial CoPilot. CausalTrace performs data-driven causal analysis enriched by industrial ontologies and knowledge graphs, including advanced functions such as causal discovery, counterfactual reasoning, and root cause analysis (RCA). It supports real-time operator interaction and is designed to complement existing agents by offering transparent, explainable decision support. We conducted a comprehensive evaluation of CausalTrace using multiple causal assessment methods and the C3AN framework (i.e., Custom, Compact, Composite AI with Neurosymbolic Integration), which spans principles of robustness, intelligence, and trustworthiness. In an academic rocket assembly testbed, CausalTrace achieved substantial agreement with domain experts (ROUGE-1: 0.91 in ontology QA) and strong RCA performance (MAP@3: 94%, PR@2: 97%, MRR: 0.92, Jaccard: 0.92). It also attained 4.59/5 in the C3AN evaluation, demonstrating precision and reliability for live deployment.AAAI 2026 Main ConferenceAmit Sheth, Utkarshani Jaimini, Chathurangi Shyalika Jayakody Kankanamalage, Aryaman Sharma, Cory Henson, Fadi Kalach, Ramy HarikArtificial Intelligencetechnical paperfree
66
143499Diversity Meets Relevancy: Multi-Agent Knowledge Probing for Industry 4.0 ApplicationsIndustrial data scientists modeling an asset's condition need to build domain understanding by asking questions about the asset. Example questions include what failure modes it can experience, under which operating conditions they occur, and how the manufacturer and weather affect them.
Traditionally, the main source of domain information comes from Subject Matter Experts (SMEs) and Failure Modes and Effects Analysis (FMEA) documents which are not always available and may not be detailed enough to cover different external factors (e.g., operating mode, manufacturer, weather).
Now that Large Language Models (LLMs) have become a commodity, there is a significant opportunity to leverage them to bridge this gap.
Inspired by prior work on LLM knowledge probing, we present a Multi-Agent System (MAS) specialized in helping industrial data scientists guide their modeling decisions. One challenge we address is the linguistic diversity and relevance of the generated questions, which we optimize using popular information diversity metrics and a grounded relevancy classifier.
We continuously monitor the set of newly generated instruction sets at the end of each round, compare their linguistic diversity against common baselines, and show high coverage of the generated knowledge on the downstream FMEA task.
We also conduct user studies to validate the quality of the questions.
We finally present the real-world implications of providing diverse, asset-specific information to aid data scientists' modeling decisions through our deployed MAS.
Through the deployed system, we show its generalizability to different assets and its extensibility to further downstream tasks such as work order scheduling, failure mode sensor analysis, and machine learning model recipe generation.
AAAI 2026 Main ConferenceDhaval Patel, Christodoulos Constantinides, Scott Kimbleton, Nishu Garg, Muhammad ParachaArtificial Intelligencetechnical paperfree
67
143498Automated Creation and Enrichment Framework for Improved Invocation of Enterprise APIs as ToolsRecent advancements in Large Language Models (LLMs) have led to the development of agents capable of complex reasoning and interaction with external tools. In enterprise contexts, the effective use of such tools, which are often enabled by application programming interfaces (APIs), is hindered by poor documentation, complex input or output schemas, and a large number of operations. These challenges make tool selection difficult and reduce the accuracy of payload formation by up to 25%. We propose ACE, an automated tool creation and enrichment framework that transforms enterprise APIs into LLM-compatible tools. ACE (i) generates enriched tool specifications with parameter descriptions and examples to improve selection and invocation accuracy, and (ii) incorporates a dynamic shortlisting mechanism that filters relevant tools at runtime, reducing prompt complexity while maintaining scalability. We validate our framework on both proprietary and open-source APIs and demonstrate its integration with agentic frameworks. To the best of our knowledge, ACE is the first end-to-end framework that automates the creation, enrichment, and dynamic selection of enterprise API tools for LLM agents.AAAI 2026 Main ConferenceHimanshu Gupta, Sameep Mehta, Prerna Agarwal, Renuka Sindhgatta, Soujanya Soni, Rohith VallamArtificial Intelligencetechnical paperfree
68
143497Digital Scale: Open-Source On-Device BMI Estimation from Smartphone Camera Images Trained on a Large-Scale Real-World DatasetEstimating Body Mass Index (BMI) from camera images with machine learning models enables rapid weight assessment when traditional methods are unavailable or impractical, such as in telehealth or emergency scenarios. Existing computer vision approaches have been limited to datasets of up to 14,500 images. In this study, we present a deep learning-based BMI estimation method trained on our WayBED dataset, a large proprietary collection of 84,963 smartphone images from 25,353 individuals. We introduce an automatic filtering method that uses posture clustering and person detection to curate the dataset by removing low-quality images, such as those with atypical postures or incomplete views. This process retained 71,322 high-quality images suitable for training. We achieve a Mean Absolute Percentage Error (MAPE) of 7.9% on our hold-out test set (WayBED data) using full-body images, the lowest value in the published literature to the best of our knowledge. Further, we achieve a MAPE of 13% on the completely unseen (during training) VisualBodyToBMI dataset, comparable with state-of-the-art approaches trained on it, demonstrating robust generalization. Lastly, we fine-tune our model on VisualBodyToBMI and achieve a MAPE of 8.56%, the lowest reported value on this dataset so far. We deploy the full pipeline, including image filtering and BMI estimation, on Android devices using the CLAID framework. We release our complete code for model training, filtering, and the CLAID package for mobile deployment as open-source contributions.AAAI 2026 Main ConferenceFrederik Manichand, Robin Deuber, Robert Jakob, Steve Swerling, Jamie Rosen, Elgar Fleisch, Patrick LangerArtificial Intelligenceposterfree
69
143496AdaptJobRec: Enhancing Conversational Career Recommendation Through an LLM-Powered Agentic SystemIn recent years, recommendation systems have evolved from providing a single list of recommendations to offering a comprehensive suite of topic-focused services. To better accomplish this task, conversational recommendation systems (CRS) have progressed from basic retrieval-augmented LLM generation to agentic systems with advanced reasoning and self-correction capabilities. However, agentic systems come with notable response latency, a longstanding challenge for conversational recommendation systems. To balance the trade-off between handling complex queries and minimizing latency, we propose AdaptJobRec, the first conversational job recommendation system that leverages an autonomous agent to integrate personalized recommendation algorithm tools. The system employs a user query complexity identification mechanism to minimize response latency. For straightforward queries, the agent directly selects the appropriate tool for rapid responses. For complex queries, the agent uses the memory processing module to filter chat history for relevant content, then passes the results to the intelligent task decomposition planner, and finally executes the tasks using personalized recommendation tools. Evaluation on Walmart’s real-world career recommendation scenarios demonstrates that AdaptJobRec reduces average response latency by up to 53.3% compared to competitive baselines, while significantly improving recommendation accuracy.AAAI 2026 Main ConferenceXintao Wu, Qixin Wang, Dawei Wang, Kun Chen, Yaowei Hu, Puneet Girdhar, Ruoteng Wang, Aadesh Gupta, Chaitanya Devella, Wenlai Guo, Shangwen Huang, Bachir Aoun, Greg Hayworth, Han LiArtificial Intelligenceposterfree
70
143494Transferable RL for Real-World Navigation Using Semantic Segmentation and Bird’s-Eye View AbstractionReinforcement Learning (RL) has shown significant promise in developing autonomous navigation algorithms for complex environments. However, the direct application of RL policies trained in simulation to real-world scenarios often faces challenges due to the reality gap. This paper proposes a two-stage system incorporating a segmentation strategy and a bird’s-eye-view (BEV) representation to mitigate the domain gap between simulation and reality. In the first stage, the segmentation transforms sensor data into a simplified and interpretable representation of the surrounding area, facilitating transferability across different deployments. In the second stage, the agent navigates using the BEV map and can be trained in a vectorized simulation environment, a setup that runs multiple parallel instances of the environment to provide a wide range of training scenarios. This vectorization enables rapid exposure to varied environmental conditions, thereby accelerating and diversifying the training of a deep RL agent to achieve optimal navigation behaviors while maintaining high-speed, in-bound trajectories. The segmentation is crucial because it supports generalization of the learned policy across different robotic platforms. The contribution of this paper lies in combining real-time semantic segmentation with a bird’s-eye-view navigation policy, resulting in a transferable and scalable framework for real-world deployment of RL-based navigation agents.
Experimental results demonstrate that agents trained with this methodology exhibit robust navigation performance and adaptability in both simulated and real-world environments, validating the efficacy of combining vectorized simulation with real-world segmentation for practical robotic navigation.AAAI 2026 Main ConferenceBenedikt Schlereth-Groh, Sakir Yöndem, Ramin KolagariArtificial Intelligencetechnical paperfree
71
143493Building Domain-Specific Small Language Models via Guided Data GenerationLarge Language Models (LLMs) have shown remarkable success in supporting a wide range of knowledge-intensive tasks. In specialized domains, there is growing interest in leveraging LLMs to assist subject matter experts with domain-specific challenges. However, deploying LLMs as SaaS solutions raises data privacy concerns, while many open-source models demand significant computational resources for effective domain adaptation and deployment. A promising alternative is to develop smaller, domain-specialized LLMs, though this approach is often constrained by the lack of high-quality domain-specific training data. In this work, we address these limitations by presenting a cost-efficient and scalable training pipeline that combines guided synthetic data generation from a small seed corpus with bottom-up domain data curation. Our pipeline integrates Domain-Adaptive Pre-training (DAPT), Domain-specific Supervised Fine-Tuning (DSFT), and Direct Preference Optimization (DPO) to train effective small-scale models for specialized use cases. We demonstrate this approach through DiagnosticSLM, a 3B-parameter language model tailored for fault diagnosis, root cause analysis, and repair recommendation in industrial settings. To evaluate model performance, we introduce four domain-specific benchmarks: multiple-choice questions (DiagnosticMCQ), question answering (DiagnosticQA), sentence completion (DiagnosticComp), and summarization (DiagnosticSum). DiagnosticSLM achieves up to 25% accuracy improvement over open-source models of comparable or larger size (2B-9B) on the MCQ task, while also outperforming or matching them in other tasks, demonstrating strong domain-specific reasoning and generalization capabilities.AAAI 2026 Main ConferenceChetan Gupta, Aman Kumar, Lasitha Vidyaratne, Ekant Amin, Xian Lee, Ahmed Farahat, Yuta KoreedaArtificial Intelligenceposterfree
72
143492Trauma THOMPSON: A Dataset and Realistic Generative Framework for AI Copilots in Emergency CareWe introduce Trauma THOMPSON, a dataset and suite of benchmarks designed to accelerate the development of AI-powered copilots for real-time decision-making in emergency and resource-limited medical settings. This work proposes a method to address a critical bottleneck for future deployment: models trained in simulation may not work well in the real world. The dataset features 3,717 unscripted, first-person video clips of five emergency procedures, uniquely including "just-in-time" (JIT) interventions that mirror the improvisational nature of field medicine. To obtain realistic patient data without the ethical and identity concerns that often accompany medical data, we also propose TraumaGen, a novel framework for generating photorealistic patient and wound images from manikins while preserving clinical context. We establish benchmarks for action recognition, anticipation, and visual question answering (VQA), evaluating state-of-the-art models to demonstrate the challenges and potential of our dataset. By focusing on realism and improvisation, Trauma THOMPSON provides a crucial resource and a clear path toward developing and validating robust AI assistants for future deployment in real-world emergency care. The dataset is available at https://anonymous.4open.science/r/dataset-58F3.AAAI 2026 Main ConferenceJuan Wachs, Yupeng Zhuo, Eddie Zhang, Xiangchen Yu, Aditya Pachpande, Andrew Kirkpatrick, Jessica MckeeArtificial Intelligenceposterfree
73
143491Automatic Funny Scene Extraction from Long-form Cinematic VideosAutomatically extracting engaging and high-quality humorous scenes from cinematic titles is pivotal for creating captivating video previews and snackable content, boosting user engagement on streaming platforms. Long-form cinematic titles, with their extended duration and complex narratives, challenge scene localization, while humor’s reliance on diverse modalities and its nuanced style add further complexity. This paper introduces an end-to-end system for automatically identifying and ranking humorous scenes from long-form cinematic titles, featuring shot detection, multimodal scene localization, and humor tagging optimized for cinematic content. Key innovations include a novel scene segmentation approach combining visual and textual cues, improved shot representations via guided triplet mining, and a multimodal humor tagging framework leveraging both audio and text modalities. Our system achieves an 18.3% AP improvement over state-of-the-art scene detection on the OVSD dataset and an F1 score of 0.834 for detecting humor in long text. Extensive evaluations across five cinematic titles demonstrate that 87% of clips extracted by our pipeline are intended to be funny, while 98% of scenes are accurately localized. With successful generalization to trailers, these results showcase the pipeline’s potential to enhance content creation workflows, improve user engagement, and streamline snackable content generation for diverse cinematic media formats.AAAI 2026 Main ConferenceSibendu Paul, Haotian Jiang, Caren ChenArtificial Intelligenceposterfree
74
143489Octopus: Entropy-Controlled Science Fiction Literature Generation with Persistent Memory-Context BindingLong-form science fiction generation demands rigorous maintenance of narrative coherence across evolving plots, character dynamics, and speculative world-building. We propose Octopus, an entropy-controlled neural framework with persistent memory-context binding that addresses these challenges through two key innovations: 1) dynamic entropy regulation balancing creativity and structural stability via narrative divergence thresholds, and 2) hierarchical memory architecture preserving character states, plot events, and scientific rules over 10K+ token spans. Evaluations across 12 sci-fi subgenres demonstrate Octopus's superiority over GPT-4 and ReAlign baselines, achieving 15.2% higher coherence scores (SciClarity) and 62% fewer contextual contradictions in extended narratives. Human evaluations confirm its effectiveness in maintaining speculative logic (4.7/5 vs. 3.1/5 baseline) while preserving creative diversity. The framework resolves the "hard sci-fi paradox" of enforcing scientific rigor without compromising narrative flexibility, establishing new capabilities for AI-assisted cross-media universe development.AAAI 2026 Main ConferenceLuqi Gong, Xu Wang, Jiaju Kang, Puyu Han, Zeyu AiArtificial Intelligencetechnical paperfree
75
143488PeerCoPilot: A Language Model-Powered Assistant for Behavioral Health OrganizationsBehavioral health conditions, which include mental health and substance use disorders, are the leading disease burden in the United States. Peer-run behavioral health organizations (PROs) critically assist individuals facing these conditions by combining mental health services with assistance for needs such as income, employment, and housing. However, limited funds and staffing make it difficult for PROs to address all service user needs. To assist peer providers at PROs with their day-to-day tasks, we introduce PeerCoPilot, a large language model (LLM)-powered assistant that helps peer providers create wellness plans, construct step-by-step goals, and locate organizational resources to support these goals. PeerCoPilot ensures information reliability through a retrieval-augmented generation pipeline backed by a large database of over 1,300 vetted resources. We conducted human evaluations with 15 peer providers and 6 service users and found that over 90% of users supported using PeerCoPilot. Moreover, we demonstrate that PeerCoPilot provides more reliable and specific information than a baseline LLM. PeerCoPilot is now used by a group of 5-10 peer providers at CSPNJ, a large behavioral health organization serving over 10,000 service users, and we are actively expanding PeerCoPilot's use.AAAI 2026 Main ConferenceFei Fang, Nev Jones, Hong Shen, Naveen Raman, Gao Mo, Megan Chai, Cindy Peng, Shannon Pagdon, Margaret SwarbrickArtificial Intelligencetechnical paperfree
76
143487PRECISE: Reducing the Bias of LLM Evaluations Using Prediction-Powered Ranking EstimationEvaluating the quality of e-commerce search systems traditionally requires a significant number of human relevance annotations. Recently, several deployed systems have explored using Large Language Models (LLMs) as automated judges for this task, although their inherent biases prevent direct use for metric estimation. We present a statistical framework extending Prediction-Powered Inference (PPI) that combines minimal human annotations with LLM judgments to produce reliable estimates of metrics which require sub-instance annotations. Our method requires as few as 100 human-annotated queries and 10,000 unlabeled examples, reducing annotation requirements significantly compared to traditional approaches. We formulate our proposed framework (PRECISE) for inference of relevance uplift for an LLM-based query reformulation application, extending PPI to sub-instance annotations at the query-document level. By reformulating the metric-integration space, we reduce the computational complexity from O(2^|C|) to O(2^K), where |C| represents the corpus size (on the order of millions). Detailed experiments across prominent retrieval datasets demonstrate that our method reduces the variance of estimates for the business-critical Precision@K metric, while effectively correcting for LLM bias in low-resource settings.AAAI 2026 Main ConferenceAnirban Majumder, Abhishek DivekarArtificial Intelligencetechnical paperfree
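The PRECISE abstract above builds on Prediction-Powered Inference. The core PPI idea, in its simplest mean-estimation form, can be sketched as follows; this is an illustrative sketch of generic PPI, not the paper's sub-instance extension, and the function name and toy numbers are hypothetical:

```python
import numpy as np

def ppi_mean(y_human, y_llm_paired, y_llm_pool):
    """Prediction-powered estimate of a mean metric.

    y_human:      human annotations on a small labeled set
    y_llm_paired: LLM judgments on that same labeled set
    y_llm_pool:   LLM judgments on a large unlabeled pool

    The estimate is the LLM mean over the large pool plus a
    "rectifier" that corrects the LLM's average bias, measured
    on the small human-labeled set.
    """
    rectifier = np.mean(np.asarray(y_human) - np.asarray(y_llm_paired))
    return float(np.mean(y_llm_pool) + rectifier)

# Toy example: the LLM judge systematically over-scores by 0.1;
# the rectifier learned from 4 labeled items removes that bias.
est = ppi_mean([1.0, 1.0, 0.0, 0.0],
               [1.1, 1.1, 0.1, 0.1],
               [0.6] * 10_000)
```

Here the rectifier is -0.1, so the biased pool mean of 0.6 is corrected to 0.5, which is why a small human-labeled set can de-bias a much larger set of LLM judgments.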
77
143486Balancing Accuracy and Efficiency in Multi-Turn Intent Classification for LLM-Powered Dialog Systems in ProductionAccurate multi-turn intent classification is essential for advancing conversational AI systems. Yet it remains challenging: the scarcity of comprehensive datasets and the complexity of contextual dependencies across dialogue turns hinder progress. This paper presents two novel approaches leveraging Large Language Models (LLMs) to enhance scalability and reduce latency in production dialogue systems. First, we introduce Symbol Tuning, which simplifies intent labels to reduce task complexity and improve performance in multi-turn dialogues. Second, we propose C-LARA (Consistency-aware, Linguistics Adaptive Retrieval Augmentation), a framework that employs LLMs for data augmentation and pseudo-labeling to generate synthetic multi-turn dialogues. These enriched datasets are used to fine-tune a small, efficient model suitable for deployment. Experiments conducted on multilingual dialogue datasets demonstrate significant improvements in classification accuracy and resource efficiency. Our methods enhance multi-turn intent classification accuracy by 5.09%, reduce annotation costs by 40%, and enable scalable deployment in low-resource multilingual industrial systems, highlighting their practicality and impact.AAAI 2026 Main ConferenceKwan Hui Lim, Bin Fu, Junhua Liu, Tan KeatArtificial Intelligencetechnical paperfree
78
143485LLM-Based Agent for Competitive Landscape Mapping in Drug Asset Due DiligenceIn this paper, we describe and benchmark a competitor-discovery component, an essential part of an agentic AI system for fast drug asset due diligence. A competitor-discovery AI agent, given an indication, retrieves all drugs comprising the competitive landscape of that indication and extracts canonical attributes for these drugs. The competitor definition is investor-specific, and the data is paywalled/licensed, fragmented across registries, ontology-mismatched by indication, alias-heavy for drug names, multimodal, and rapidly changing. Although considered the best tool for this problem, current LLM-based AI systems are not capable of reliably retrieving all competing drug names, and there is no accepted public benchmark for this task. To address the lack of evaluation, we use LLM-based agents to transform five years of multi-modal, unstructured due diligence memos from a private biotech VC fund into a structured evaluation corpus mapping indications to competitor drugs with normalized attributes. We also introduce a competitor-validating LLM-as-a-judge agent that filters out false positives from the list of predicted competitors to maximize precision and suppress hallucinations. On our benchmark, our competitor-discovery agent achieves 83% recall, exceeding OpenAI Deep Research (65%) and Perplexity Labs (60%). The system is deployed in production with enterprise users; in a case study with a biotech VC investment fund, analyst turnaround time dropped from 2.5 days to ~3 hours (~20x) for the competitive analysis.AAAI 2026 Main ConferenceAlisa Vinogradova, Vlad Vinogradov, Dmitrii Radkevich, Katsiaryna Yanchanka, Ilya Yasny, Dmitry Kobyzev, Andrey Doronichev, Ivan IzmailovArtificial Intelligencetechnical paperfree
79
143483A Deployed Investigative AI Search Engine for Combating Human Trafficking at Web ScaleHuman trafficking, affecting over 50 million people globally, is a complex criminal enterprise in which traffickers actively conceal and distribute information across fragmented and often illicit online platforms. Traditional investigative tools are ill-suited for detecting patterns across such obfuscated, heterogeneous data. This paper presents Domain-specific Insight Graphs (DIG), an investigative AI search engine designed to operate at web scale and enable non-technical decision-makers, such as law enforcement and prosecutors, to rapidly uncover actionable leads in human trafficking investigations. DIG employs a novel AI pipeline that ingests large, diverse web corpora (including trafficking-relevant advertisements), cleans and normalizes extracted information, and links entities into a semantic knowledge graph. A domain-optimized search layer allows investigators to traverse these graphs to identify potential victims, perpetrators, and trafficking networks. Unlike commercial alternatives, DIG was released free of charge, open-sourced, and deployed to over 200 U.S. state and local law enforcement agencies through the DARPA Memex program. Deployment results demonstrate measurable impact: in New York, agencies using DIG reported a drop in sex worker arrests and an increase in trafficking-related arrests from <1% to over 60%, disrupting cycles of victim re-victimization. The system has been credited in high-profile prosecutions and received endorsements from District Attorneys. This paper details the problem context, AI approach, deployment process, operational challenges, and lessons learned from maintaining DIG post-federal funding, including navigating intellectual property for open release and sustaining the system via philanthropic support. DIG exemplifies how AI-driven investigative tools can deliver lasting societal benefit through targeted, innovative application in high-stakes domains.AAAI 2026 Main ConferenceMayank KejriwalArtificial Intelligencetechnical paperfree
80
143482Deployed AI Agents for Industrial Asset Management: CodeReAct Framework for Event Analysis and Work Order AutomationMaintenance of mission-critical industrial assets is frequently hindered by fragmented data, inconsistent record-keeping, and limited access to analytical expertise, resulting in reactive rather than predictive practices. We present CodeReAct, an AI-powered agentic framework deployed in large-scale facilities to automate event analysis and work order (WO) management. CodeReAct extends the ReAct paradigm by embedding executable Python code within the Thought-Action-Observation (TAO) loop, enabling natural language interaction, grounding heterogeneous alerts and work orders into structured Business Objects (BOs), and dynamically invoking analytic functions for forecasting, anomaly correlation, and maintenance recommendations. This architecture reduces manual data science intervention, improves adaptability, and supports reuse across asset types. Deployed in a mission-critical data center and productionized in Maximo, CodeReAct manages pumps, chillers, AHUs, compressors, cooling towers, and other mechanical and electrical systems. Evaluation with 36 representative maintenance utterances showed that outer-loop reflection and adaptive temperature improved task completion by up to 20%, while ablation studies confirmed the importance of reasoning in addition to code execution. Business validation revealed seasonal failure patterns, bundling opportunities, and predictive accuracy trends. In production, site engineers reported 25-40% faster diagnostics, fewer unplanned downtime events, and reduced reliance on specialized analysts. Lessons learned highlight the importance of structured BOs for grounding analytics, runtime safeguards to mitigate hallucinations, and adaptive model control for consistent execution. These results demonstrate how deployed agentic AI can deliver measurable business value in predictive and strategic maintenance planning.AAAI 2026 Main ConferenceNianjun Zhou, Dhaval Patel, Anamitra BhattacharyyaArtificial Intelligenceposterfree
81
143481SARA: Leveraging LLM Agents and Jurisprudential Ontologies for Automated Legal ReasoningDelivering judicial decisions requires interpreting complex legal texts, analyzing evidence, and reasoning over jurisprudence and legal principles. Recent advances in Generative Artificial Intelligence, particularly Large Language Models (LLMs), have shown potential to automate parts of this process, yet practical, measurable benefits in real-world judicial settings remain limited. This paper introduces SARA, an LLM-powered legal reasoning platform deployed in a regional Brazilian court, which demonstrates significant efficiency and quality gains through the integration of LLM agents with a Jurisprudential Knowledge Graph (Jur-KG). SARA automatically extracts and structures key elements from legal documents—including claims, requests, and evidence—and generates reasoning grounded in retrieved jurisprudential precedents. The Jur-KG, modeled through an ontology encompassing concepts such as LegalRelation, LegalGrounds, and LegalClaims, enables semantic matching and retrieval of relevant case law. By representing cases according to the Legal Case Ontology for the Brazilian Judicial System, SARA supports traceable reasoning and addresses competence questions to assess coverage, coherence, and justification of AI-generated outputs. Deployment results indicate measurable improvements in processing time, consistency, and explainability, while ensuring compliance with ethical and legal guidelines established by Brazil’s National Council of Justice. This work demonstrates that combining LLM-based agents with domain-specific knowledge graphs can yield both innovative capabilities and proven impact in judicial decision-making.AAAI 2026 Main ConferenceVasco Furtado, Joao Neto, Vladia Pinheiro, Francisco Bonfim, Sara Silva, Alicia Neves, Henrique Santos, Jorge Araujo, Rilder Pires, Ricardo CostaArtificial Intelligencetechnical paperfree
82
143480Discovery of Feasible 3D Printing Configurations for Metal Alloys via AI-Driven Adaptive Experimental DesignConfiguring the parameters of additive manufacturing processes for metal alloys is a challenging problem due to complex relationships between input parameters (e.g., laser power, scan speed, and material feed rate) and the quality of printed outputs. The standard trial-and-error approach to find feasible parameter configurations is highly inefficient because validating each input configuration is expensive in terms of resources (physical and human labor) and the configuration space is very large. This paper applies the general principle of AI-driven adaptive experimental design for optimization to the more challenging problem of discovering feasible configurations. The key idea is to build a probabilistic surrogate model from past experiments to intelligently select a small batch of input configurations for validation in each iteration. To demonstrate the effectiveness of this methodology, we deploy it for the Directed Energy Deposition (DED) process to print GRCop-42, a high-performance copper–chromium–niobium alloy developed by NASA for extreme-temperature aerospace applications. Within weeks, our approach yielded multiple defect-free outputs across a range of laser powers—dramatically reducing time-to-result and resource expenditure compared to four months of manual experimentation by our collaborators with little to no success. By enabling high-quality GRCop-42 fabrication on readily available infrared laser platforms for the first time, we democratize access to this critical alloy, paving the way for cost-effective, decentralized production of rocket engine chambers, heat exchangers, and other high-heat-flux components.AAAI 2026 Main ConferenceAryan Deshwal, Jana Doppa, Azza Fadhel, Nathaniel Zuckschwerdt, Susmita Bose, Amit BandyopadhyayArtificial Intelligencetechnical paperfree
83
143479Optimizing Preferential Rate in Retail Lending with Causal Inference and Domain AdaptationIn retail lending, offering preferential interest rates is a core marketing instrument for balancing customer acquisition with portfolio profitability. Accurately predicting the effect of interest-rate discounts for each customer is pivotal for optimizing the discount strategy: offering overly generous discounts erodes margins, while insufficient discounts drive price-sensitive customers to defect. Off-the-shelf machine learning uplift models rarely respect the complex operational constraints of financial businesses, such as tiered rate grids, regulatory guardrails, and marketing budget ceilings. We propose an integrated system that fuses causal inference and domain adaptation to produce constraint-aware, customer-specific discount recommendations. To further enhance practitioner adoption, a large language model layer translates model outputs into actionable narratives. Developed at Hyundai Capital Services, the system boosted transaction volume by 13%, demonstrating both technical soundness and material business impact.AAAI 2026 Main ConferenceWooyoung Kim, Jaehyun Kim, Kee-Eung Kim, Sumin Shin, Jimyung Choi, Yujin Lee, Hyeryeong OhArtificial Intelligencetechnical paperfree
84
143478From Natural Language to Executable ETL Flows: The IBM DataStage AssistantModern ETL (Extract, Transform, Load) tools offer graphical, no-code interfaces for workflow creation but still require users to manually identify transformation functions and configure their properties, which is time-consuming and demands prior expertise. We present the research and engineering foundations of the IBM DataStage Assistant, a deployed capability that generates complete multi-stage ETL flows directly from natural language (NL) descriptions. Our framework infers transformation functions, their properties, and transformer expressions, enabling novices to discover relevant functions and allowing experts to bypass manual configuration. The proposed framework achieves a prediction accuracy of 96.4% for flow predictions, 87.0% for properties, and 83.6% for transformer expressions. We also show a document exploration module that uses retrieval-augmented generation (RAG) over product documentation to answer tool-specific questions in NL. Implemented in IBM DataStage, this approach supports iterative, in-environment workflow design and reduces context switching. In initial studies, it achieves up to 90% time savings for novices and 50% for experts.AAAI 2026 Main ConferenceSameep Mehta, Nitin Gupta, Thomas Gschwind, Shramona Chakraborty, Tristan Tyler, Shreya Sisodia, Ben ClermontArtificial Intelligencetechnical paperfree
85
143477ConstructAI: From Real-Time Safety Insight to Skill Growth in Deployed Construction AI SystemsEnsuring safety in power grid construction remains a critical yet challenging task, as existing monitoring approaches often lack scalability, timeliness, and adaptability to diverse on-site conditions. To address these limitations, we present ConstructAI, a deployed AI-driven safety management system that integrates multi-source image and video acquisition devices with advanced multimodal large model reasoning. The system combines text, image, and video prompts through an efficient workflow powered by LLaMA3 and Meta SAM2 backbones, enhanced with LoRA and adaptor modules for multimodal fusion. Once deployed, ConstructAI continuously processes real-time construction footage to identify violations, assess risk levels, and generate standardized rectification requirements. The deployment has demonstrated measurable benefits across multiple sites, including a >70% increase in violation rectification rates, reduction of average rectification delays from hours to minutes, and a 45% decline in repeat violations. Beyond technical gains, ConstructAI has delivered significant business impacts, such as reduced safety incidents, improved compliance with national regulations, and higher operational efficiency. By enabling proactive risk management and structured safety feedback loops, our system exemplifies how innovative use of AI can translate into tangible improvements for industrial safety. The lessons learned from deployment highlight the importance of balancing algorithmic advances with practical integration into organizational workflows.AAAI 2026 Main ConferenceWei Wang, Jiang Zheng, Gaowei Zhang, Tiong Kong, Kai Xing, Huan LiArtificial Intelligencetechnical paperfree
86
143476Scalable and Efficient Large-Scale Log Analysis with LLMs: An IT Software Support Case StudyIT environments typically have logging mechanisms to monitor system health and detect issues. However, the huge volume of generated logs makes manual inspection impractical, highlighting the importance of automated log analysis in IT Software Support. In this paper, we propose a log analytics tool that leverages Large Language Models (LLMs) for log data processing and issue diagnosis, enabling the generation of automated insights and summaries. We further present a novel approach for efficiently running LLMs on CPUs to process massive log volumes in minimal time without compromising output quality. We share the insights and lessons learned from deployment of the tool - in production since March 2024 - scaled across 70 software products, processing over 2000 tickets for issue diagnosis, achieving a time savings of 300+ man hours and an estimated $15,444 per month in manpower costs compared to the traditional log analysis practices.AAAI 2026 Main ConferenceDebanjana Kar, Prateeti Mohapatra, Harshit Kumar, Seema Nagar, Pranjal Gupta, Karan BhukarArtificial Intelligencetechnical paperfree
87
143475Optimizing Product Provenance Verification Using Data Valuation MethodsDetermining and verifying product provenance remains a critical challenge in global supply chains, particularly as geopolitical conflicts and shifting borders create new incentives for misrepresentation of commodities, such as hiding the origin of illegally harvested timber or stolen agricultural products. Stable Isotope Ratio Analysis (SIRA), combined with Gaussian process regression-based isoscapes, has emerged as a powerful tool for geographic origin verification. While these models are now actively deployed in operational settings supporting regulators, certification bodies, and companies, they remain constrained by data scarcity and suboptimal dataset selection. In this work, we introduce a novel deployed data valuation framework designed to enhance the selection and utilization of training data for machine learning models applied in SIRA. By quantifying the marginal utility of individual samples using Shapley values, our method guides strategic, cost-effective, and robust sampling campaigns within active monitoring programs. By prioritizing highly informative samples, our approach improves model robustness and predictive accuracy across diverse datasets and geographies. Our framework has been implemented and validated in a live provenance verification system currently used by enforcement agencies, demonstrating tangible, real-world impact. Through extensive experiments and this live deployment, we show that the framework significantly enhances provenance verification, mitigates fraudulent trade practices, and strengthens regulatory enforcement of global supply chains.AAAI 2026 Main ConferenceChang-Tien Lu, Jakub Truszkowski, Naren Ramakrishnan, Ruoxi Jia, Hoang Just, Raquib Bin Yousuf, Shengzhe Xu, Brian Mayer, Victor Deklerck, John Simeone, Jade SaundersArtificial Intelligencetechnical paperfree
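The data valuation idea in the abstract above, quantifying each sample's marginal utility with Shapley values, can be made concrete with a minimal exact implementation for tiny datasets. This is illustrative only: the paper's deployed system and utility function are not shown, and at realistic scale Monte Carlo sampling over permutations replaces the exact sum.

```python
import itertools
import math

def shapley_values(n, utility):
    """Exact Shapley value for each of n training samples.

    utility(S) maps a set of sample indices to model utility
    (e.g., validation accuracy of a model trained on S). Each
    sample's value is its marginal contribution averaged over
    all coalitions, weighted by the number of orderings that
    place the coalition before the sample. Exponential in n,
    so this is for illustration only.
    """
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # weight = |S|! (n - |S| - 1)! / n!
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in itertools.combinations(others, k):
                s = set(subset)
                values[i] += weight * (utility(s | {i}) - utility(s))
    return values
```

For an additive utility such as utility(S) = len(S), every sample receives value 1.0, matching the Shapley efficiency axiom; informative samples in a real utility would stand out with larger values.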
88
143474Layout-Aware Document Parsing with Visual-Linguistic Fusion: The DATA-LUX with Academic Content Service ProviderMany organizations are increasingly relying on unstructured documents such as PDFs and scanned forms to support downstream large language model (LLM) services, including search, summarization, and recommendation. However, traditional OCR systems struggle with diverse document layouts, leading to frequent errors and high labor costs. This study therefore developed DATA-LUX, a robust document layout system that transforms unstructured documents into structured, machine-readable data suitable for automation. Built on a transformer-based detector, DATA-LUX incorporates several modules for layout refinement, text-visual fusion, and layer-wise optimization to improve coherence and generalization across diverse layouts. Around January 2025, we deployed DATA-LUX at one of the largest academic content service firms in South Korea (Nurimedia). The firm faced the challenge of extracting metadata and references from thousands of academic papers submitted in various formats; existing LLM-based tools gave unreliable results, so papers had to be processed manually, creating bottlenecks in both labor and time. DATA-LUX enabled the automatic structuring of over 100,000 research papers a year, improving extraction accuracy to over 97%, reducing costs by more than USD 185K annually, and accelerating processing by 8.7 times. These deployment results suggest that DATA-LUX enables scalable and efficient document automation in complex, high-volume environments. We thus believe DATA-LUX can have a significant impact on both academic and industry practice.AAAI 2026 Main ConferenceJae Hong Park, Min Kim, Yeonkyung Kim, Jae Lee, Ki Kim, Ji KwakArtificial Intelligencetechnical paperfree
89
143473Large Scale Retrieval for the LinkedIn Feed Using Causal Language ModelsIn large-scale recommendation systems like LinkedIn’s, the retrieval stage is critical for narrowing billions of potential candidates to a manageable subset for ranking. LinkedIn's feed now serves suggested content based on the topical interests of members, where 2000 candidates are retrieved from several million candidates within a latency budget of a few milliseconds at an inbound QPS of several thousand per second. This paper presents a novel retrieval approach that fine-tunes a large causal language model (Meta’s LLaMA 3) as a dual encoder to generate high-quality embeddings for both users (members) and content (items), using only textual input. We describe the end-to-end pipeline, including prompt design for embedding generation, techniques for fine-tuning at LinkedIn scale, and infrastructure for low-latency, cost-effective online serving. We share findings on how quantizing numerical features in the prompt allows that information to be encoded in the embedding, improving alignment between the retrieval and ranking layers. The system was evaluated using offline metrics and an online A/B test, which showed substantial improvements in member engagement. We observed significant gains among newer members, who often lack strong network connections, indicating that high-quality suggested content aids retention. This work demonstrates how generative language models can be effectively adapted for real-time, high-throughput retrieval in industrial applications.AAAI 2026 Main ConferenceHamed Firooz, Hejian Sang, Sudarshan Ramanujam, Siddharth Dangi, Birjodh Tiwana, Saurabh Kataria, Antonio Alonso, David Byrne, Sojeong Ha, Manas Somaiya, Sen Zhou, Zhoutao Pei, Andrei Akterskii, Zhanglong Liu, Samira Sriram, Zihan Xiong, Akhilesh Gupta, Angela Shao, Alex Li, Caitlin Kolb, Thomas Kistler, Zach MooreArtificial Intelligencetechnical paperfree
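At serving time, the dual-encoder retrieval described in the abstract above reduces to nearest-neighbor search over member and item embeddings. A minimal cosine-similarity sketch follows; it is illustrative only, with plain vectors standing in for the LLaMA-3-derived embeddings, and a production system would use an approximate nearest-neighbor index rather than a full scan:

```python
import numpy as np

def top_k(member_emb, item_embs, k=3):
    """Score every candidate item by cosine similarity with the
    member embedding and return the indices of the k best.

    member_emb: (d,) vector for one member
    item_embs:  (n, d) matrix, one row per candidate item
    """
    m = member_emb / np.linalg.norm(member_emb)
    items = item_embs / np.linalg.norm(item_embs, axis=1, keepdims=True)
    scores = items @ m                # cosine similarity per item
    return np.argsort(-scores)[:k]   # highest-scoring items first
```

With both towers producing embeddings offline, only this similarity search runs per request, which is what makes a millisecond-scale latency budget over millions of candidates feasible.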
90
143472Who Is a Better Matchmaker? Human vs. Algorithmic Judge Assignment in a High-Stakes Startup CompetitionThere is increasing interest in applying artificial intelligence (AI) to automate and support complex decision-making tasks. However, it remains unclear how algorithms compare to human judgment in contexts requiring semantic understanding and domain expertise. We examine this in the context of the judge assignment problem (matching submissions to suitably qualified evaluators) at a prominent U.S. university startup competition. Awarding over $500,000 annually, this is a real-world setting where high-quality judge assignment is critical. We develop and deploy HLSE (Hybrid Lexical–Semantic Similarity Ensemble), an AI-based approach, at the competition and compare algorithmic against human expert assignments by collecting blinded match quality scores from judges for 309 judge-venture matches. Using a test based on the Mann–Whitney U statistic, we found no statistically significant difference in assignment quality between the two approaches (AUC = 0.48, p = 0.40). On average, algorithmic matches are rated 3.90 and manual matches 3.94 on a 5-point scale, where 5 indicates an excellent match. Furthermore, manual assignments that took a full week in past years can be completed in under ten minutes by the algorithm during deployment. These results demonstrate that HLSE achieves human-expert-level matching quality while offering greater scalability and efficiency, underscoring the potential of AI-driven solutions to robustly support and enhance human decision-making for judge assignment in high-stakes settings.AAAI 2026 Main ConferenceNihar B. Shah, Yang (Sarina) Xi, Orelia Pi, Miaomiao Zhang, Rebecca Xiong, Jacqueline LaneArtificial Intelligencetechnical paperfree
91
143471Reducing Alert Fatigue Through AI Ranking: A Deployed Public Health Data Monitoring SystemPublic health experts need scalable methods to monitor large volumes of health data (e.g., human-reported cases, hospitalizations, deaths). These methods must identify individual data points that may indicate significant events, such as outbreaks, or reveal data quality issues. Identifying, triaging, and analyzing these data points in real time is critical for preventing downstream errors in forecasting or policy. Traditional alert-based data monitoring systems, used in practice for decades, fail to identify relevant data events for several reasons: for example, they may not produce real-time results at large data volumes, or they may return tens of thousands of unhelpful alerts.

We introduce a human-in-the-loop AI system for public health data monitoring that uses a ranking-based AI anomaly detection method. This system was developed through a multi-year interdisciplinary collaboration with participatory design from researchers, engineers, and public health data experts. Through this process, we identified system goals, such as user control and efficiency, and designed a system that balances them. The system has since been deployed at a national public health organization and analyzes up to 5 million data points daily. A three-month longitudinal deployment evaluation revealed significant improvements on these goals, including a 54x increase in data reviewer efficiency and increased engagement compared to traditional alert-based methods.
AAAI 2026 Main ConferenceBryan Wilder, Nolan Gormley, Roni Rosenfeld, Catalina Vajiac, Tina Townes, Ananya Joshi, Richa GadgilArtificial Intelligencetechnical paperfree
92
143469TTF: A Trapezoidal Temporal Fusion Framework for LTV Forecasting in DouyinIn the user growth scenario, Internet companies invest heavily in paid acquisition channels to acquire new users, but sustainable growth depends on acquired users generating lifetime value (LTV) that exceeds customer acquisition cost (CAC). To maximize the LTV/CAC ratio, it is crucial to predict channel-level LTV at an early stage for further optimization of budget allocation. The LTV forecasting problem differs significantly from traditional time series forecasting, and it poses three main challenges. First, it is an unaligned multi-time-series forecasting problem, in which each channel has many LTV series with different activation dates. Second, predicting at an early stage faces the imbalanced short-input long-output (SILO) challenge. Third, compared with commonly used time series datasets, real LTV series are volatile and non-stationary, with more frequent fluctuations and higher variance. In this work, we propose a novel framework called Trapezoidal Temporal Fusion (TTF) to address these challenges. We introduce a trapezoidal multi-time-series module to deal with the data unalignment and SILO challenges, and output accurate predictions with a multi-tower structure called MT-FusionNet. The framework has been deployed in the online system of Douyin. Compared to the previously deployed online model, MAPE_p decreased by 4.3% and MAPE_a decreased by 3.2%, where MAPE_p denotes the point-wise MAPE of the LTV curve and MAPE_a denotes the MAPE of the aggregated LTV.AAAI 2026 Main ConferenceFan Wu, Zhenzhe Zheng, Chaoli Zhang, Yibing Wan, Zhengxiong Guan, Xiaoyang Li, Lai Xu, Beibei JiaArtificial Intelligencetechnical paperfree
93
143468Save, Revisit, Retain: A Scalable Framework for Enhancing User Retention in Large-Scale Recommender SystemsUser retention is a critical objective for online platforms like Pinterest, as it strengthens user loyalty and drives growth through repeated engagement. A key indicator of retention is revisitation, i.e., when users return to view previously saved content, a behavior often sparked by personalized recommendations and user satisfaction. However, modeling and optimizing revisitation poses significant challenges. One core difficulty is accurate attribution: it is often unclear which specific user actions or content exposures trigger a revisit, since many confounding factors (e.g., content quality, user interface, notifications, or even changing user intent) can influence return behavior. Additionally, the scale and timing of revisitations introduce further complexity; users may revisit content days or even weeks after their initial interaction, requiring the system to maintain and associate extensive historical records across millions of users and sessions. These complexities render existing methods insufficient for robustly capturing and optimizing long-term revisitation.

To address these gaps, we introduce a novel, lightweight, and interpretable framework for modeling revisitation behavior and optimizing long-term user retention in Pinterest’s search-based recommendation context. By defining a surrogate attribution process that links saves to subsequent revisitations, we reduce noise in the causal relationship between user actions and return visits. Our scalable event aggregation pipeline enables large-scale analysis of user revisitation patterns and enhances the ranking system’s ability to surface items with high retention value. Deployed on Pinterest’s Related Pins surface to serve 500+ million users, the framework led to a significant lift of 0.1% in daily active users (DAU) and 0.08% in weekly active users (WAU) without additional computational costs. Our data analysis reveals novel insights, such as the impact of content topics on revisitation rates; for example, users are more likely to revisit aesthetically pleasing topics.
AAAI 2026 Main ConferenceWeijie Jiang, Armando Ordorica, Jaewon Yang, Olafur Gudmundsson, Yucheng Tu, Huizhong DuanArtificial Intelligencetechnical paperfree
94
143467TRUST: Transaction Risk via Unified Sequence and TopologyAbuse detection in e-commerce platforms is critical for preventing operational losses, particularly for transaction types vulnerable to abuse such as Return-to-Origin (RTO) in Cash-on-Delivery (COD) workflows. Detecting such abuse requires accurate, real-time decisions to intercept malicious orders before placement, imposing stringent sub-second latency requirements on deployed systems. In this work, we present TRUST, a deployed, production-scale abuse detection system based on a unified architecture of heterogeneous Graph Neural Networks (GNNs) and Transformer-based sequence encoders. This design enables joint reasoning over multi-relational entity interactions and temporal behavioural signals, allowing the model to combine complementary information for effective abuse detection when either modality is sparse or absent. TRUST processes millions of transactions daily with an average inference latency of ~25 ms, achieving a ~9.6% absolute precision improvement over a strong XGBoost baseline in live RTO detection. We report systematic ablation studies across both graph and sequence stages, evaluating GNN variants, sampling strategies, sequence lengths, and positional encoding schemes to guide architectural choices. Deployed end-to-end in a high-throughput environment, TRUST demonstrates that GNN–Transformer cascades can deliver state-of-the-art accuracy, scalability, and operational reliability in real-world abuse detection, offering a reproducible blueprint for similar industry-scale applications.AAAI 2026 Main ConferenceDebdoot Mukherjee, Bhavuk Singhal, Anshu Aditya, Shubham Jain, Debashis Mukherjee, Rithvik Y, Akshat Garg, Karan TanwarArtificial Intelligenceposterfree
95
143466RescueLens: LLM-Powered Triage and Action on Volunteer Feedback for Food RescueFood rescue organizations simultaneously tackle food insecurity and waste by working with volunteers to redistribute food from donors who have excess to recipients who need it. Volunteer feedback allows food rescue organizations to identify issues early and ensure volunteer satisfaction. However, food rescue organizations monitor feedback manually, which can be cumbersome and labor-intensive, making it difficult to prioritize which issues are most important. In this work, we investigate how large language models (LLMs) can assist food rescue organizers in understanding and acting on volunteer experiences. We work with 412 Food Rescue, a large food rescue organization based in Pittsburgh, Pennsylvania, to design RescueLens, an LLM-powered tool that automatically categorizes volunteer feedback, suggests donors and recipients to follow up with, and updates volunteer directions based on feedback. We evaluate the performance of RescueLens on an annotated dataset and show that it can recover 96% of volunteer issues at 71% precision. Moreover, by ranking donors and recipients according to their rates of volunteer issues, RescueLens allows organizers to focus on the 0.5% of donors responsible for more than 30% of volunteer issues. RescueLens is now deployed at 412 Food Rescue, and through semi-structured interviews with organizers we find that RescueLens streamlines the feedback process so that organizers can better allocate their time.AAAI 2026 Main ConferenceFei Fang, Zhiyu Chen, Zheyuan Ryan Shi, Naveen Raman, Jingwu Tang, Sean Hudson, Ameesh KapoorArtificial Intelligencetechnical paperfree
96
143465EssayBench: Evaluating Large Language Models in Multi-Genre Chinese Essay WritingPrompt-based essay writing is an effective and common way to assess students' critical thinking skills.
Recent work has evaluated the impressive capabilities of Large Language Models (LLMs) on this task. However, most studies focus primarily on English. Those examining LLMs' performance in Chinese often rely on coarse-grained text quality metrics, overlooking the structural and rhetorical complexities of Chinese essays, particularly across diverse genres. We therefore propose EssayBench, a multi-genre benchmark specifically designed for Chinese essay writing, along with a fine-grained, genre-specific scoring framework that hierarchically aggregates scores to better align with human preferences.
The dataset comprises 728 real-world prompts across four major genres (Argumentative, Narrative, Descriptive, and Expository), and includes both Open-Ended and Constrained types.
Our evaluation protocol is validated through a comprehensive human agreement study. The results show that our protocol aligns well with human judgments, achieving a Spearman correlation of up to 0.816 and outperforming coarse-grained evaluation methods by an average of 8.6%. Finally, we benchmark 15 LLMs, analyzing their strengths and limitations across genres and instruction types. We believe EssayBench offers a more reliable framework for evaluating Chinese essay generation and provides valuable insights for improving LLMs in this domain.
AAAI 2026 Main ConferenceBaojun Wang, Fei Mi, Lifeng Shang, Fan Gao, Dongyuan Li, Yasheng Wang, Ding XiaArtificial Intelligenceposterfree
97
143464GEM: Generative Entropy-Guided Preference Modeling for Few-Shot Alignment of LLMsAlignment of large language models (LLMs) with human preferences typically relies on supervised reward models or external judges, which in turn require abundant preference data. We propose a generative preference modeling approach for low-resource and domain-specific scenarios, reframing preference learning as an inverse reinforcement learning problem. Instead of training a discriminative reward model, we train the LLM itself to infer and maximize an implicit reward function underlying high-quality reasoning. Specifically, we leverage Chain-of-Thought (CoT) sampling to generate diverse candidate solutions for each query and derive fine-grained preferences from them without additional human labels. We also introduce an entropy-guided token scoring mechanism to rank and weight the sampled CoTs, boosting the importance of high-confidence answers and strategically high-entropy tokens. Building on this, we train the model with our Self-Evaluated Group Advantage (SEGA) algorithm, which efficiently utilizes the fine-grained preference information in group candidate solutions to update the policy. Our method eliminates dependence on external judges or reward classifiers, instead relying on the generative model’s own judgments. Experiments on general benchmarks and domain-specific tasks—such as mathematical reasoning and medical question answering—demonstrate that our generative preference model achieves significant improvements with limited data.AAAI 2026 Main ConferenceXuejiao Zhao, Yiyang Zhao, Bai HuiyuArtificial Intelligencetechnical paperfree
98
143463Can LLMs Detect Their Confabulations? Estimating Reliability in Uncertainty-Aware Language ModelsLarge Language Models (LLMs) are prone to generating fluent but incorrect content, known as confabulation, which poses increasing risks in multi-turn or agentic applications where outputs may be reused as context. In this work, we investigate how in-context information influences model behavior and whether LLMs can identify their unreliable responses. We propose a reliability estimation method that leverages token-level uncertainty to guide the aggregation of internal model representations. Specifically, we compute aleatoric and epistemic uncertainty from output logits to identify salient tokens and aggregate their hidden states into compact representations for response-level reliability prediction. Through controlled experiments on open QA benchmarks, we find that correct in-context information improves both answer accuracy and model confidence, while misleading context often induces confidently incorrect responses, revealing a misalignment between uncertainty and correctness. Our probing-based method captures these shifts in model behavior and improves the detection of unreliable outputs across multiple open-source LLMs. These results underscore the limitations of direct uncertainty signals and highlight the potential of uncertainty-guided probing for reliability-aware generation.AAAI 2026 Main ConferenceSanjay Chawla, Tianyi Zhou, Johanne MedinaArtificial Intelligenceposterfree
99
143462TWINFUZZ: Dual-Model Fuzzing for Robustness Generalization in Deep LearningDeep learning (DL) models are increasingly deployed in safety-critical applications such as face recognition, autonomous driving, and medical diagnosis. Despite their impressive accuracy, they remain vulnerable to adversarial examples: subtle perturbations that can cause incorrect predictions, i.e., robustness issues. While adversarial training improves robustness against known attacks, it often fails to generalize to unseen or stronger threats, revealing a critical gap in robustness generalization. In this work, we propose a dual-model fuzzing framework to enhance generalized robustness in DL models. Central to our method is a lightweight metric, the Lagrangian Information Bottleneck (LIB), which guides entropy-based mutation toward semantically meaningful and high-risk regions of the input space. The executor uses a resistant model and a more error-prone vulnerable model; their prediction consistency forms the basis of agreement mining, a label-free oracle for isolating decision-boundary samples. To ensure fuzzing effectiveness, we further introduce a task-driven seed selection strategy (e.g., SSIM for vision) that filters out low-quality inputs. We implement a prototype, TWINFUZZ, and evaluate it on six benchmark datasets and nine DL models. Compared with state-of-the-art testing approaches, TWINFUZZ achieves superior improvements in both training-specific and generalized robustness.AAAI 2026 Main ConferenceXi Xiao, Xiaogang Zhu, Shaohua Wang, Kun Hu, Wentao Mo, Enze Dai, Sheng Wen, Yang XiangArtificial Intelligenceposterfree
100
143461ACID Test: A Benchmark for Cultural Safety and Alignment in LALMsLarge Audio Language Models (LALMs) are transforming AI by directly processing and generating human language from audio. As these models proliferate in real-world applications, evaluating their performance for equitable and safe use across diverse linguistic and cultural contexts becomes paramount. This paper presents the first comprehensive study on cultural preferences and biases in LALMs across multilingual and multicultural settings. We extend existing cultural harm frameworks from text-based models to the audio domain, analysing how linguistic and cultural diversity influence LALM behaviour, sensitivity, and output quality. Our research uncovers unique challenges in interpreting cultural nuances from audio and linguistic variations. We introduce a novel multilingual audio-text dataset (10 languages, including English), the Audio Cultural Intelligence Dataset (ACID Benchmark), spanning 1,315 hours of audio, specifically for evaluating LALM cultural biases, marking the first such examination in this emerging area. Our comprehensive analysis includes 10 open-source and 2 closed-source models, demonstrating significant performance disparities across languages and cultural contexts and highlighting the audio modality's influence on bias manifestation. These findings highlight the critical need to evaluate LALMs not only for technical accuracy but also for fair and culturally sensitive performance, urging the development of inclusive datasets and cultural awareness for building safer and more equitable large audio language models. The ACID benchmark will be made publicly available.AAAI 2026 Main ConferenceRicha Singh, Mayank Vatsa, Bikash Dutta, Adit Jain, Rishabh RanjanArtificial Intelligenceposterfree