
Keynotes, Panels & Demos

AAAI 2024 Symposium on Clinical Foundation Models

25 – 27 March, 2024 at Stanford University


Day 3


Time Series Foundation Models

Keynote 6, 9:00 AM

Mononito Goswami, Ph.D. student in Robotics at Carnegie Mellon University

Abstract: Large language and vision models pre-trained on vast quantities of text and image data have many desirable properties: they perform well on a variety of tasks across diverse domains with little or no supervision, and they can be tuned to perform well on specific tasks. There is growing interest in unlocking these capabilities for time-series data through time-series foundation models. I will begin by breaking down some characteristics of a time-series foundation model. I will then discuss key challenges in building and pre-training these models, along with recent approaches including MOMENT, MOIRAI, Lag-Llama, TimesFM, and TimeGPT. I will also review strategies for reprogramming large language models for time-series forecasting and other prediction tasks. Finally, I will end by discussing opportunities for future work, in particular multimodal time-series and text foundation models, and holistic benchmarking and evaluation.
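As a concrete illustration of the pre-training recipe discussed above, the sketch below shows masked-patch reconstruction on univariate time series, in the spirit of encoder-style models such as MOMENT. It is only a minimal sketch: the patch size, model dimensions, and masking ratio are arbitrary illustrative choices, not the settings of any particular published model.

```python
# Minimal sketch of masked-patch pre-training for a time-series encoder.
# Hyperparameters (patch length, d_model, mask ratio) are illustrative only.
import torch
import torch.nn as nn

class TinyTimeSeriesEncoder(nn.Module):
    def __init__(self, patch_len=16, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.patch_len = patch_len
        self.embed = nn.Linear(patch_len, d_model)            # patch -> token
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, patch_len)             # token -> reconstructed patch

    def forward(self, x):                                     # x: (batch, series_len)
        patches = x.unfold(1, self.patch_len, self.patch_len) # (batch, n_patches, patch_len)
        mask = torch.rand(patches.shape[:2]) < 0.3            # hide ~30% of patches
        corrupted = patches.masked_fill(mask.unsqueeze(-1), 0.0)
        hidden = self.encoder(self.embed(corrupted))
        recon = self.head(hidden)
        # Reconstruction loss is computed only on the masked patches.
        return ((recon - patches) ** 2)[mask].mean()

model = TinyTimeSeriesEncoder()
series = torch.randn(8, 512)    # 8 synthetic series of length 512
print(model(series))            # scalar pre-training loss
```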


Fantastic LVLMs and How to Ground Them

Keynote 7, 9:30 AM

Dr. Erhan Bas, Head of Foundational AI at GE Healthcare

Abstract: Large Vision Language Models (LVLMs) have emerged as powerful models capable of impressive zero-shot generalization on a wide range of downstream tasks. Their full potential has yet to be realized in the healthcare domain, where they could optimize processes and transform workflows for technologists and clinicians. In this talk, I will discuss some recent advancements in LVLMs, their effectiveness, their generalization capabilities, and their shortcomings.
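For readers unfamiliar with zero-shot use of vision-language models, the snippet below scores an image against free-text labels with a publicly available CLIP checkpoint via Hugging Face Transformers. It is a generic illustration of zero-shot image-text matching, not the GE Healthcare systems discussed in the talk; the image path and label strings are placeholders.

```python
# Zero-shot image-text matching with a public CLIP checkpoint (illustrative only).
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example_image.png")    # placeholder path
labels = ["a chest X-ray", "an abdominal CT slice", "a photo of a dog"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)

for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")             # probability the image matches each label
```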


How to increase the adoption of (generative) AI in healthcare?

Panel 3, 11:00 AM

Xinyu Li* (CMU)
Cao (Danica) Xiao (GE Healthcare)
Erhan Bas (GE Healthcare)


Day 2


Advancing Clinical Trial Development with Generative AI

Keynote 4, 9:30 AM

Prof. Jimeng Sun, Health Innovation Professor at the University of Illinois Urbana-Champaign

Abstract: Recent advancements in Generative AI have shown great potential in streamlining various aspects of clinical trial development. In this talk, we present three works that showcase how Generative AI can help clinical trial design and operation. First, we introduce TrialGPT, which utilizes Large Language Models (LLMs) to match patients to clinical trials. By analyzing patients' medical notes, TrialGPT accurately predicts their suitability for various trials, thereby accelerating the recruitment process and ensuring better patient-trial fit. Next, we discuss AutoTrial, a tool that simplifies the design of eligibility criteria for clinical trials. AutoTrial employs language models to generate clear and concise criteria, adapts to new information, and provides transparent explanations for its decisions, ultimately reducing the complexity of the trial design process. Finally, we present the Trial Foundation Model, a fine-tuned LLM called Panorama. Trained on millions of trial documents and publications, Panorama demonstrates superior performance on various trial design tasks. We showcase its potential to enhance clinical trial design.
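To make the patient-trial matching idea concrete, here is a minimal prompt-based sketch using the OpenAI Python client. It is not TrialGPT itself; the model name, prompt wording, and example note and criterion are placeholders chosen for illustration.

```python
# Minimal sketch of LLM-based patient-trial eligibility screening (not TrialGPT itself).
# Requires OPENAI_API_KEY in the environment; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()

patient_note = "67-year-old with stage III NSCLC, ECOG 1, no prior immunotherapy."  # toy example
criterion = "Inclusion: adults with stage III-IV NSCLC and no prior checkpoint inhibitor therapy."

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "You assess whether a patient meets a clinical trial criterion. "
                    "Answer MET, NOT MET, or UNCERTAIN, then give a one-sentence rationale."},
        {"role": "user",
         "content": f"Patient note:\n{patient_note}\n\nCriterion:\n{criterion}"},
    ],
)
print(response.choices[0].message.content)
```

In a full matching pipeline, each criterion of each candidate trial would be scored this way, and the per-criterion judgments aggregated into a trial-level match score.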


Advancing Health at the Speed of AI

Keynote 5, 2:00 PM

Dr. Hoifung Poon, General Manager of Health Futures at Microsoft Research

Abstract: The dream of precision health is to develop a data-driven, continuous learning system where new health information is instantly incorporated to optimize care delivery and accelerate biomedical discovery. In reality, the health ecosystem is plagued by overwhelming unstructured data and unscalable manual processing. Self-supervised AI such as large language models (LLMs) can supercharge structuring of biomedical data and accelerate transformation towards precision health. In this talk, I'll present our research progress on generative AI for precision health, spanning biomedical LLMs, multimodal learning, and causal discovery. This enables us to extract knowledge from tens of millions of publications, structure multimodal real-world data for millions of cancer patients, and apply the extracted knowledge and real-world evidence to advancing precision oncology in deep partnerships with real-world stakeholders.


How to make an impact in the practice of healthcare?

Panel 2, 11:00 AM

Hoifung Poon (Microsoft)
Hanyin Wang (Mayo Clinic)


Plenary Volunteer

Interested in presenting a brief 5-min update at the plenary? Let us know…

  • Organizers should not present during this portion of the event.
  • We seek a brief, lively, engaging talk of no more than 5 minutes.
  • Slides are permitted but should be kept to a minimum.
  • The volunteer speaker is free to collaborate with other attendees to compose the talk.
  • Talks require little preparation; speaking extemporaneously is encouraged.
  • A fun and lighthearted tone is also welcome.


Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation

Demo 3, 4:00 PM

Gauthier Guinet, Applied Scientist at Amazon

Abstract: We propose a new method to measure the task-specific accuracy of Retrieval-Augmented Large Language Models (RAG). Evaluation is performed by scoring the RAG system on an automatically generated synthetic exam composed of multiple-choice questions based on the corpus of documents associated with the task. Our method is an automated, scalable, interpretable, and cost-efficient strategy for selecting the optimal components of a RAG system. We leverage Item Response Theory (IRT) to estimate the quality of an exam and its informativeness about task-specific accuracy. IRT also provides a natural way to iteratively improve the exam by eliminating questions that are not sufficiently informative about a model's ability. We demonstrate our approach on five new open-ended question-answering tasks based on arXiv abstracts, StackExchange questions, AWS DevOps troubleshooting guides, medical data, and SEC filings. In addition, our experiments reveal more general insights into factors impacting RAG performance, such as size, retrieval mechanism, prompting, and fine-tuning. Most notably, our findings show that choosing the right retrieval algorithms often leads to bigger performance gains than simply using a larger language model.
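The Item Response Theory machinery mentioned above is easy to sketch: under a standard two-parameter logistic (2PL) model, each question has a discrimination a and difficulty b, and its Fisher information at ability theta tells you how much that question reveals about models near that ability level. The snippet below, with made-up item parameters, shows the filtering idea of dropping low-information questions; it is a generic 2PL illustration, not the authors' exact estimation code.

```python
# Generic 2PL IRT sketch: keep only exam questions that are informative
# about model ability. Item parameters here are made up for illustration.
import numpy as np

def p_correct(theta, a, b):
    """2PL item response function: P(correct | ability theta)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of one item at ability theta (2PL): a^2 * p * (1 - p)."""
    p = p_correct(theta, a, b)
    return a ** 2 * p * (1.0 - p)

# Discrimination (a) and difficulty (b) for five hypothetical exam questions.
a = np.array([1.8, 0.3, 1.2, 0.1, 2.0])
b = np.array([0.0, -2.5, 0.5, 3.0, -0.2])

theta_grid = np.linspace(-3, 3, 61)                   # ability range of interest
info = item_information(theta_grid[:, None], a, b)    # (ability, item) information matrix
avg_info = info.mean(axis=0)

keep = avg_info > 0.1                                 # drop items that discriminate poorly
print("average information per item:", np.round(avg_info, 3))
print("items kept for the exam:", np.where(keep)[0].tolist())
```

Averaging information over an ability grid is just one simple selection rule for illustrating the idea of removing uninformative questions.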


Using the Gradient AI Agent Stack to Transform Healthcare Operations

Demo 4, 4:30 PM

Dr. Leo Pekelis, Chief Scientist at Gradient AI

Abstract: Companies use Gradient to build powerful medical benefits knowledge bases that save customer service teams up to 70% of their time. In our demo, we dive into how to leverage the Gradient platform (including PDF extraction and RAG), our proprietary medical foundational model (Nightingale), and our optimized medical embeddings model.
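As background for the demo, a bare-bones version of the PDF-extraction-plus-retrieval pattern it builds on can be assembled from open-source pieces. The sketch below uses pypdf and sentence-transformers as stand-ins; it is not Gradient's platform, Nightingale, or their proprietary embeddings, and the file name and query are placeholders.

```python
# Bare-bones PDF extraction + embedding retrieval (open-source stand-ins,
# not Gradient's platform or models). File name and query are placeholders.
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer, util

# 1. Extract text from a benefits document and split it into rough chunks.
reader = PdfReader("benefits_plan.pdf")
text = "\n".join(page.extract_text() or "" for page in reader.pages)
chunks = [text[i:i + 800] for i in range(0, len(text), 800)]

# 2. Embed the chunks and the question with a general-purpose encoder.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
chunk_emb = encoder.encode(chunks, convert_to_tensor=True)
query = "Is physical therapy covered after knee surgery?"
query_emb = encoder.encode(query, convert_to_tensor=True)

# 3. Retrieve the most relevant chunks to pass to an LLM as context.
scores = util.cos_sim(query_emb, chunk_emb)[0]
top = scores.topk(k=min(3, len(chunks)))
for score, idx in zip(top.values.tolist(), top.indices.tolist()):
    print(f"score={score:.2f}  chunk preview: {chunks[idx][:80]!r}")
```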


Day 1


Data-Efficient Foundation Models, From Pretraining to Adaptation

Keynote 1, 9:10 AM

Prof. Frederic Sala, Assistant Professor, University of Wisconsin-Madison

Abstract: Powerful and typically massive 'foundation' models offer the promise of serving as a base for diverse applications, including in clinical domains. Unfortunately, training and adapting these models for downstream tasks tends to be difficult and expensive, often requiring the collection and annotation of substantial quantities of domain-specific data. In this talk, I will describe my group's work on addressing this challenge. First, we will discuss skill-based training, which enables language model pretraining and fine-tuning with substantially smaller corpora. Next, when adapting vision-language models like CLIP to deal with spurious correlations, we show how to self-guide the adaptation process without any additional data. Then, we show how to integrate relational structures like knowledge graphs into model prediction pipelines, enabling models to adapt to new domains unseen during training without additional annotated examples. Finally, in the most challenging scenarios, when the model must be fine-tuned on labeled data, we show how to obtain this data efficiently through techniques from weak supervision.
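To ground the weak-supervision piece of the talk, the sketch below shows the basic pattern: several noisy labeling functions vote on unlabeled clinical notes, and their votes are aggregated into training labels. Aggregation here is a plain majority vote for clarity; frameworks such as Snorkel fit a probabilistic label model instead, and the heuristics and example notes are invented for illustration.

```python
# Weak supervision sketch: noisy labeling functions + majority-vote aggregation.
# Heuristics and example notes are invented; real systems (e.g., Snorkel) fit a
# probabilistic label model rather than a plain majority vote.
import numpy as np

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

def lf_mentions_sepsis(note):                 # crude keyword heuristic
    return POSITIVE if "sepsis" in note.lower() else ABSTAIN

def lf_normal_lactate(note):                  # another weak heuristic
    return NEGATIVE if "lactate normal" in note.lower() else ABSTAIN

def lf_icu_transfer(note):
    return POSITIVE if "transferred to icu" in note.lower() else ABSTAIN

notes = [
    "Pt febrile, concern for sepsis, transferred to ICU overnight.",
    "Routine follow-up, lactate normal, no acute issues.",
]

label_fns = [lf_mentions_sepsis, lf_normal_lactate, lf_icu_transfer]
votes = np.array([[lf(n) for lf in label_fns] for n in notes])   # (n_notes, n_lfs)

def majority_vote(row):
    valid = row[row != ABSTAIN]
    if len(valid) == 0:
        return ABSTAIN                         # no labeling function fired; leave unlabeled
    return int(np.bincount(valid).argmax())

labels = [majority_vote(row) for row in votes]
print(labels)                                  # weak labels used to fine-tune a model
```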


How might LLMs help us scale world-class healthcare to everyone?

Keynote 2, 9:30 AM

Research Scientist, Google

Abstract: In recent years, the field of AI has been revolutionized by the emergence of Transformers and Large Language Models. However, perhaps nowhere is their impact likely to be more profound than in biomedicine, where they have the potential to act as care multipliers, improve our understanding of biology, and reduce the burden of disease. In this talk, I will introduce recent works from my team at Google AI (Med-PaLM, Med-PaLM 2, Med-PaLM M, and AMIE), which I believe are key milestones towards such a future. Med-PaLM and Med-PaLM 2 were the first AI systems to obtain passing and expert-level scores, respectively, on US Medical Licensing Exam questions, a long-standing grand challenge in AI. Med-PaLM M was the first demonstration of a generalist, multimodal, biomedical AI system. More recently, two studies have highlighted AMIE's promising capabilities. In a double-blind, randomized study, AMIE performed competitively against primary care physicians in text consultations. Additionally, a separate study demonstrated AMIE's significant assistive potential for clinicians facing complex diagnostic challenges. I will outline the motivation, principles, and technical innovations underpinning these systems. Finally, I will sketch out a vision for how we might leverage such powerful systems to help scale world-class healthcare to everyone and make medicine a humane endeavor again.


Foundation Models for Clinical Decision Support in Critical Care

Keynote 3, 2:00 PM

Professor of Critical Care, Mathematics, and Chemical Engineering, University of Pittsburgh

Abstract: The modern intensive care unit is the prime example of a complex environment where decisions require rapid integration of large amounts of rapidly changing data originating from bedside monitors, other devices, electronic health records, images, and a growing number of other potential sources. Importantly, these data are dynamic, reflecting changing health conditions. It has proven challenging to develop and deploy decision support systems that meet performance criteria suitable for the ICU environment. Foundation models provide a new opportunity to integrate multi-domain time series for diagnostic, prognostic, and prescriptive tasks. In particular, high-frequency time series data is routinely obtained, is potentially less biased than other data sources, and has been distinctly underexploited as a source of information. Traditional modeling approaches are challenged by such data. Pre-trained foundation models will likely play an important role in democratizing decision support in both critical and non-critical environments.
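Since the talk centers on high-frequency bedside time series, a small example of how such signals are commonly prepared for a sequence model is shown below: a continuous waveform is cut into fixed-length, possibly overlapping windows that become the model's input segments. The sampling rate, window length, and stride are arbitrary illustrative choices, and the data is synthetic.

```python
# Illustrative preparation of high-frequency bedside data: slice a continuous
# signal into fixed-length overlapping windows to feed a sequence model.
# Sampling rate, window length, and stride are arbitrary choices.
import numpy as np

fs = 125                                  # samples per second (e.g., a bedside waveform)
signal = np.random.randn(fs * 60 * 10)    # ten minutes of synthetic data

window_len = fs * 30                      # 30-second windows
stride = fs * 10                          # 10-second hop (overlapping windows)

starts = range(0, len(signal) - window_len + 1, stride)
windows = np.stack([signal[s:s + window_len] for s in starts])

# Per-window normalization is a common preprocessing step before a model sees the data.
windows = (windows - windows.mean(axis=1, keepdims=True)) / (windows.std(axis=1, keepdims=True) + 1e-8)
print(windows.shape)                      # (number of windows, samples per window)
```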


Foundation Models for Clinical AI: Challenges and Opportunities

Panel 1, 11:00 AM


TimeGPT: A Foundation Model for Time-series by Nixtla

Demo 1, 4:00 PM

Max Mergenthaler Canseco, Azul Garza, Cristian Challu, Co-founders at Nixtla

Company Bio: Nixtla is to time series what Anthropic or OpenAI are to language and images. We are the creators of TimeGPT. With our pre-trained model, an enterprise can upload its data and predict, saving millions of dollars and months of development and maintenance. TimeGPT was trained on over 100 billion rows of financial, weather, energy, and web data, and democratizes the power of time-series analysis. Before TimeGPT, Nixtla created the most comprehensive time-series ecosystem, with more than 5 million downloads. Nixtla's software is currently used in production by Fortune 500 companies such as Amazon, Walmart, Meta, and Accenture.
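For context on the "upload your data and predict" workflow, the sketch below shows roughly how a hosted forecasting call with Nixtla's Python SDK is commonly structured. The client class, method names, and column conventions are stated from memory and may differ from the current SDK; the API key and data are placeholders.

```python
# Rough sketch of a hosted TimeGPT forecasting call via Nixtla's Python SDK.
# Class/method names and column conventions are from memory and may have changed;
# the API key and data below are placeholders.
import pandas as pd
from nixtla import NixtlaClient

client = NixtlaClient(api_key="YOUR_API_KEY")    # placeholder credential

# Long-format history: one row per (series id, timestamp, value).
df = pd.DataFrame({
    "unique_id": ["icu_bed_occupancy"] * 90,
    "ds": pd.date_range("2024-01-01", periods=90, freq="D"),
    "y": range(90),                              # toy values; use real history in practice
})

# Ask the pre-trained model for a 14-day-ahead forecast with no task-specific training.
forecast = client.forecast(df=df, h=14, time_col="ds", target_col="y")
print(forecast.head())
```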


Drugs and Adverse Event Extraction Using GenAI

Demo 2, 4:30 PM

Dr. Hoang Tran, Research Scientist

Abstract: In the new world of off-the-shelf generative AI models, you can simply grab a model pre-trained by OpenAI, Google, Hugging Face, etc., and start generating predictions, and these predictions can be large chunks of generated content. This leaves many data scientists wondering: where does my data actually add value in the development of production AI healthcare applications? In this demo, you'll see how unique data is critical to developing high-quality generative AI applications, and you'll learn where data can be used and how it should be prepared, managed, and applied to deliver real-world value for your organization.
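As a minimal illustration of the extraction task in the title, the snippet below prompts a general-purpose LLM to pull drug names and adverse events out of a note as JSON. It is a generic sketch rather than the presenter's system; the model name and example note are placeholders, and, as the abstract argues, curated domain-specific data is what turns this kind of generic pipeline into something production-worthy.

```python
# Generic sketch of drug / adverse-event extraction with an off-the-shelf LLM.
# Not the presenter's system; model name and example note are placeholders.
import json
from openai import OpenAI

client = OpenAI()    # requires OPENAI_API_KEY in the environment

note = ("Patient started lisinopril 10 mg daily; two weeks later developed a "
        "persistent dry cough and mild dizziness.")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system",
         "content": "Extract medications and adverse events from the clinical text. "
                    'Reply with JSON only: {"drugs": [...], "adverse_events": [...]}'},
        {"role": "user", "content": note},
    ],
)

extraction = json.loads(response.choices[0].message.content)
print(extraction["drugs"], extraction["adverse_events"])
```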