1 of 30

Ian Foster

Argonne National Laboratory

The University of Chicago

foster@anl.gov

From Data to Discovery: Advancing AI at Scale with Cross-Facility Collaboration

2 of 30

From Data to Discovery: Advancing AI at Scale with Cross-Facility Collaboration

Or: An Agentic Science Cloud for AI-enabled Discovery

3 of 30

The scientific method has transformed society

Scientific method

4 of 30

But we are falling behind

[Figure: log value plotted against time, with curves for data rates, total data, computing, publications, complexity, researchers, and funding. Annotations: "Data volumes, problem complexity" and "Fraction exploited without innovation".]

5 of 30

Despite acceleration via automation

Scientific method

Managed transfer & sync

Reliable automation

Managed remote execution

Unified data access

Publication & discovery

6 of 30

Automation of “easy” tasks surfaces new bottlenecks

Synthesize knowledge and propose hypotheses

Write, debug, and run programs

Configure and run experiments

Interpret results to inform new hypotheses

Tasks performed by humans that emerge as bottlenecks

We need to delegate

to AI-enabled agents that act on our behalf

Agents that:

  • Are persistent, stateful, cooperative
  • Are empowered to operate with intermittent human oversight
  • Will increasingly dominate resource usage

7 of 30

We imagine a future with many agent assistants

8 of 30

“Agents for science” are increasingly popular

9 of 30

A computational system that can interact with its environment and learn from those interactions

Search database, invoke code, query LLM, …

Data repositories, HPC, robotic labs, other agents

Accumulate data, adapt processes, improve answers
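
The three parts of this definition (interact, accumulate, adapt) can be sketched in a few lines of Python. This is a toy illustration only: `ToyAgent`, `replay_environment`, and the running-mean update are hypothetical names and choices made for this sketch, not part of any system described in the talk.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyAgent:
    """Illustrative agent: acts on an environment and learns from the feedback."""

    act: Callable[[float], float]  # How the agent interacts with its environment
    estimate: float = 0.0  # The agent's current answer
    history: list[float] = field(default_factory=list)  # Accumulated data

    def step(self) -> float:
        observation = self.act(self.estimate)  # Interact (query, run code, ...)
        self.history.append(observation)  # Accumulate data
        # Adapt: move the answer to the running mean of all observations.
        self.estimate = sum(self.history) / len(self.history)
        return self.estimate


def replay_environment(readings: list[float]) -> Callable[[float], float]:
    """Hypothetical environment that replays fixed readings, ignoring the query."""
    iterator = iter(readings)
    return lambda _query: next(iterator)


agent = ToyAgent(act=replay_environment([9.0, 11.0, 10.0]))
for _ in range(3):
    agent.step()
print(agent.estimate)
```

A real science agent would replace `act` with calls to databases, HPC codes, robotic labs, or other agents, and would replace the running mean with model retraining or another adaptation step.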

How do we build and deploy these things?

But what is an “agent”?

10 of 30

Agentic applications require agentic middleware

Agentic

middleware

Agentic applications

Experimental facilities

Data storage

Compute

An “integrated research infrastructure” for agentic applications

  • Support creation and management of agents
  • Enable use by agents

11 of 30

Agentic middleware challenges in science

  • Access & privileges
  • Agent discovery
  • Asynchronous communication
  • Fault tolerance
  • Interfaces
  • Mobility
  • Persistent stateful execution
  • Provenance

Not addressed by LangChain, AutoGen, OpenAI Agents, Claude Agents, etc.!

12 of 30

Agentic middleware challenges in scientific computing

  • Access & privileges
  • Agent discovery
  • Asynchronous communication
  • Fault tolerance
  • Interfaces
  • Mobility
  • Persistent stateful execution
  • Provenance

Areas we focus on initially …

Under review in IEEE Computer

Dr. Greg Pauloski

Dr. Kyle Chard

13 of 30

Globus is a not-for-profit service operated for the research community by the University of Chicago, supported by ~250 subscribing institutions

We assume the Globus hybrid cloud fabric that allows us to authenticate, delegate, start & control programs, manage multi-step flows … anywhere

14 of 30

Exploring agentic middleware: Academy

[Figure: Academy architecture. A client holds handles to agents. Each agent comprises control, actions, state, and handles. Agents have mailboxes in the exchange (data plane) and are started by launcher(s) (control plane).]

Dr. Greg Pauloski

Dr. Kyle Chard

15 of 30

Academy middleware prototype: Agent definition

import threading
import time

from academy.behavior import Behavior, action, loop


class Example(Behavior):
    def __init__(self) -> None:
        self.count = 0  # State stored as attributes

    @action
    def square(self, value: float) -> float:
        return value**2

    # Named differently from the `count` attribute so that the instance
    # attribute does not shadow the method.
    @loop
    def increment(self, shutdown: threading.Event) -> None:
        while not shutdown.is_set():
            self.count += 1
            time.sleep(1)

Agents defined by a behavior

(e.g., service, embodied, AI)

Clients & other agents can request actions

An instance of a behavior holds the agent's state

Control loops for autonomous behavior

16 of 30

Academy middleware prototype: Client usage

from academy.exchange.thread import ThreadExchange
from academy.launcher.thread import ThreadLauncher
from academy.manager import Manager

with Manager(
    exchange=ThreadExchange(),  # Can be swapped
    launcher=ThreadLauncher(),
) as manager:
    behavior = Example()  # From the prior slide
    handle = manager.launch(behavior)

    future = handle.square(2)
    assert future.result() == 4

    handle.shutdown()  # Or via the manager
    manager.shutdown(handle.agent_id, blocking=True)

Single interface for managing your agents

Choose exchange & launcher for environment

Interact with agents via handles

Pass handles to other agents
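
To make the handle and mailbox idea concrete, here is a minimal standard-library sketch of the pattern: a handle turns a request into a message, the agent's mailbox queues it, and the caller gets a future back. This is not Academy's implementation or API; `MailboxAgent`, `ToyHandle`, and `invoke` are hypothetical names used only for illustration.

```python
import queue
import threading
from concurrent.futures import Future


class MailboxAgent(threading.Thread):
    """Hypothetical agent: a thread that serves action requests from its mailbox."""

    def __init__(self) -> None:
        super().__init__(daemon=True)
        self.mailbox: queue.Queue = queue.Queue()

    def square(self, value: float) -> float:
        return value**2

    def run(self) -> None:
        while True:
            request = self.mailbox.get()
            if request is None:  # Shutdown sentinel
                break
            action_name, args, future = request
            try:
                future.set_result(getattr(self, action_name)(*args))
            except Exception as exc:  # Report failures to the caller
                future.set_exception(exc)


class ToyHandle:
    """Hypothetical handle: turns action requests into mailbox messages."""

    def __init__(self, agent: MailboxAgent) -> None:
        self._mailbox = agent.mailbox

    def invoke(self, action_name: str, *args) -> Future:
        future: Future = Future()
        self._mailbox.put((action_name, args, future))
        return future

    def shutdown(self) -> None:
        self._mailbox.put(None)


agent = MailboxAgent()
agent.start()
handle = ToyHandle(agent)
result = handle.invoke("square", 2).result(timeout=5)
handle.shutdown()
agent.join(timeout=5)
```

Because the handle only holds a reference to the mailbox, it can be passed to other agents, and the in-process queue could in principle be swapped for a networked exchange without changing the calling code.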

17 of 30

Academy use case: MOF discovery

Metal Organic Frameworks (MOFs):

  • Composed of organic molecules (ligands) and inorganic metals (nodes)
  • The sponges of materials science!
  • Porous structures that adsorb and store gases
  • Topologies can be optimized for targeted gas storage → Carbon Capture, Methane Storage

Federated Agents |

Intractable search space of ligand, node, & geometry combinations

How to discover MOFs with desirable properties for target applications?

18 of 30

MOF discovery pipeline

MOF Discovery Cycle (Human-Driven):

  • Set Goals: Humans set research goals
  • Study: Humans research related work
  • Hypothesize: Humans create hypotheses to test
  • Develop: Humans write code and protocols
  • Simulate: Humans run codes, process results
  • Experiment: Humans synthesize, test MOFs in lab
  • Publish: Humans publish results

19 of 30

MOF Discovery Cycle (legend: Human-Driven, Automated):

  • Set Goals: Humans set research goals
  • Study: Humans research related work
  • Hypothesize: Humans create hypotheses to test
  • Develop: Humans write code and protocols
  • Simulate: Agents run codes, process results
  • Experiment: Humans synthesize, test MOFs in lab
  • Publish: Humans publish results

MOFA Workflow:

  • Generate → AI generated ligands
  • Assemble → Assembled candidate MOFs
  • Validate → Structurally stable MOFs
  • Optimize → Goal-optimized MOFs
  • Estimate → Assessed MOFs
  • Database, with periodic model retraining

MOF discovery accelerated by agentic computation
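
The shape of the MOFA workflow (batches flowing through successive stages into a database, with periodic retraining) can be sketched as follows. This is an illustrative skeleton, not the MOFA code: `run_workflow` and the integer-based placeholder stages are hypothetical, and integers stand in for ligands and MOFs.

```python
from typing import Callable, Iterable

Stage = Callable[[list], list]


def run_workflow(
    batches: Iterable[list],
    stages: list[Stage],
    retrain: Callable[[list], None],
    retrain_every: int,
) -> list:
    """Push each batch through the stages; retrain after enough new records."""
    database: list = []
    since_retrain = 0
    for batch in batches:
        for stage in stages:
            batch = stage(batch)
        database.extend(batch)
        since_retrain += len(batch)
        if since_retrain >= retrain_every:
            retrain(database)  # Periodic model retraining on all results so far
            since_retrain = 0
    return database


# Placeholder stages operating on integers instead of molecules.
def assemble(ligands: list) -> list:
    return [x * 2 for x in ligands]


def validate(mofs: list) -> list:
    return [m for m in mofs if m % 4 == 0]


def optimize(mofs: list) -> list:
    return [m + 1 for m in mofs]


def estimate(mofs: list) -> list:
    return [(m, float(m)) for m in mofs]


retrain_sizes: list[int] = []
database = run_workflow(
    batches=[[1, 2, 3], [4, 5, 6]],  # Stand-ins for generated ligand batches
    stages=[assemble, validate, optimize, estimate],
    retrain=lambda db: retrain_sizes.append(len(db)),
    retrain_every=2,
)
```

In this sketch the stages run sequentially in one process; in an agentic deployment each stage would be a separate agent, so stages could run concurrently and on different resources.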

20 of 30

MOFA online learning + GenAI + simulation code

AI Agent

Knowledge Agent

Computational Agents

We agentify the code via Academy

21 of 30

Agentified MOFA code easily maps to many resources

[Figure: agentified MOFA components mapped to resources. Components: Training, Dataset, Generator, Assembler, Estimator, Database, Validator, Optimizer. Resources: Chameleon Cloud, CPUs, Storage. Data: Ligands, MOF Candidates, Stable MOFs, Optimized MOFs, CO2 Capacities, Lattice Strain. Legend: Agent, Resource, Data Flow.]

Agents executed remotely via Globus Compute

Data moved via Globus transfer

Authentication and authorization via Globus Auth

22 of 30

Benefits of agentic model:

  • Placement: Move agents to resources
  • Separation of concerns: Resource acquisition & scaling based on local workload
  • Loose coupling: Swap agents, integrate new agents (e.g., SDL)
  • Shared agents: Multiple workflows can share agents (microservice-like)

First batches of ligands

MOF buffer fills and Assembler scales down

Validator scales out to start processing MOFs

Optimizer scales out after first validated MOFs

Estimate CO2 of optimized MOFs

Assembler and Estimator auto-scale

Batch job walltime expires

Agentified MOFA application execution trace
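
"Resource acquisition & scaling based on local workload" can be illustrated with a simple rule in which each agent sizes its own worker pool from its backlog. This is a hedged sketch, not the policy used in the trace above: `desired_workers`, its parameters, and the default bounds are assumptions made for illustration.

```python
import math


def desired_workers(
    queue_length: int,
    tasks_per_worker: int,
    min_workers: int = 0,
    max_workers: int = 8,
) -> int:
    """Return how many workers an agent should hold for its current backlog."""
    if tasks_per_worker <= 0:
        raise ValueError("tasks_per_worker must be positive")
    needed = math.ceil(queue_length / tasks_per_worker)
    # Clamp to the range the agent is allowed to request.
    return max(min_workers, min(max_workers, needed))


idle = desired_workers(queue_length=0, tasks_per_worker=4)
busy = desired_workers(queue_length=10, tasks_per_worker=4)
saturated = desired_workers(queue_length=100, tasks_per_worker=4)
```

Under this rule an agent with an empty input buffer releases its workers and one with a growing backlog scales out up to its cap, with no central scheduler involved.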

23 of 30

MOF Discovery Cycle (legend: Human-Driven, Automated):

  • Set Goals: Humans set research goals
  • Study: Agents research related work
  • Hypothesize: Agents create hypotheses to test
  • Develop: Agents write code and protocols
  • Simulate: Agents run codes, process results
  • Experiment: Agents synthesize, test MOFs in lab
  • Publish: Humans publish results

Further automation via additional agents

Lab agent

Query PubMed for ChatGPT feedstock

AI agent

Priyanka Setty

Arvind Ramanathan

Rory Butler

24 of 30

Ongoing R&D in support of agentic IRI and applications

  • Current activities
    • Academy agent middleware prototype
    • Globus Flows for automation of sets of flow actions
    • Streaming support within Globus Transfer service
    • Globus Search for organizing information for use by agents
    • Diaspora project developing resilient agents and applications

  • Open challenges (many!)
    • Uniform token-based auth mechanisms: e.g., Globus Auth
    • Policies for agentic access to distributed resources
    • Agents as shared services: defining, provisioning, evolving

26 of 30

Streaming support in Globus Transfer service

  • Goal: Managed, secure, high-speed memory-to-memory transfers over *AN
  • Method: New mem-to-mem transfer connector for Globus Connect Server
    • Build on SciStream system (Raj Kettimuthu, Flavio Castro, et al.)

[Figure: two nodes, A and B. Transfer from file system A to file system B.]

Flavio Castro

Raj Kettimuthu

Talks Tuesday, Wednesday

27 of 30

Streaming support in Globus Transfer service

  • Goal: Managed, secure, high-speed memory-to-memory transfers over *AN
  • Method: New mem-to-mem transfer connector for Globus Connect Server
    • Build on SciStream system (Raj Kettimuthu, Flavio Castro, et al.)
  • Leverage from Globus:
    • User authentication
    • Secure data channel establishment
    • Globus Connect Server deployment model
  • Provides:
    • REST API for stream management
    • Secure data and control channels
  • Release planned for later in 2025

  • In progress:
    • Bandwidth reservation via SENSE (https://sense.es.net)

[Figure: two nodes running Process C and Process D. Transfer from process C to process D.]
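
The difference between a file-to-file transfer and a memory-to-memory stream can be shown with a small standard-library sketch in which a producer sends in-memory chunks straight to a consumer and nothing is written to disk. This does not use the Globus or SciStream APIs; two threads connected by `socket.socketpair()` stand in for process C and process D, and the chunk contents are made up.

```python
import socket
import threading


def producer(sock: socket.socket, chunks: list[bytes]) -> None:
    """Stream in-memory chunks, then close the socket to signal end of stream."""
    with sock:
        for chunk in chunks:
            sock.sendall(chunk)


def consumer(sock: socket.socket) -> bytes:
    """Read until the peer closes the connection; nothing touches disk."""
    received = bytearray()
    with sock:
        while data := sock.recv(4096):
            received.extend(data)
    return bytes(received)


left, right = socket.socketpair()
chunks = [b"frame-%d;" % i for i in range(3)]
sender = threading.Thread(target=producer, args=(left, chunks))
sender.start()
payload = consumer(right)
sender.join()
```

A managed streaming service would add what this sketch omits: authentication, a secured channel between sites, and an API for setting up and tearing down the stream.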

28 of 30

Thank you!

Rachana Ananthakrishnan, Ben Blaiszik, Flavio Castro, Kyle Chard, Ryan Chard, Nathaniel Hudson, Eliu Huerta, Raj Kettimuthu, Greg Pauloski, Arvind Ramanathan, and many others

Collaborators at other DOE labs: LBNL, ORNL, SLAC, etc.

Funding:

  • DOE: Diaspora project (Distributed Resilient Systems program), Argonne Leadership Computing Facility
  • NSF: Globus Automate, SciStream

foster@anl.gov

29 of 30

Summary: Applications are becoming agentic

  • We will increasingly delegate complex research tasks to persistent, stateful, cooperative AI-enabled agents that will dominate use of scientific resources

→ Requiring an “agentic science cloud”

  • Academy prototype illustrates how agentic middleware can simplify development, deployment, and use of science agents
  • Many challenges to overcome, e.g.:
    • Developing, evaluating, applying agents
    • Global auth fabric to match global network and action fabric (need 21st C mechanisms!)
    • Provisioning, managing persistent agents

Comments, questions: foster@anl.gov

Robot Sisyphus by Amy Kurzweil

30 of 30

Robotic physical labs

Robotic virtual labs

Many trillions of tokens of structured and unstructured scientific data

1000s of robots generate data, test hypotheses

Exascale systems train models, generate data, test hypotheses

Embodied learning agents with deep expertise in science principles and practice

Scientific data

Universal data, compute, trust fabric

Scientific agents