
Agentic AI for Simulations Workflows

R. E. Firth², P. Edwards², S. Tanaka¹, P. Chachara², M. Esposito¹, A. Suryakumaran¹, A. Shkurti², V. Elisseev¹

¹IBM Research
²The Hartree Centre, STFC

20th Workshop on Workflows in Support of Large-Scale Science

NOV 17, 2025, ST. LOUIS, MO, USA

This work is supported by the Hartree National Centre for Digital Innovation (HNCDI).


Multiscale Simulations Workflow

[Figure: multiscale simulation workflow diagram. Inputs: laser intensity; predetermined data (laser profile, capsule design). Components: the solvers Freyja and EPOCH; Converter I, Converter II and Converter III; an Approximator; a Post-Processor; BOA; a Visualization Server and a Visualization Client on a Hybrid Cloud. Exchanged quantities: density, temperature, hot electron density, hot electron temperature, beam density, beam drift energy and coupling efficiency. Other labels: Workflow Manager; legend: 3rd party, Custom.]

Freyja, a radiation-hydrodynamics solver operating on time scales of tens of nanoseconds and length scales of millimetres, is combined with EPOCH, a particle-in-cell (PIC) solver operating on time scales of a few picoseconds and length scales of several microns.
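The caption implies a wide separation of scales between the two solvers. As a rough illustration, the gap can be estimated by dividing the coarse scale by the fine one. The specific values below (20 ns and 1 mm for Freyja, 2 ps and 5 µm for EPOCH) are assumptions chosen to match the caption's wording, not figures taken from the workflow.

```python
# Representative scales; the specific values are illustrative assumptions.
FREYJA_TIME_S = 20e-9    # "tens of nanoseconds"
FREYJA_LENGTH_M = 1e-3   # "millimetres"
EPOCH_TIME_S = 2e-12     # "a few picoseconds"
EPOCH_LENGTH_M = 5e-6    # "several microns"


def scale_ratio(coarse: float, fine: float) -> float:
    """Return how many fine-scale units fit into one coarse-scale unit."""
    return coarse / fine


time_ratio = scale_ratio(FREYJA_TIME_S, EPOCH_TIME_S)
length_ratio = scale_ratio(FREYJA_LENGTH_M, EPOCH_LENGTH_M)

print(f"time-scale ratio:   ~{time_ratio:.0e}")    # ~1e+04
print(f"length-scale ratio: ~{length_ratio:.0e}")  # ~2e+02
```

Under these assumptions the solvers differ by roughly four orders of magnitude in time and two in space, which is consistent with the workflow coupling two separate solvers through converters rather than relying on a single code.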


Agentic Network

Context Engineer Agent enriches and summarises the initial prompt using the location of the target data and the source code of the I/O routines.
Prompt Writer Agent receives the enriched context and formulates a prompt for the Code Writer Agent. This agent needs the strongest available LLM.
Code Writer Agent generates code that adheres to a structured output format. The Code Writer can be a small, specialist code model.
Code Validator Agent receives the generated code and parses it using Python's AST. The code is then run, and any outputs are stored in the shared state. If an error is encountered, the invalid code and the errors are passed back to the Prompt Writer.
Code Judge Agent evaluates the generated code for adherence to the prompt and applies any additional conditions. The Code Judge uses the LLM as a reward model, which makes it possible to add instructions that steer the code towards a particular style or towards optimised performance.
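The agent loop described above can be sketched in plain Python. This is a minimal sketch, not the authors' implementation: the slide does not name an agent framework, and State, CodeOutput, validate and run_graph are hypothetical names. The LLM-backed agents are replaced by stub callables so that the control flow (context enrichment, prompt writing, structured code output, AST parsing and execution by the validator, feedback of errors to the Prompt Writer, and acceptance by the Code Judge) runs on its own. The stubs simulate a first attempt with a syntax error and a corrected second attempt.

```python
import ast
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class CodeOutput:
    """Hypothetical structured output that the Code Writer must fill in."""
    code: str
    explanation: str


@dataclass
class State:
    """Hypothetical shared state passed between the agents."""
    task: str
    context: str = ""
    prompt: str = ""
    output: Optional[CodeOutput] = None
    error: Optional[str] = None
    results: dict = field(default_factory=dict)
    traversals: int = 0


def validate(state: State) -> State:
    """Code Validator: parse with ast, run the code, store its outputs in the state."""
    code = state.output.code
    try:
        ast.parse(code)           # raises SyntaxError on invalid code
        namespace: dict = {}
        exec(code, namespace)     # acceptable only inside an isolated environment
        state.results = {k: v for k, v in namespace.items() if not k.startswith("__")}
        state.error = None
    except Exception as exc:      # SyntaxError or any runtime error
        state.error = f"{type(exc).__name__}: {exc}"
    return state


def run_graph(
    state: State,
    context_engineer: Callable[[State], str],
    prompt_writer: Callable[[State], str],
    code_writer: Callable[[str], CodeOutput],
    code_judge: Callable[[State], bool],
    max_traversals: int = 5,
) -> State:
    """Traverse the agent graph until the code runs and the Code Judge accepts it."""
    state.context = context_engineer(state)
    while state.traversals < max_traversals:
        state.traversals += 1
        state.prompt = prompt_writer(state)       # sees previous code and error on retries
        state.output = code_writer(state.prompt)  # structured output
        state = validate(state)
        if state.error is None:
            if code_judge(state):
                break
            state.error = "Code Judge rejected the code"
    return state


# Stub agents standing in for LLM calls (purely hypothetical).
attempts = iter([
    CodeOutput("density = [1.0, 2.0\n", "first attempt, contains a syntax error"),
    CodeOutput("density = [1.0, 2.0]\n", "corrected attempt"),
])
final = run_graph(
    State(task="convert Freyja output"),
    context_engineer=lambda s: f"context for {s.task}",
    prompt_writer=lambda s: f"{s.context}; previous error: {s.error}",
    code_writer=lambda prompt: next(attempts),
    code_judge=lambda s: "density" in s.results,
)
print(final.traversals, final.results)  # 2 {'density': [1.0, 2.0]}
```

Keeping the retry logic in a single loop over a shared state object mirrors the slide's description of errors flowing back to the Prompt Writer; a graph-based agent framework would express the same edges declaratively.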

Preliminary Results

Our agentic network can generate a converter script and an output schema for the Freyja solver output data in just two traversals of the agent graph.
Reliability: the choice of LLM and the formulation of the problem both affect the results.
Validation: source code analysis, unit tests and execution within the workflow.
Security: generated code is treated as high-risk Software of Unknown Provenance (SOUP) and must run within an isolated environment.
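Two of the validation and security points can be illustrated with the Python standard library: a static check of the generated source with the ast module, and execution in a separate interpreter with a timeout. This is a minimal sketch under stated assumptions: the deny-lists and the function names static_findings and run_separately are hypothetical, and a separate process is not a sandbox. Treating generated code as SOUP still calls for a properly isolated environment such as a container or virtual machine.

```python
import ast
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical deny-lists; a real policy would be project-specific.
BLOCKED_MODULES = {"os", "subprocess", "shutil", "socket"}
BLOCKED_CALLS = {"eval", "exec", "__import__"}


def static_findings(code: str) -> list[str]:
    """Source code analysis: flag blocked imports and calls without running the code."""
    findings = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]   # module is None for "from . import x"
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                findings.append(f"line {node.lineno}: call to {node.func.id}")
            continue
        else:
            continue
        for name in names:
            if name.split(".")[0] in BLOCKED_MODULES:
                findings.append(f"line {node.lineno}: import of {name}")
    return findings


def run_separately(code: str, timeout_s: float = 10.0) -> subprocess.CompletedProcess:
    """Run code in a fresh interpreter inside a temporary directory, with a timeout.

    This only separates processes; it does NOT provide sandbox isolation.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "generated.py"
        script.write_text(code)
        return subprocess.run(
            [sys.executable, "-I", str(script)],  # -I: Python's isolated mode
            cwd=workdir, capture_output=True, text=True, timeout=timeout_s,
        )


generated = "import os\nprint(sum([1, 2, 3]))\n"
print(static_findings(generated))                # ['line 1: import of os']
result = run_separately("print(sum([1, 2, 3]))\n")
print(result.returncode, result.stdout.strip())  # 0 6
```

Running the static check first means code that fails the policy is rejected before it executes at all; unit tests and execution within the real workflow would follow only for code that passes.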