1 of 9

PTYCHOGRAPHY DATA PIPELINES WITH PTYCHODUS

PTYCHODUS - DIASPORA DEMO + LCLS AI/ML DOCUMENTATION CHALLENGE SERIES

STEVE HENKE

Data Engineer

X-ray Science Division

shenke@anl.gov

ALBERT VONG

Postdoctoral Appointee

X-ray Science Division

avong@anl.gov

8 August 2024

2 of 9

Introduction

  • Ptychography instruments rapidly produce a large volume of data (~10 TB/day)
    • Datasets include metadata, scan positions, and diffraction patterns
    • Must maintain the association between scan positions and diffraction patterns
    • Almost every beamline writes data using its own conventions
  • Significant computing is needed to obtain the reconstructed sample and probe from the recorded experiment data
    • Supercomputers greatly expedite data processing
  • Ptychography data pipelines are important for providing feedback on experiment effectiveness with short turn-around times (i.e., a few minutes)
    • Fast feedback maximizes utility of collected data and beam time
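To put the ~10 TB/day figure in perspective, the short Python sketch below converts a daily data volume into the average sustained throughput a pipeline must keep up with. It assumes decimal units (1 TB = 10^12 bytes) and data spread evenly over 24 hours. Real acquisition is bursty, so peak rates during a scan will be higher. The function name is hypothetical and for illustration only.

```python
# Convert a daily data volume into the average throughput a pipeline must sustain.
# Assumptions: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes) and a
# uniform 24-hour duty cycle (peak rates during acquisition will be higher).
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_mb_per_s(terabytes_per_day: float) -> float:
    """Return the average data rate in MB/s for a given volume in TB/day."""
    bytes_per_day = terabytes_per_day * 10**12
    return bytes_per_day / SECONDS_PER_DAY / 10**6


if __name__ == "__main__":
    print(f"{sustained_rate_mb_per_s(10):.1f} MB/s")  # prints "115.7 MB/s"
```

In other words, ~10 TB/day is roughly 116 MB/s around the clock, before any reconstruction output is written.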

3 of 9

APS Ptychography Beamlines

  • Bionanoprobe/Bionanoprobe-II
  • Coherent Surface-Scattering Imaging (CSSI)
  • CNM/APS Hard X-ray Nanoprobe (HXN)
  • In Situ Nanoprobe (ISN)
  • Lamino-ptychographY Nanoscale X-ray imaging (LYNX)
  • Polarization Modulation Spectroscopy (Polar)
  • PtychoProbe
  • Velociprobe
  • …and more!

Approximately 10 Beamlines Post-Upgrade


APS-U enables faster data acquisition, higher resolution, thicker samples, and more options for multi-modal characterization

4 of 9

APS Ptychography Software

https://github.com/AdvancedPhotonSource

Ptychodus

  • Application for ptychography data analysis workflows
    • GUI for data viewing and interactive reconstructions
    • Batch mode for use in workflows or HPC environments
  • Works directly with APS, LCLS, and ALS beamline data formats
  • Calls Tike or PtychoNN for reconstruction tasks

Tike

  • Python library of ptychography reconstruction algorithms
  • Built with CuPy to run on NVIDIA CUDA hardware
  • No from-disk operations; accepts only data already in host memory
  • May use multiple GPUs on a single host

PtychoNN

  • Encoder-decoder network that simultaneously predicts sample amplitude and phase from input diffraction data alone
  • Hundreds of times faster than iterative phase retrieval; can work with as little as 25 times less data

5 of 9

PTYCHODUS DEMONSTRATION

6 of 9

APS Ptychography Data Pipelines

  • On-demand workflows enable quick-look feedback during data acquisition
    • File-based workflow
      • Bluesky orchestrates data acquisition, then triggers the APS Data Management system (DM) to start the workflow.
    • Streaming workflow (prototype)
      • Bluesky triggers DM to start processing, then orchestrates data acquisition. Scan positions and diffraction patterns are communicated via EPICS PVA.
  • Reprocessing workflows are used to obtain the best reconstruction quality and enable detailed analysis. Reprocessing workflows are inherently file-based.
    • Launch reconstructions on Polaris (ALCF) directly from Ptychodus GUI
    • Future: Data viewing and reprocessing via web portals
  • Processing runs at ALCF via Globus Compute or on a local beamline workstation

Ptychographic reconstruction

7 of 9

APS Ptychography Data Pipelines

On-demand file-based reconstruction

Bluesky: Orchestrates data acquisition and calls the DM Python API to start processing

APS DM: Trigger from Bluesky launches the DM workflow:

  1. Call Ptychodus on local resources to prepare beamline data for reconstruction
  2. Use Globus Compute to transfer prepared data to ALCF, reconstruct on Polaris, and transfer reconstruction results back to APS

APS: User views experiment data and initial reconstruction in the Ptychodus GUI

ALCF:

  • Prepared experiment data copied to the Eagle filesystem
  • Globus Compute endpoint submits job to the Polaris demand queue using an ALCF service account
  • Ptychodus loads prepared data and calls Tike to reconstruct the dataset

8 of 9

APS Ptychography Data Pipelines

On-demand streaming reconstruction (proof of concept)

Bluesky: Calls the DM Python API to start processing, then orchestrates data acquisition

  • Detector frames are streamed via PVA with the trigger count as metadata
  • Scan positions are streamed (in chunks) via PVA with the corresponding detector trigger counts

APS DM: Trigger from Bluesky launches a DM workflow that starts the Ptychodus streaming data processor

Ptychodus streaming data processor (pvaPy) receives diffraction patterns and scan positions over separate PVA channels:

  • Trigger counts are used for robust data association
  • Calls PtychoNN for fast inference
  • Results are output to another PVA channel

In development: User views experiment data and reconstruction (both from PVA) in the Ptychodus GUI
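Associating frames and positions by trigger count can be sketched as follows. This is a minimal illustration under stated assumptions, not the Ptychodus implementation: the `TriggerCountAssociator` class and its methods are hypothetical, the PVA/pvaPy channel wiring is omitted, and frames are stand-in strings rather than detector arrays. Because the two channels deliver data independently and positions arrive in chunks, each side is buffered until its partner with the same trigger count arrives.

```python
from typing import Any, Dict, Iterable, List, Tuple

Position = Tuple[float, float]
Match = Tuple[int, Any, Position]  # (trigger_count, frame, (x, y))


class TriggerCountAssociator:
    """Hypothetical helper: pairs diffraction frames with scan positions by trigger count."""

    def __init__(self) -> None:
        # Items still waiting for their partner, keyed by detector trigger count.
        self._frames: Dict[int, Any] = {}
        self._positions: Dict[int, Position] = {}

    def add_frame(self, trigger_count: int, frame: Any) -> List[Match]:
        """Buffer one detector frame; return any pairs it completes."""
        self._frames[trigger_count] = frame
        return self._drain([trigger_count])

    def add_position_chunk(self, chunk: Iterable[Tuple[int, float, float]]) -> List[Match]:
        """Buffer a chunk of (trigger_count, x, y) positions; return completed pairs."""
        counts = []
        for trigger_count, x, y in chunk:
            self._positions[trigger_count] = (x, y)
            counts.append(trigger_count)
        return self._drain(counts)

    def _drain(self, candidates: Iterable[int]) -> List[Match]:
        # Emit (and stop buffering) every candidate whose frame and position are both present.
        matched: List[Match] = []
        for tc in candidates:
            if tc in self._frames and tc in self._positions:
                matched.append((tc, self._frames.pop(tc), self._positions.pop(tc)))
        return matched

    @property
    def pending(self) -> Tuple[int, int]:
        """Number of (frames, positions) still waiting for a partner."""
        return len(self._frames), len(self._positions)


if __name__ == "__main__":
    assoc = TriggerCountAssociator()
    print(assoc.add_frame(2, "frame-2"))  # [] (position for trigger 2 not yet received)
    print(assoc.add_position_chunk([(1, 0.0, 0.0), (2, 0.5, 0.0)]))  # [(2, 'frame-2', (0.5, 0.0))]
    print(assoc.add_frame(1, "frame-1"))  # [(1, 'frame-1', (0.0, 0.0))]
    print(assoc.pending)  # (0, 0)
```

Keying on the trigger count rather than on arrival order means a delayed or reordered message still pairs correctly. A production processor would also need to evict entries whose partner never arrives.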

9 of 9

Ptychodus Workflow Demonstration

Demo: The reprocessing workflow can reconstruct on a local workstation or use the GUI to launch reconstructions on remote resources via Globus Compute