
Mitigate Data Management Challenges for the Exa.TrkX Workflow with the HEPonHPC Partnership (HEP)

Scientific Achievement

This project develops I/O techniques to address data management challenges in the Exa.TrkX workflow, a DOE project that studies the reconstruction of neutrino particle trajectories using modern ML algorithms.

Significance and Impact

This project reduces data management costs for individual workflow tasks, such as graph sample generation and data retrieval in ML training, as well as for workflow tasks that share the same datasets. Experimental results on the MicroBooNE dataset show up to a 16.4x speedup over the previous approach on Cori at NERSC.

Technical Approach

  • Use HDF5 compound data types to store training samples in a contiguous file space, which decreases the number of system read calls and hence the I/O time.
  • For common query patterns, employ auxiliary data objects to speed up metadata access.
  • Adopt parallel I/O on a shared file for faster data retrieval during ML model training.
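
The three techniques above can be illustrated with a minimal Python sketch using h5py and NumPy. This is not the project's actual implementation: the record fields (`event_id`, `wire`, `time`, `label`), the dataset names (`nodes`, `index`), and the helper functions are all hypothetical, chosen only for illustration. The sketch stores every sample as one compound-typed record in a single contiguous dataset, keeps a small auxiliary dataset of per-event offsets and counts so that a common query ("give me event i") needs one ranged read, and shows one way a set of processes could divide events among themselves when reading a shared file. The MPI-IO layer itself is omitted.

```python
import numpy as np
import h5py

# Hypothetical per-node record layout; the field names are assumptions.
node_dtype = np.dtype([
    ("event_id", np.int64),
    ("wire", np.float32),
    ("time", np.float32),
    ("label", np.int8),
])

def write_samples(path, events):
    """Store all events in one compound dataset plus an auxiliary index.

    events: non-empty list of structured arrays with dtype node_dtype.
    """
    counts = np.array([len(e) for e in events], dtype=np.int64)
    # Starting row of each event inside the concatenated dataset.
    offsets = np.concatenate(([0], np.cumsum(counts)[:-1]))
    with h5py.File(path, "w") as f:
        # One contiguous dataset of compound records for all events.
        f.create_dataset("nodes", data=np.concatenate(events))
        # Auxiliary object: row i holds (offset, count) of event i.
        f.create_dataset("index", data=np.stack([offsets, counts], axis=1))

def read_event(path, i):
    """Fetch one event with a single ranged read, located via the index."""
    with h5py.File(path, "r") as f:
        start, count = f["index"][i]
        return f["nodes"][start:start + count]

def partition(n_events, rank, nprocs):
    """Return (first_event, n_local) for one process's contiguous share.

    Events are split as evenly as possible; the first n_events % nprocs
    processes receive one extra event.
    """
    base, rem = divmod(n_events, nprocs)
    start = rank * base + min(rank, rem)
    count = base + (1 if rank < rem else 0)
    return start, count
```

In a parallel training job, each process would call something like `partition` with its own rank and then read only its range of events from the shared file, so that no two processes fetch the same records.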

Event display of an electron neutrino interaction in the MicroBooNE detector

The Exa.TrkX workflow contains multiple tasks. The output data of one task becomes the input to successive tasks.

Strong-scaling timing results of the graph construction task. The input data is a νμ histogram from MicroBooNE.

C. Lee, V Hewes, G. Cerati, J. Kowalkowski, A. Aurisano, A. Agrawal, A. Choudhary, and W. Liao. “A Case Study of Data Management Challenges Presented in Large-Scale Machine Learning Workflows,” to appear in the 23rd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2023.