Mitigate Data Management Challenges for the Exa.TrkX Workflow With the HEPonHPC Partnership (HEP)
Scientific Achievement
This project develops I/O techniques to address the data management challenges exhibited in the Exa.TrkX workflow. Exa.TrkX is a DOE project that studies the reconstruction of neutrino particle trajectories using modern ML algorithms.
Significance and Impact
This project reduces the data management costs for individual workflow tasks such as graph sample generation and data retrieval in ML training, as well as for workflow tasks sharing the same data sets. The experimental results using the MicroBooNE dataset show up to a 16.4x speedup over the previous approach on Cori at NERSC.
Technical Approach
Event display of an electron neutrino interaction in the MicroBooNE detector
The Exa.TrkX workflow contains multiple tasks. The output data of one task becomes the input to successive tasks.
Strong scaling timing results of the graph construction task. The input data is the νμ histogram from MicroBooNE.
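Strong scaling results such as these are usually read in terms of speedup and parallel efficiency relative to the smallest run, with the problem size held fixed. The following is a minimal sketch of that calculation. The function name `strong_scaling` and the timings are hypothetical and chosen for illustration only; they are not measurements from this study.

```python
def strong_scaling(timings):
    """Compute speedup and parallel efficiency for a fixed problem size.

    timings: dict mapping process count -> wall-clock time in seconds.
    Returns a list of (processes, speedup, efficiency) tuples, sorted by
    process count, with the smallest run used as the baseline.
    """
    base_p = min(timings)
    base_t = timings[base_p]
    rows = []
    for p in sorted(timings):
        speedup = base_t / timings[p]        # how much faster than the baseline
        efficiency = speedup * base_p / p    # 1.0 means ideal (linear) scaling
        rows.append((p, speedup, efficiency))
    return rows


# Hypothetical timings (seconds), NOT results from the Exa.TrkX experiments.
example = {1: 100.0, 2: 50.0, 4: 30.0}

for p, s, e in strong_scaling(example):
    print(f"{p:>4} processes: speedup {s:.2f}x, efficiency {e:.0%}")
```

An efficiency that falls as the process count grows typically points to a cost that does not shrink with added processes, such as I/O or communication.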
C. Lee, V Hewes, G. Cerati, J. Kowalkowski, A. Aurisano, A. Agrawal, A. Choudhary, and W. Liao. “A Case Study of Data Management Challenges Presented in Large-Scale Machine Learning Workflows,” to appear in the 23rd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2023.