
Extreme-Scale Ensembles of Iterative Random Forests

Scientific Achievement

The Cheetah workflow tool was used to improve feature throughput for ensembles of Iterative Random Forests on Summit, a leadership-class supercomputer at OLCF. Researchers compose a portable ensemble specification that can be executed on different systems. Cheetah then submits, executes, and tracks the resulting large campaign on behalf of the science team.

Significance and Impact

Using Cheetah, we obtained an 8x improvement in feature throughput for the Iterative Random Forest Leave One Out Prediction (iRF-LOOP) application used in computational biology. The challenges discussed will help drive research in workflow technologies for extreme-scale task-based computing.

Technical Approach

  • Cheetah provides a Python-based framework to easily compose large campaigns
  • Two campaign designs were developed to explore capacity-class and capability-class ensemble campaigns
  • A large ensemble of 4 million runs was executed on the Summit supercomputer at ORNL
  • Challenges discussed will drive research in workflow technologies for extreme-scale HPC
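
To make the idea of composing a large ensemble in Python more concrete, the following is a minimal sketch in plain Python. It does not use Cheetah's actual API; the names `Run`, `leave_one_out_runs`, and `batched` are hypothetical, and the feature names are placeholders. It assumes the leave-one-out structure implied by the application's name, in which each feature in turn is held out as the prediction target and the remaining features serve as predictors, and it groups the resulting runs into fixed-size batches as a stand-in for whatever scheduling unit a campaign design would use.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Run:
    """One ensemble member: predict `target` from all remaining features."""
    target: str
    predictors: Tuple[str, ...]


def leave_one_out_runs(features: List[str]) -> Iterator[Run]:
    """Yield one run per feature, holding that feature out as the target."""
    for i, target in enumerate(features):
        predictors = tuple(features[:i] + features[i + 1:])
        yield Run(target=target, predictors=predictors)


def batched(runs: Iterable[Run], size: int) -> Iterator[List[Run]]:
    """Group runs into batches of at most `size` (hypothetical scheduling unit)."""
    iterator = iter(runs)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Placeholder feature names; a real campaign would read these from the dataset.
features = [f"gene_{i}" for i in range(10)]
campaign = list(batched(leave_one_out_runs(features), size=4))
```

Because the runs are generated lazily, the same specification scales from a handful of features to millions of runs, and only the batch size needs to change to target a different system or campaign design.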

8x speedup in feature throughput for iRF-LOOP using Cheetah. A large simulation ensemble is composed using Cheetah’s Python-based composition interface. Cheetah’s execution engine then executes this ensemble by dynamically using HPC computational resources to obtain efficient resource utilization.

Running Ensemble Workflows at Extreme Scale, Kshitij Mehta et al., eScience 2022, https://doi.org/10.1109/eScience55777.2022.00042