
Optimized Preprocessing For Scientific Deep Learning Applications

Scientific Achievement

Developed optimized preprocessing pipelines for scientific deep-learning workloads (MLPerf HPC). The optimized preprocessing improved end-to-end performance by up to 10x.

Significance and Impact

Deep-learning workloads are an increasingly significant consumer of HPC compute cycles, so improved preprocessing pipelines are critical for efficient utilization of systems with AI accelerators. Our optimized preprocessing uses novel techniques to improve data processing for the CosmoFlow and DeepCAM applications, and these techniques could be leveraged by other scientific deep-learning applications.

Technical Approach

  • Develop application-specific encoding/decoding for preprocessing (see the sketch after this list):
    • Specialized data formats for both FP16 floating-point and integer-based scientific data.
    • Operator fusion and reordering to improve preprocessing execution.
    • Decoder logic that executes efficiently on both the host and the accelerator.
  • Improve performance by reducing data movement across architectural bottlenecks and enabling better caching.
  • Develop data schemas/metadata optimized for efficient processing on GPU accelerators.
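
The sketch below illustrates the flavor of these ideas, not the paper's actual implementation: an application-specific codec that packs integer-valued volumes into a compact fixed-width format, and a fused decode path that combines decoding, casting to FP16, and normalization in a single pass to reduce intermediate copies. The function names, the volume shape, and the log normalization are illustrative assumptions.

```python
# Minimal sketch of an application-specific codec with a fused decode path.
# Assumes CosmoFlow-style integer volumes whose values fit in 16 bits;
# encode_volume/decode_normalize are hypothetical names, not from the paper.
import numpy as np

VOLUME_SHAPE = (128, 128, 128, 4)  # example sub-volume shape (assumption)

def encode_volume(raw: np.ndarray) -> bytes:
    """Pack integer-valued scientific data into a compact uint16 byte stream
    instead of a generic container format."""
    assert raw.min() >= 0 and raw.max() < 2**16, "values must fit in uint16"
    return raw.astype(np.uint16).tobytes()

def decode_normalize(buf: bytes, shape=VOLUME_SHAPE) -> np.ndarray:
    """Fused decode: bytes -> normalized FP16 tensor in one pass.
    Decoding, casting, and normalization are fused so only one full-size
    intermediate is materialized; the same logic could run on the host or
    be ported to the accelerator."""
    ints = np.frombuffer(buf, dtype=np.uint16).reshape(shape)
    # log1p compresses the heavy-tailed counts (illustrative choice); the
    # final cast to float16 halves the bytes moved to the accelerator.
    return np.log1p(ints.astype(np.float32)).astype(np.float16)

if __name__ == "__main__":
    sample = np.random.randint(0, 1000, size=VOLUME_SHAPE)
    blob = encode_volume(sample)
    tensor = decode_normalize(blob)
    print(tensor.dtype, tensor.shape, len(blob) / sample.nbytes)
```

The design intent this sketch tries to convey is that a format tailored to the data's actual dynamic range, combined with a fused decode/normalize step, moves fewer bytes across the I/O and host-to-device bottlenecks than a generic decode-then-transform pipeline.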

K. Z. Ibrahim and L. Oliker, "Preprocessing Pipeline Optimization for Scientific Deep Learning Workloads," 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Lyon, France, 2022, pp. 1118-1128, doi: 10.1109/IPDPS53621.2022.00112.

Figure: Performance improvement of CosmoFlow deep-learning throughput on GPU-accelerated systems at OLCF and NERSC.