1 of 6

Understanding Job Execution in Supercomputers!

Alex Moore

Di Zhang

2 of 6

Motivations

Researching job scheduling in supercomputers is important because…

  • Supercomputers are built to solve complex problems that require high-speed data processing → more efficient scheduling = more complex problems solved
  • Optimal utilization of resources
  • $$$: compute time is expensive, so idle or wasted resources cost money

The bigger picture

  • Simulate spacecraft designs to test and gather data
  • Analyze cyber attacks and natural disasters

3 of 6

Background

For this project I have learned:

  • How job runtimes are estimated for HPC batch job schedulers
  • Importance of job scheduling
    • resource utilization
    • efficiency of job execution
  • Methods of automating the scheduling of jobs
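As a rough illustration of how a runtime estimate can be produced, the sketch below uses one simple history-based baseline: predict a job's runtime as the average of the same user's two most recent runtimes, capped at the requested walltime, and fall back to the requested walltime when the user has no history. This is an assumed example, not the method used in this project. The job fields (`user`, `requested`, `runtime`, in seconds) are hypothetical, and for simplicity every earlier job is treated as already finished when a later one is predicted.

```python
from collections import defaultdict, deque


def predict_runtimes(jobs):
    """Predict each job's runtime from the same user's recent history.

    jobs: list of dicts with hypothetical keys "user", "requested" and
    "runtime" (seconds), in submission order. Simplification: every earlier
    job is treated as finished, so its runtime is known to later predictions.
    """
    history = defaultdict(lambda: deque(maxlen=2))  # last two runtimes per user
    predictions = []
    for job in jobs:
        recent = history[job["user"]]
        if recent:
            # Average of the user's last (up to) two runtimes, never above the request.
            guess = min(sum(recent) / len(recent), job["requested"])
        else:
            # No history for this user yet: trust the requested walltime.
            guess = job["requested"]
        predictions.append(guess)
        recent.append(job["runtime"])  # becomes history for later jobs
    return predictions


jobs = [
    {"user": "a", "requested": 3600, "runtime": 1200},
    {"user": "a", "requested": 3600, "runtime": 1800},
    {"user": "b", "requested": 7200, "runtime": 7000},
    {"user": "a", "requested": 3600, "runtime": 1500},
]
print(predict_runtimes(jobs))  # → [3600, 1200.0, 7200, 1500.0]
```

Capping at the request reflects that a job is normally stopped once it reaches its requested walltime, so predicting beyond it adds no information.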

Skills and technologies

  • Python
  • Pandas
  • Machine learning
  • Jupyter Notebook

4 of 6

Current Progress

  • I am currently mapping data from past job traces using Jupyter Notebook and Pandas
  • Extracting features to better understand the data
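A minimal pandas sketch of this kind of feature extraction, assuming a hypothetical trace with `user`, `submit_time`, `start_time`, `end_time`, `requested_time` (seconds) and `num_nodes` columns. Real trace formats use their own column names and units, and the values below are made up.

```python
import pandas as pd

# Hypothetical trace: times in seconds from the start of the trace.
trace = pd.DataFrame({
    "user": ["a", "a", "b", "b"],
    "submit_time": [0, 100, 200, 5000],
    "start_time": [50, 1400, 260, 5000],
    "end_time": [1250, 3200, 7260, 6800],
    "requested_time": [3600, 3600, 7200, 3600],
    "num_nodes": [4, 4, 16, 2],
})


def extract_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add derived per-job features to a copy of the trace."""
    out = df.copy()
    out["wait_time"] = out["start_time"] - out["submit_time"]  # time spent queued
    out["runtime"] = out["end_time"] - out["start_time"]       # actual run time
    out["node_hours"] = out["num_nodes"] * out["runtime"] / 3600
    # Share of the requested walltime the job actually used.
    out["request_accuracy"] = out["runtime"] / out["requested_time"]
    # Per-user average of that share (computed over the whole trace here).
    out["user_mean_accuracy"] = (
        out.groupby("user")["request_accuracy"].transform("mean")
    )
    return out


features = extract_features(trace)
print(features[["user", "wait_time", "runtime", "node_hours", "request_accuracy"]])
```

The per-user mean is taken over the whole trace only to keep the example short; features fed to a runtime predictor should use only jobs that finished before the job being predicted, to avoid leaking future information.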

*Source: Table III in "Trade-off between Prediction Accuracy and Underestimation Rate in Job Runtime Estimates," Yuping Fan, Zhiling Lan, et al.
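The cited study's title names two competing metrics. The sketch below uses common definitions that are assumptions here and may differ from those behind Table III: per-job accuracy as min(predicted, actual) / max(predicted, actual), averaged over jobs, and underestimation rate as the fraction of jobs predicted to run shorter than they actually did. Padding every prediction by a hypothetical factor of 1.5 shows the trade-off on toy numbers: underestimation drops, and so does accuracy.

```python
def accuracy(predicted, actual):
    """Mean per-job accuracy as a min/max ratio (1.0 = perfect). Assumes positive runtimes."""
    scores = [min(p, a) / max(p, a) for p, a in zip(predicted, actual)]
    return sum(scores) / len(scores)


def underestimation_rate(predicted, actual):
    """Fraction of jobs whose predicted runtime is shorter than the actual one."""
    under = sum(1 for p, a in zip(predicted, actual) if p < a)
    return under / len(actual)


# Toy runtimes in seconds (made-up values for illustration).
actual = [1200, 1800, 7000, 1800]
predicted = [1500, 1500, 6000, 2400]
padded = [p * 1.5 for p in predicted]  # hypothetical safety margin

for name, pred in [("raw", predicted), ("padded x1.5", padded)]:
    print(name, round(accuracy(pred, actual), 3), underestimation_rate(pred, actual))
# raw 0.81 0.5
# padded x1.5 0.653 0.0
```

Underestimates matter because a scheduler that trusts a too-short estimate may plan around a job that is still running; padding avoids that, at the cost of less accurate (overly long) estimates.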


5 of 6

Future Plans

I plan to work with my advisor to:

  • practice extracting features from other job traces
  • develop an alternative method for automating batch job scheduling using reinforcement learning with PyTorch

6 of 6

Thank You

Q & A