1 of 19

Advanced Slurm

Paul Hall, PhD

Senior Research Software Engineer

HPC Team

Center for Computation and Visualization

2 of 19

Goals

  • How to submit jobs on Oscar
  • Array Jobs
  • Dependent jobs
  • Look at resource utilization

3 of 19

Oscar: Under the Hood

[Diagram: Oscar architecture. Gateway nodes (login, desktop/VNC, transfer) connect users to the cluster. GPFS storage provides /home (50 GB), /data (512+ GB), and /scratch (up to 12 TB). The Slurm scheduler dispatches work to the CPU and GPU compute nodes.]

4 of 19

Submitting jobs

  • Questions Slurm needs answered before it can schedule your job:
    • How many nodes do you need?
    • How many cores do you need?
    • How long will your job run?
    • Where should your output and error logs go?

You specify these options in a batch file.

5 of 19

Anatomy of a batch file

#!/bin/bash

# Here is a comment
#SBATCH --time=1:00:00
#SBATCH -N 1
#SBATCH -c 1
#SBATCH -J MyJob
#SBATCH -o MyJob-%j.out
#SBATCH -e MyJob-%j.err

module load workshop
hello

This is a regular bash script, but with Slurm flags.

Use #SBATCH lines to specify the Slurm flags.

6 of 19

Anatomy of a batch file

#!/bin/bash

# Here is a comment
#SBATCH --time=1:00:00       ← how much time do I need?
#SBATCH -N 1                 ← how many nodes? (use 1 if your code is not MPI-enabled)
#SBATCH -c 1                 ← how many cores?
#SBATCH -J MyJob             ← name of your job
#SBATCH -o MyJob-%j.out      ← where to put your output and error files;
#SBATCH -e MyJob-%j.err        %j expands to the job number, which is unique

module load workshop         ← bash commands to run your job
hello

This is a regular bash script, but with Slurm flags.

7 of 19

Submitting batch files

sbatch <file_name>

sbatch <flags> <file_name>

sbatch -N 2 submit.sh - flags given on the command line override the corresponding #SBATCH flags in your batch script

8 of 19

Checking on your jobs

  • myq - specific to Oscar
  • squeue -u <username> - works on any machine with SLURM scheduler

9 of 19

Submitting array jobs

#SBATCH --array=i-j - i and j specify a range of task indices

#SBATCH --array=id_1,id_2,id_3,... - list the specific indices for your tasks (comma-separated, no spaces)

Slurm sets the environment variable SLURM_ARRAY_TASK_ID for each task. Use it to distinguish the tasks within your array.
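Both forms in context (the index choices here are illustrative):

```shell
#SBATCH --array=1-10     # range: tasks 1 through 10
#SBATCH --array=1,4,9    # explicit list of task indices
```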

10 of 19

Examples

11 of 19

Examples

Problem 1: Print the array task ID for your job
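One possible sketch for Problem 1 (the job name, array range, and output pattern are arbitrary choices; %A is the job ID and %a the array index):

```shell
#!/bin/bash
#SBATCH --array=1-3
#SBATCH -J PrintID
#SBATCH -o PrintID-%A_%a.out

# Slurm sets SLURM_ARRAY_TASK_ID to this task's index within the array
echo "I am array task ${SLURM_ARRAY_TASK_ID}"
```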

12 of 19

Examples

Problem 2: Use SLURM_ARRAY_TASK_ID to use a different variable from a list
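One way to sketch Problem 2, indexing into a bash array with the task ID (the parameter values here are made up):

```shell
#!/bin/bash
#SBATCH --array=0-2

# Hypothetical list of parameters; bash arrays are 0-indexed,
# so the array range starts at 0 to match
params=(0.1 0.5 1.0)
echo "Running with parameter ${params[${SLURM_ARRAY_TASK_ID}]}"
```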

13 of 19

Examples

Problem 3: Submit jobs to work with different files

The file names are included in list_of_files
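A sketch for Problem 3, assuming list_of_files holds one filename per line and tasks are numbered from 1:

```shell
#!/bin/bash
#SBATCH --array=1-4

# sed prints only line number SLURM_ARRAY_TASK_ID of list_of_files,
# so task N processes the N-th filename
filename=$(sed -n "${SLURM_ARRAY_TASK_ID}p" list_of_files)
echo "Processing ${filename}"
```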

14 of 19

Dependent Jobs

  • Submit jobs that are dependent on other jobs

sbatch --dependency=<dependency type>:<job_id> <batch_script>

Common dependency types: after, afterok, afternotok, afterany, singleton

15 of 19

Finding out job ID using bash

  • sbatch prints a line like "Submitted batch job <id>"; use cut -f 4 -d' ' to pull out the job number
  • Check on your jobs:

squeue -u $USER -o "%.8A %.4C %.10m %.20E"
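Putting the two together - capturing a job ID and chaining a dependent job. The script names are placeholders, and the sbatch reply shown is its usual one-line format:

```shell
# sbatch normally replies with a line like "Submitted batch job 12345"
submit_output="Submitted batch job 12345"   # stand-in for: $(sbatch first.sh)

# Field 4 (space-delimited) is the numeric job ID
jobid=$(echo "$submit_output" | cut -f 4 -d' ')
echo "$jobid"

# On the cluster you would then submit the dependent job:
# sbatch --dependency=afterok:${jobid} second.sh
```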

16 of 19

Examples

17 of 19

Start a job at scheduled time

sbatch --begin=<time> <batch-script>

Options for <time>

  • HH:MM:SS
  • midnight, noon
  • now+60s, now+1h
  • MM/DD/YY or YYYY-MM-DD

18 of 19

Checking up on resource utilization

  • myjobinfo (specific to Oscar) is a good way of checking maximum resource utilization after a job has run
  • seff <jobid>
  • To check on a job while it is running:
    • ssh <nodename>
    • top
    • nvtop for GPUs - requires module load nvtop

19 of 19

Have Questions?