Submitting Jobs on H4H
As mentioned in the Introduction to Slurm tutorial, jobs are the primary way a user should interact with the compute cluster.
Submitted jobs enter a job queue, which uses a combination of information about the requested resources, the user's group, and their recent use of the queue to distribute resources fairly between competing jobs. You can submit an arbitrary number of jobs to the queue, but each one will affect your queue priority.
Warning
The Slurm Scheduler works by analyzing the resources requested by a job, the resources available, and the history of your account's jobs to allocate resources fairly.
Abusing the queue by regularly over-allocating resources for your jobs, or by frequently running jobs which fail, will lower your overall queue priority, and you will have to wait longer for your jobs to run.
For large jobs, it isn't unusual to have to wait a few days for the job to run. In general, the smaller the requested allocation, the better your queue priority will be. It can therefore be useful to break a large analysis into multiple smaller steps. This has the added benefit of allowing non-dependent tasks to run in parallel.
This tutorial will demonstrate usage of the following commands:

- `sbatch`
- `squeue`
- `scancel`
- `sacct`
- `seff`
- `sstat`
Submitting A Job
Job submission is done via the `sbatch` command. Detailed information about the available options for this command can be viewed in the terminal by running `man sbatch`. A short summary of the available options can also be printed via `sbatch --help`. The printed version has the added benefit of returning parsable text, so you can use shell scripting to do things like search for relevant options. For example, if I want to see all `sbatch` options related to CPU usage for my job, I can run:
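One common approach is to filter the help text with `grep` (a sketch; this copy of the page doesn't preserve the original example command, and the sample help text below is illustrative):

```shell
# On the H4H login node you would run:
#
#   sbatch --help | grep -i cpu
#
# The same filter works on any text stream; shown here against a
# canned sample of help output so the pipeline can be demonstrated anywhere:
help_text='Usage: sbatch [OPTIONS...] executable [args...]
  -c, --cpus-per-task=ncpus   number of cpus required per task
  -J, --job-name=jobname      name of job'
echo "$help_text" | grep -i cpu
```

Only the line mentioning CPUs is printed, which makes it easy to spot the relevant flag among dozens of options.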
A template job script is available in `templates/hello_slurm.sh`. This shows the basic anatomy of a job on H4H. Please open the file now so you can follow along with the details about each section of the script.
- The Shebang statement
    - The first line of your script should always include a shebang (`#!`) statement for the language you are running, usually by specifying the path to the language runtime
    - For H4H, generally you will be using `bash` to orchestrate execution of additional scripts and functions in other languages, as needed for your analysis
- The Slurm header

    ```shell
    #!/bin/bash
    #SBATCH --job-name=hello_slurm
    #SBATCH --output=hello_slurm.txt
    #SBATCH --error=hello_slurm-%j.log
    #SBATCH --time=0:01:00
    ```

    - While it is possible to pass in `sbatch` options via the command line, this is a tedious and irreproducible way to configure your job
    - As such, it is preferable to codify your job configuration in the job script using a special comment syntax called a Slurm header
    - It should immediately follow the shebang statement (no empty lines) and uses the same conventions as the command-line options, but prepends them with the `#SBATCH` directive to tell Slurm these should be handled as command options instead of regular comments
    - The Slurm header is read until the first empty line; as a result, adding white space will break your header. You can instead add extra comment lines if you need to break up your header to increase readability
    - To view a list of all possible options, run `man sbatch` or `sbatch --help`
- The bash script
    - This is where the action happens; you can run arbitrary commands on the cluster here!
    - Anything after the Slurm header and a blank line is considered the script to be executed by the job
    - Use it to dispatch additional jobs, execute scripts in a variety of languages, or script your task directly in the job file
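Putting the three pieces together, the template likely looks something like the following (a sketch: the header matches the one shown above, but the exact body of `templates/hello_slurm.sh` is an assumption based on this tutorial's description of the job's behaviour):

```shell
#!/bin/bash
#SBATCH --job-name=hello_slurm
#SBATCH --output=hello_slurm.txt
#SBATCH --error=hello_slurm-%j.log
#SBATCH --time=0:01:00

# Job body: wait 60 seconds, then print a greeting. Slurm redirects
# stdout to hello_slurm.txt (per --output) and stderr to the %j log,
# where %j is replaced by the numeric job ID.
sleep 60
echo "Hello Slurm!"
```

Note that the script itself only writes to stdout; it is the `--output` directive that turns that into a text file.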
Try it yourself

- Connect to the UHN VPN if you aren't already
- SSH into the Login Node as described in Tutorial 1
- Open a text editor with a file called `hello_slurm.sh` and paste in the contents of `templates/hello_slurm.sh`
- Submit the job to run using `sbatch`
- View the job in your queue via `squeue`; note the Job ID and Job name
- The job will wait for 60 seconds before printing "Hello Slurm!" to a text file in your home directory. View the contents of the file when it appears, then view the log file specified in the Slurm header for additional execution information
- Have a look at your job history via the `sacct` command; note your Job ID
- Use the `seff` script and your Job ID from the previous step to view the efficiency of your Slurm job
- Resubmit the hello_slurm job and this time run `sstat` to view resources in real time
- Cancel the resubmitted job (quickly, it only runs for 60 seconds!)
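The cycle above can also be scripted so you don't have to copy the Job ID by hand. A sketch (it assumes `sbatch`'s standard `Submitted batch job <id>` message; the commands shown are standard Slurm CLI usage, not taken from the original tutorial):

```shell
# On the H4H login node, capture the job ID from sbatch's
# "Submitted batch job <id>" message, then reuse it:
#
#   jobid=$(sbatch hello_slurm.sh | awk '{print $4}')
#   squeue -j "$jobid"    # watch it in the queue
#   sstat -j "$jobid"     # live resource usage while it runs
#   scancel "$jobid"      # cancel it before the 60 seconds are up
#   sacct -j "$jobid"     # accounting history once it finishes
#   seff "$jobid"         # efficiency report for the finished job
#
# The ID extraction itself can be checked anywhere, since awk just
# pulls the fourth whitespace-separated field from the message:
jobid=$(echo "Submitted batch job 123456" | awk '{print $4}')
echo "$jobid"  # → 123456
```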