
Submitting Jobs on H4H

As mentioned in the Introduction to Slurm tutorial, jobs are the primary way a user should interact with the compute cluster.

Submitted jobs enter a job queue, which uses a combination of information about the requested resources, the user's group, and their recent use of the queue to distribute resources fairly between competing jobs. You can submit an arbitrary number of jobs to the queue, but each one will affect your queue priority.

Warning

The Slurm Scheduler works by analyzing the resources requested by a job, the resources available, and the history of your account's jobs to allocate resources fairly.

Abusing the queue by regularly over-allocating resources for your jobs, or by frequently running jobs which fail, will lower your overall queue priority, and you will have to wait longer for your jobs to run.

For large jobs, it isn't unusual to have to wait a few days for them to run. In general, the smaller the requested allocation, the better your queue priority will be. It can therefore be useful to break a large analysis into multiple smaller steps. This has the added benefit of allowing non-dependent tasks to run in parallel.
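For example, a Slurm job array lets you submit many small, independent tasks from a single script, each scheduled as its own job. A minimal sketch, assuming a hypothetical per-sample script process_sample.sh:

#!/bin/bash
#SBATCH --job-name=split_analysis
#SBATCH --output=split_analysis-%A_%a.log
#SBATCH --time=0:30:00
#SBATCH --array=1-4

# Each array task runs as its own small job; in the log name,
# %A is the array's master job ID and %a is the task index
bash process_sample.sh "sample_${SLURM_ARRAY_TASK_ID}"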

This tutorial will demonstrate usage of the following commands:

  • sbatch
  • squeue
  • scancel
  • sacct
  • seff
  • sstat

Submitting A Job

Job submission is done via the sbatch command. Detailed information about the available options for this command can be viewed in the terminal by running man sbatch. A short summary of the available options can also be printed via sbatch --help. The printed version has the added benefit of returning parsable text, so you can use shell scripting to do things like search for relevant options.

For example, if I want to see all sbatch options related to CPU usage for my job, I can run:

sbatch --help | grep cpu

A template job script is available in templates/hello_slurm.sh. This shows the basic anatomy of a job on H4H. Please open the file now so you can follow along with details about each section of the script.

  1. The Shebang statement

    #!/bin/bash
    

    • The first line of your script should always include a shebang (#!) statement for the language you are running, usually by specifying the path to the language runtime
    • For H4H, you will generally use bash to orchestrate execution of additional scripts and functions in other languages, as needed for your analysis
  2. The Slurm header

    #!/bin/bash
    #SBATCH --job-name=hello_slurm
    #SBATCH --output=hello_slurm.txt
    #SBATCH --error=hello_slurm-%j.log
    #SBATCH --time=0:01:00
    

    • While it is possible to pass in sbatch options via the command line, this is a tedious and irreproducible way to configure your job
    • As such, it is preferable to codify your job configuration in the job script using a special comment syntax called a Slurm header
    • It should immediately follow the shebang statement (no empty lines) and uses the same conventions as the command-line options, but prepends them with the #SBATCH directive to tell Slurm that these should be handled as command options instead of regular comments
    • The Slurm header is read until the first empty line; as a result, adding white space will break your header. You can instead add extra comment lines if you need to break up your header for readability (see the sketch after this list)
    • To view a list of all possible options, run man sbatch or sbatch --help
  3. The bash script

    • This is where the action happens; you can run arbitrary commands on the cluster here!
    • Anything after the Slurm header and a blank line is considered the script to be executed by the job
    • Use it to dispatch additional jobs, execute scripts in a variety of languages, or script your task directly in the job file
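Putting the three parts together, a complete job script in the shape of templates/hello_slurm.sh might look like the sketch below (the exact template body may differ; this version simply sleeps and prints, matching the exercise that follows). Note the plain comment lines used to group the header without breaking it:

#!/bin/bash
# --- Job identity and logging ---
#SBATCH --job-name=hello_slurm
#SBATCH --output=hello_slurm.txt
#SBATCH --error=hello_slurm-%j.log
# --- Resource requests ---
#SBATCH --time=0:01:00

# Script body: wait, then print; Slurm redirects stdout
# to the file named by --output
sleep 60
echo "Hello Slurm!"

Command-line options passed to sbatch take precedence over #SBATCH header values, so a one-off tweak like sbatch --time=0:02:00 hello_slurm.sh works without editing the script.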

Try it yourself

  1. Connect to the UHN VPN if you aren't already

    Solution
    sudo openconnect --protocol=gp -u 'uhnresearch/<username>' connect2.uhn.ca
    
  2. SSH into the Login Node as described in Tutorial 1

    Solution
    ssh -p "$H4HLOGIN_PORT" "<username>@$H4HLOGIN"
    
  3. Open a text editor with a file called hello_slurm.sh and paste in the contents of templates/hello_slurm.sh

    Solution
    nano hello_slurm.sh
    # Paste with CTRL + v
    # Save and exit with CTRL + x
    
  4. Submit the job to run using sbatch

    Solution
    sbatch hello_slurm.sh  # returns the job ID
    
  5. View the job in your queue via squeue, noting the Job ID and Job name

    Solution
    squeue
    # Job ID is first column
    # Job name is second column
    
  6. The job will wait for 60 seconds before printing "Hello Slurm!" to a text file in your home directory. View the contents of the file when it appears, then view the log file specified in the Slurm header for additional execution information

    Solution
    ls hello_slurm* # check that the file has been created
    cat hello_slurm.txt
    cat hello_slurm-<job_id>.log
    
  7. Have a look at your job history via the sacct command, and note your job ID.

    Solution
    sacct
    
  8. Use the seff script and your job ID from the previous step to view the efficiency of your Slurm job

    Solution
    seff <job_id>
    
  9. Resubmit the hello_slurm job, and this time run sstat to view resource usage in real time.

    Solution
    sbatch hello_slurm.sh
    squeue # check if the job has started
sstat -j <job_id> # once the job has started, view the resources being utilized
    
  10. Cancel the resubmitted job (be quick, it only runs for 60 seconds!)

    Solution
    scancel -n hello_slurm
    squeue  # the job should no longer appear in the queue
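Note that scancel can also target a job by the ID reported by sbatch or squeue, which is useful when several jobs share a name:

scancel <job_id>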