For a general introduction to using SLURM, watch the video tutorial that BYU put together.
Here's a useful cheatsheet of many of the most common Slurm commands.
Example submission scripts are available at our Git repository.
https://bitbucket.org/caltechimss/central-hpc-public/src/master/slurm-scripts/
To pull down the extended example script, run the following from a cluster login node.
wget https://bitbucket.org/caltechimss/central-hpc-public/raw/master/slurm-scripts/extended-slurm-submission
Job Submission
Use the Script Generator to check for syntax. Each #SBATCH line contains a parameter that you can use on the command-line (e.g. --time=1:00:00).
sbatch -A accounting_group your_batch_script
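For example, a one-hour time limit can be given either as a directive inside the script or as a flag on the command line; command-line flags override the script's directives. (my_batch_script is a placeholder name.)
# Inside my_batch_script:
#SBATCH --time=1:00:00
# Or, equivalently, on the command line:
sbatch --time=1:00:00 my_batch_script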
salloc is used to obtain a job allocation that you can then run commands within interactively.
srun is used to obtain a job allocation (if needed) and execute an application. It can also be used to distribute MPI processes within your job.
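As a minimal sketch of this workflow (the resource values and executable name are placeholders):
# Request an interactive allocation of 4 tasks for one hour
salloc -n 4 -t 1:00:00
# Launch the program across the allocated resources
srun ./my_program
# Release the allocation when finished
exit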
Environment Variables:
- SLURM_JOB_ID - job ID
- SLURM_SUBMIT_DIR - the directory you were in when sbatch was called
- SLURM_CPUS_ON_NODE - how many CPU cores were allocated on this node
- SLURM_JOB_NAME - the name given to the job
- SLURM_JOB_NODELIST - the list of nodes assigned; potentially useful for distributing tasks
- SLURM_JOB_NUMNODES - the number of nodes allocated to the job
- SLURM_NPROCS - the total number of tasks allocated (equivalent to SLURM_NTASKS)
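As a minimal sketch of how these variables can be used inside a batch script (the job name, resources, and commands are placeholders):
#!/bin/bash
#SBATCH --job-name=env_example
#SBATCH --ntasks=4
#SBATCH --time=0:10:00
# Run from the directory the job was submitted from
cd $SLURM_SUBMIT_DIR
# Report where the job landed
echo "Job $SLURM_JOB_ID ($SLURM_JOB_NAME) on $SLURM_JOB_NUMNODES node(s): $SLURM_JOB_NODELIST"
echo "CPU cores on this node: $SLURM_CPUS_ON_NODE"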
- --nodes - The number of nodes for the job (computers)
- --mem - The amount of memory per node that your job needs
- -n - The total number of tasks your job requires
- --gres gpu:# - The number of GPUs per node you need in your job
- --gres=gpu:type:# - You can also specify the type of GPU. We have mostly P100s, but also 2 V100s
- --qos - The QOS you want to run in, currently normal or debug
- --mem-per-cpu= - The amount of memory per cpu your job requires
- -N - The minimum (and maximum) number of nodes required
- --ntasks-per-node=# - tasks per node.
- --exclusive - this will get you exclusive use of the node
- --constraint= - Constrain the job to particular nodes. Use skylake, cascadelake, or broadwell to select a particular processor type
- --partition= - Which partition to send the job to. This should be expansion or gpu.
Request a single node with two GPUs:
#SBATCH --nodes=1
#SBATCH --gres=gpu:2
#SBATCH --partition=gpu
Request one V100 GPU:
#SBATCH --gres=gpu:v100:1
#SBATCH --partition=gpu
Request a single node with one V100 GPU. (Specifically, a 32GB V100. Since the four 32GB V100 GPUs are on a Cascade Lake node, we need to constrain the job to that processor type.)
#SBATCH --nodes=1
#SBATCH --gres=gpu:v100:1
#SBATCH --constraint="cascadelake"
#SBATCH --partition=gpu
Request that your job only runs on skylake or cascadelake CPUs.
#SBATCH --constraint=skylake|cascadelake
Your job will be charged to the account specified. We do not force you to set an account, since many users are in just one. If you are in more than one group, make sure you specify the group you want to charge the job to. This is done with the "-A" option when submitting the job.
You can see the accounts you are in using:
sacctmgr show user myusername accounts
You can change your default account using:
sacctmgr modify user myusername set defaultaccount=account
Note: Please choose your job's wall time wisely. As cluster policy, we do not typically increase a running job's wall time, since doing so is both unfair to other users and could alter the reported start times of existing jobs in the queue. If you are unfamiliar with your code's performance, we strongly recommend padding the specified wall time and then working backwards.
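For example (a hypothetical value), a job you expect to take roughly four hours might be submitted with a padded request; once you have measured the actual run time, reduce the request so the scheduler can place the job more easily:
#SBATCH --time=06:00:00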
Job/Queue Management
squeue is used to show the queue. By default it shows all jobs, regardless of state:
- -l : long listing
- -u username : only show the jobs of the chosen user
- -A account : show jobs from a specific group, usually a PI
- --state=pending : show pending jobs
- --state=running : show running jobs
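For example (the username and account names are placeholders), these options can be combined:
squeue -u myusername --state=running -l
squeue -A myprofessor --state=pending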
scancel is used to cancel (i.e. kill) a job. Here are some options to use:
- jobid : kill the job with that jobid
- -u username : kill all jobs for the user
- --state=running : kill jobs that are in state "running"
- --state=pending: Kill jobs in state "pending"
You can combine these options to target a particular set of jobs. For example, "scancel -u foo --state=pending" will kill all pending jobs for user "foo".
scontrol show job is used to display job information for pending and running jobs, such as holds, resource requests, and resource allocations. This is a great first step in checking on a job.
scontrol hold holds a job. Pass it a job ID (e.g. "scontrol hold 1234").
scontrol release releases a held job. Pass it a job ID (e.g. "scontrol release 1234").
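A minimal sketch of that sequence (1234 is a placeholder job ID):
# Inspect the job's requests, allocation, and any holds
scontrol show job 1234
# Place the job on hold, then release it later
scontrol hold 1234
scontrol release 1234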
Checking Usage
sreport shows aggregate CPU and GPU usage over a time window. For example, to report hours used since January 1, 2018 by everyone in a group account, or by a single user:
sreport -T gres/gpu,cpu cluster accountutilizationbyuser start=01/01/18T00:00:00 end=now -t hours account=<group-account-name>
sreport -T gres/gpu,cpu cluster accountutilizationbyuser start=01/01/18T00:00:00 end=now -t hours user=<username>
sacct shows current and historical job information in more detail than sreport. Important options:
- -S from_date: Show jobs that started on or after from_date. There are several valid formats, but the easiest is probably "MMDD". See "man sacct" for more options.
- -l ("l" for "long"): gives more verbose information
- -u someusername: limit output to jobs by someusername
- -A someprofessor: limit output to jobs by someprofessor's research group
- -j jobid: specify a particular job to examine
- -o format options: see "man sacct" for more fields to examine; there are a lot
sacct -u <username> -S 0101 --format JobId,AllocCPUs,UserCPU
Launching tasks within a job
MPI Jobs
mpirun
mpirun executable options
srun
srun executable options
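As a minimal sketch of an MPI batch script (the module and executable names are placeholders; load whichever MPI your code was built against):
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --time=1:00:00
# Load the MPI environment (placeholder module name)
module load openmpi
# srun launches one MPI rank per task; mpirun ./my_mpi_program also works
srun ./my_mpi_program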
Embarrassingly Parallel Jobs
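One common approach, sketched here under the assumption that a Slurm job array fits your workload, is to submit many independent array tasks, each processing its own input (the program and file names are placeholders):
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=1:00:00
#SBATCH --array=1-10
# Each array task runs this script independently with its own SLURM_ARRAY_TASK_ID
./my_program input_${SLURM_ARRAY_TASK_ID}.dat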
Interactive Jobs
srun --pty -t hh:mm:ss -n tasks -N nodes /bin/bash -l
This is a good way to interactively debug your code or try new things. You can also specify the particular resources you need here, such as GPUs or memory, as in the example below.
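For example (hypothetical values), an interactive session on one node with one GPU and 16 GB of memory:
srun --pty -t 2:00:00 -N 1 -n 1 --gres=gpu:1 --mem=16G --partition=gpu /bin/bash -l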
X11
You can also run an X11 application on a compute node through an allocation. To do this, make sure that you have an X server working on your local system and that you are forwarding X connections through your SSH connection (ssh -X). Then use the --x11 option to set up the forwarding to the compute node:
srun --x11 -t hh:mm:ss -N 1 xterm
Keep in mind that this is likely to be slow, and the session will end if the SSH connection is terminated. A more robust solution is to use FastX; see the FastX tutorial.