The Slurm Workload Manager, or simply Slurm, is what Research Computing uses to schedule jobs on our cluster, SPORC, and the Ocho. Slurm makes it easy to allocate resources and keep tabs on the progress of your jobs. This documentation covers the basic commands you will need to start running jobs.
To run jobs, connect to sporcsubmit.rc.rit.edu using either SSH or FastX.
Commands Overview
All commands have a --help option that describes how to use the command in more depth, including all of its available options.
sinfo
Reports the state of the partitions and nodes managed by Slurm.
[abc1234@sporcsubmit ~]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
tier1 up 10-00:00:0 1 down* skl-a-08
tier1 up 10-00:00:0 1 mix skl-a-60
tier1 up 10-00:00:0 12 alloc skl-a-[01-04,07,09-15]
tier1 up 10-00:00:0 20 idle skl-a-[05-06,16-32,61]
tier2 up 10-00:00:0 1 down* skl-a-08
...
onboard up 10-00:00:0 27 idle skl-a-[33-59]
interactive up 2-00:00:00 1 mix theocho
- PARTITION: the name of the partition
- AVAIL: whether the partition is up or down
- TIMELIMIT: the maximum length of time a job may run, in the format Days-Hours:Minutes:Seconds
- NODES: the number of nodes with that configuration
- STATE: down* if jobs cannot be run, idle if the nodes are available for jobs, alloc if all the CPUs on the nodes are allocated to jobs, or mix if some CPUs on the nodes are allocated and others are idle
- NODELIST: the specific nodes associated with that partition
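To make the column meanings concrete, here is a minimal sketch that totals the idle nodes in each partition from sinfo-style output. The helper name idle_nodes_by_partition is hypothetical, and the sample input is modeled on the listing above so the snippet runs without Slurm. It assumes the default column order and matches the state idle exactly.

```shell
#!/bin/bash
# Sum the NODES column for rows whose STATE is exactly "idle",
# grouped by PARTITION. Expects the default sinfo columns:
#   PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
# idle_nodes_by_partition is a hypothetical helper, not a Slurm command.
idle_nodes_by_partition() {
    awk 'NR > 1 && $5 == "idle" { idle[$1] += $4 }
         END { for (p in idle) print p, idle[p] }' | sort
}

# Sample input modeled on the listing above.
idle_nodes_by_partition <<'EOF'
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
tier1 up 10-00:00:0 12 alloc skl-a-[01-04,07,09-15]
tier1 up 10-00:00:0 20 idle skl-a-[05-06,16-32,61]
onboard up 10-00:00:0 27 idle skl-a-[33-59]
interactive up 2-00:00:00 1 mix theocho
EOF
# prints:
#   onboard 27
#   tier1 20
```

On the cluster, the same function can read live data with sinfo | idle_nodes_by_partition.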
sbatch
Submits a script to Slurm so a job can be scheduled. The job will wait in the pending state until the resources it requested are available.
[abc1234@sporcsubmit ~]$ sbatch myscript.sh
Submitted batch job 2914
- If no filename is specified, sbatch reads the script from standard input.
- The number after "job" is the job_id.
- In your script file you can specify options with #SBATCH [option]. For example:
  - #SBATCH -J job_name will specify the name of the job.
  - #SBATCH -t Days-Hours:Minutes:Seconds will set the time limit. Other acceptable time formats include: Minutes, Minutes:Seconds, Hours:Minutes:Seconds, Days-Hours, and Days-Hours:Minutes.
  - #SBATCH -p partition -c cpuspertask will specify which partition to run the job on as well as the number of processors to use for each task.
  - #SBATCH -A accountName changes the account the job is run under.
- There are many more options; see the official sbatch Slurm documentation.
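Because -t accepts several time formats, it can help to see how each one maps to an actual duration. The following is a minimal sketch, not a Slurm tool: slurm_time_to_seconds is a hypothetical helper that converts any of the formats listed above into seconds. It assumes well-formed input and does no validation.

```shell
#!/bin/bash
# Convert a Slurm time-limit string to a number of seconds.
# slurm_time_to_seconds is a hypothetical helper, not a Slurm command.
slurm_time_to_seconds() {
    printf '%s\n' "$1" | awk '{
        days = 0; hours = 0; minutes = 0; seconds = 0
        spec = $0
        dash = index(spec, "-")
        if (dash > 0) {
            # Days-Hours[:Minutes[:Seconds]]
            days = substr(spec, 1, dash - 1)
            n = split(substr(spec, dash + 1), part, ":")
            hours = part[1]
            if (n >= 2) minutes = part[2]
            if (n >= 3) seconds = part[3]
        } else {
            n = split(spec, part, ":")
            if (n == 1) {            # Minutes
                minutes = part[1]
            } else if (n == 2) {     # Minutes:Seconds
                minutes = part[1]; seconds = part[2]
            } else {                 # Hours:Minutes:Seconds
                hours = part[1]; minutes = part[2]; seconds = part[3]
            }
        }
        print days * 86400 + hours * 3600 + minutes * 60 + seconds
    }'
}

slurm_time_to_seconds 0:5:0         # Hours:Minutes:Seconds, prints 300
slurm_time_to_seconds 90            # Minutes, prints 5400
slurm_time_to_seconds 10-00:00:00   # Days-Hours:Minutes:Seconds, prints 864000
```

Note that a bare number is read as minutes, so -t 90 requests an hour and a half, not 90 seconds.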
Example Bash Script:
This is a modified version of slurm-single-core.sh, which can be found by running grab-examples when logged on to SPORC. The only difference between this example and the version that comes with grab-examples is that this one refers to the partition tier3 instead of work. We have renamed our partitions since grab-examples was created, so always substitute tier3 for work.
#!/bin/bash -l
#NOTE the -l flag!
# This is an example job file for a single core CPU bound program
# Note that all of the following statements below that begin
# with #SBATCH are actually commands to the Slurm scheduler
# Please copy this file to your home directory and modify it
# to suit your needs.
#Name of the job -You'll probably want to customize this
#SBATCH -J test
#Standard out and Standard Error output files
#SBATCH -o test.output
#SBATCH -e test.output
#To send mail for updates on the job
#SBATCH --mail-user abc1234@rit.edu
#notify state changes: BEGIN, END, FAIL, or ALL
#SBATCH --mail-type=ALL
#Request 5 minutes run time MAX, anything over will be KILLED
#SBATCH -t 0:5:0
#Put the job in the "debug" partition and request one core
# "debug" is a limited partition. You'll likely want to change
# it to "tier3" once you understand how this all works.
#SBATCH -p debug -c 1
#Job memory requirements in MB
#SBATCH --mem=300
#Job script goes below this line
#
echo " (${HOSTNAME}) sleeping for 1 minute to simulate work(ish)"
sleep 60
echo " *(${HOSTNAME}) Ahhh, alarm clock!"

See Using the Cluster - Advanced Usage for topics such as loops and dependent jobs. Some documentation will also give you example bash scripts for your specific program.
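Scripts that submit jobs often need the job_id that sbatch reports, for example to pass it to scancel later. Here is a minimal sketch of pulling it out of the confirmation message. The function name job_id_from_sbatch_output is hypothetical, and the sample text is the message shown earlier in this section, so the snippet runs without Slurm.

```shell
#!/bin/bash
# Extract the job_id from sbatch's "Submitted batch job N" message.
# job_id_from_sbatch_output is a hypothetical helper, not a Slurm command.
job_id_from_sbatch_output() {
    local rest
    case $1 in
        "Submitted batch job "[0-9]*)
            # Drop the fixed prefix, then anything after the first space.
            rest=${1#"Submitted batch job "}
            echo "${rest%% *}"
            ;;
        *)
            echo "no job_id found in: $1" >&2
            return 1
            ;;
    esac
}

# Sample message copied from the sbatch example earlier in this section.
job_id=$(job_id_from_sbatch_output "Submitted batch job 2914")
echo "job_id=${job_id}"   # prints job_id=2914
```

On the cluster, the sample string would be replaced by the real output, as in job_id_from_sbatch_output "$(sbatch myscript.sh)". sbatch also offers a --parsable option that prints the job ID without the surrounding text.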
squeue
Lists the state of all jobs being run or scheduled to run.
[abc1234@sporcsubmit ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2714_1 tier3 myjob abc1234 PD 0:00 1 (JobHeldAdmin)
2714_2 tier3 myjob abc1234 PD 0:00 1 (JobHeldAdmin)
...
384 tier1 new_job def5678 R 2-09:14:40 1 skl-a-18
1492 interacti _interac aaa0000 R 1:24:23 1 theocho
- JOBID: the ID number associated with the job
- PARTITION: the name of the partition running the job
- NAME: the name of the job run with sbatch or sinteractive
- USER: the user who submitted the job
- ST: the state of the job, PD for pending or R for running
- TIME: how long the job has been running, in the format Days-Hours:Minutes:Seconds
- NODES: the number of nodes allocated to the job
- NODELIST(REASON): either the name of the node running the job or the reason the job is not running, such as JobHeldAdmin (the job is prevented from running by the administrator). Other reasons and their explanations can be found in the official Slurm documentation for squeue.
- Use squeue -u username to view only the jobs from a specific user
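As an illustration of reading the ST and NODELIST(REASON) columns together, the sketch below lists each pending job with the reason it is waiting. The helper name pending_reasons is hypothetical, and the sample input is copied from the listing above so the snippet runs without Slurm. It assumes the default column order and job names without spaces.

```shell
#!/bin/bash
# Print "JOBID REASON" for every job whose state (ST) is PD.
# Expects the default squeue columns:
#   JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
# pending_reasons is a hypothetical helper, not a Slurm command.
pending_reasons() {
    awk 'NR > 1 && $5 == "PD" {
        # A reason may contain spaces, so rejoin fields 8 through the last.
        reason = $8
        for (i = 9; i <= NF; i++) reason = reason " " $i
        print $1, reason
    }'
}

# Sample input copied from the listing above.
pending_reasons <<'EOF'
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2714_1 tier3 myjob abc1234 PD 0:00 1 (JobHeldAdmin)
2714_2 tier3 myjob abc1234 PD 0:00 1 (JobHeldAdmin)
384 tier1 new_job def5678 R 2-09:14:40 1 skl-a-18
1492 interacti _interac aaa0000 R 1:24:23 1 theocho
EOF
# prints:
#   2714_1 (JobHeldAdmin)
#   2714_2 (JobHeldAdmin)
```

On the cluster, squeue -u username | pending_reasons would show the same summary for one user's jobs.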
scancel
Signals or cancels a job. One or more jobs separated by spaces may be specified.
[abc1234@sporcsubmit ~]$ scancel job_id[_array_id]
sacct
Lists the jobs that are running or have been run.
[abc1234@sporcsubmit ~]$ sacct
JobID JobName Partition Account AllocCPUS State ExitCode
----------- --------- --------- ---------- --------- --------- --------
2912 job_tests tier3 job_tester 2 COMPLETED 0:0
2912.batch batch job_tester 2 COMPLETED 0:0
2912.extern extern job_tester 2 COMPLETED 0:0
2913 jobs2 tier3 job_tester 1 FAILED 1:0
2913.batch batch job_tester 1 FAILED 1:0
2913.extern extern job_tester 1 COMPLETED 0:0
- sacct -j <jobID> will display only the listed job or jobs (comma-separated job IDs)
- sacct -A <accountName> will display only the jobs run under the listed account or accounts (comma-separated)
- Failed jobs will have an exit code other than 0. The ExitCode column shows the exit code followed by the signal, if any, that ended the job. 1 is used for general failures. Some exit codes have special meanings, which can be looked up online.
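To pick failed jobs out of a long accounting list, one approach is to filter on the State column. The sketch below is illustrative only: failed_jobs is a hypothetical helper, and the sample input is the table above rewritten by hand as pipe-delimited text with the fields JobID, JobName, State, and ExitCode. That layout is an assumption about what sacct's -P (parsable) output would look like for those fields.

```shell
#!/bin/bash
# Print "JobID JobName ExitCode" for each failed job, skipping the
# per-step rows (JobIDs containing a dot, such as 2913.batch).
# Input fields, pipe-delimited: JobID|JobName|State|ExitCode
# failed_jobs is a hypothetical helper, not a Slurm command.
failed_jobs() {
    awk -F'|' '$1 !~ /\./ && $3 == "FAILED" { print $1, $2, $4 }'
}

# Sample input: the sacct table above, rewritten by hand as
# pipe-delimited fields (an assumed layout, see the lead-in).
failed_jobs <<'EOF'
2912|job_tests|COMPLETED|0:0
2912.batch|batch|COMPLETED|0:0
2912.extern|extern|COMPLETED|0:0
2913|jobs2|FAILED|1:0
2913.batch|batch|FAILED|1:0
2913.extern|extern|COMPLETED|0:0
EOF
# prints:
#   2913 jobs2 1:0
```

On the cluster, input in this shape could come from sacct -n -P --format=JobID,JobName,State,ExitCode | failed_jobs, where -n suppresses the header line.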
my-accounts
Although not a part of Slurm, my-accounts lets you see all the accounts associated with your username, which is helpful when you want to charge resource allocation to a particular account.
[abc1234@sporcsubmit ~]$ my-accounts
Account Name Expired Allowed Partitions
- ------------ ------- ------------------
* my_acct false tier3,debug,interactive
If there are any further questions, or there is an issue with the documentation, please contact rc-help@rit.edu for additional assistance.