
The Slurm Workload Manager, or simply Slurm, is what Research Computing uses to schedule jobs on our cluster SPORC and on the Ocho. Slurm makes it easy to allocate resources and keep tabs on the progress of your jobs. This documentation covers the basic commands you will need to know to start running your jobs.

To run jobs, you need to connect to sporcsubmit.rc.rit.edu using either SSH or FastX.

sinfo

Reports the state of the partitions and nodes managed by Slurm. 

[abc1234@sporcsubmit ~]$ sinfo
PARTITION       AVAIL TIMELIMIT   NODES  STATE  NODELIST
tier1           up    10-00:00:0      1  down*  skl-a-08
tier1           up    10-00:00:0      1    mix  skl-a-60
tier1           up    10-00:00:0     12  alloc  skl-a-[01-04,07,09-15]
tier1           up    10-00:00:0     20   idle  skl-a-[05-06,16-32,61]
tier2           up    10-00:00:0      1  down*  skl-a-08
...
onboard         up    10-00:00:0     27   idle  skl-a-[33-59]
interactive     up    2-00:00:00      1    mix  theocho


  • PARTITION: the name of the partition
  • AVAIL: whether the partition is up or down
  • TIMELIMIT: the maximum length of time a job is allowed to run, in the format Days-Hours:Minutes:Seconds
  • NODES: the number of nodes of that configuration
  • STATE: down* if jobs cannot be run on the node, idle if it is available for jobs, alloc if all of the CPUs on the node are allocated to jobs, or mix if some CPUs on the node are allocated and others are idle.
  • NODELIST: specific nodes associated with that partition.
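
If you only want to check a single partition, you can pass it to sinfo with -p. For example, assuming the cluster looked like the snapshot above, checking the onboard partition would show just its line:

[abc1234@sporcsubmit ~]$ sinfo -p onboard
PARTITION       AVAIL TIMELIMIT   NODES  STATE  NODELIST
onboard         up    10-00:00:0     27   idle  skl-a-[33-59]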

sbatch

Submits a script to Slurm so a job can be scheduled. A job will wait in the pending state until the resources it requested are available.

Every script you submit to Slurm through sbatch should have the following options specified in the file:

Basic sbatch Options
#SBATCH -J <jobName>
Gives the job a name that will make it easy to find when you run squeue or sacct.
#SBATCH -t Days-Hours:Minutes:Seconds

Sets the time limit for the job. Other acceptable time formats include:

  • Minutes
  • Minutes:Seconds
  • Hours:Minutes:Seconds
  • Days-Hours
  • Days-Hours:Minutes
#SBATCH -p <partition>

Specifies which partition to run your job on. Choices include:

  • tier1
  • tier2
  • tier3
  • onboard
  • debug

Run my-accounts to see which partitions you can run jobs on.

#SBATCH -A <accountName>
Specifies which account to run the job under. Run my-accounts to see your accounts.
#SBATCH --mem=<size[units]>
How much memory your job will need. Units include K, M, G, and T. If no unit is specified, M (megabytes) is used. If your job runs out of memory, increase the size and resubmit.
#SBATCH -o <filename.o>
Where the output of your job is stored. End the file name with a .o so it is easy to find.
#SBATCH -e <filename.e>
Same as -o, but for standard error.
#SBATCH --mail-user=<email>
The email address that job notifications are sent to.
#SBATCH --mail-type=<type>
The circumstances under which you want to be emailed. Type can be BEGIN for when the job starts, END for when the job ends, FAIL for if the job fails, or ALL for all of the above.
sbatch Options for Resources
#SBATCH -n <number>
The number of tasks your job will generate. Specifying this tells Slurm how many cores you will need. By default 1 core is used per task; use -c to change this value.
#SBATCH -c <ncpus>
Specifies the number of CPUs needed for each task. For example, if you have 4 tasks that each use 20 cores, you would get a total of 80 cores; to request that, you would use #SBATCH -n 4 together with #SBATCH -c 20. This option also makes sure that the 20 cores for each task are on the same node.
#SBATCH --gres=gpu[:type:count]
For when your job needs to run on a GPU. See the official Slurm documentation on --gres for more details; a sketch that combines these resource options follows below.
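
As a rough sketch tying the resource options together: continuing the 4-task, 20-cores-per-task example above, and treating the job name, partition, account, time limit, and memory below as placeholders to adjust, a job header might look like this:

#!/bin/bash -l
#SBATCH -J big_mpi_job                # placeholder job name, shown by squeue and sacct
#SBATCH -o big_mpi_job.o              # standard output file
#SBATCH -e big_mpi_job.e              # standard error file
#SBATCH -t 1-0:0:0                    # one day time limit
#SBATCH -p tier3 -A <account_name>    # partition and account (check my-accounts)
#SBATCH -n 4 -c 20                    # 4 tasks x 20 cores per task = 80 cores
#SBATCH --mem=32g                     # memory request, adjust to your job
##SBATCH --gres=gpu:1                 # uncomment if you also need one GPU

Everything below a header like this is the normal shell script that actually runs your program, as in the full example in the next section.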

Example Bash Script

The following is the example script slurm-mpi.sh. This can be found by running grab-examples when you log into SPORC. 

#!/bin/bash -l
# NOTE the -l flag!
# This is an example job file for a multi-core MPI job.
# Note that all of the following statements below that begin
# with #SBATCH are actually commands to the SLURM scheduler
# Please copy this file to your home directory and modify it 
# to suit your needs.
#
# If you need any help, please email rc-help@rit.edu
#
# Name of the job - You'll probably want to customize this.
#SBATCH -J mpi_test
# Standard out and Standard Error output files
#SBATCH -o mpi_test.o
#SBATCH -e mpi_test.e
# To send emails, set the address below and remove one of the '#' signs
##SBATCH --mail-user=<email>
# notify on state change: BEGIN, END, FAIL, or ALL (remove one '#' to enable)
##SBATCH --mail-type=ALL
# 5 days is the run time MAX, anything over will be KILLED unless you talk with RC
# Request 4 days and 5 hours
#SBATCH -t 4-5:0:0
# Put the job in the appropriate partition matching the account and request FOUR cores
#SBATCH -A <account_name> -p <onboard, tier1, tier2, tier3> -n 4
# Job memory requirements in MB=m (default), GB=g, or TB=t
#SBATCH --mem=3g
#
# Your job script goes below this line.
#
echo "(${HOSTNAME}) sleeping for 1 minute to simulate work (ish)"
echo "(${HOSTNAME}) even though this script as claimed for cores"
echo "(${HOSTNAME}) ... it won't be using all "
sleep 60
echo "(${HOSTNAME}) Ahhh, alarm clock!"

Running sbatch 

[abc1234@sporcsubmit ~]$ sbatch slurm-mpi.sh
Submitted batch job 2914
  • If no filename is specified, sbatch will read the script from standard input (see the example below)
  • The number printed after "Submitted batch job" is the job_id
  • See squeue and sacct for how to check the progress of the job
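
For example, a short throwaway job can be submitted without writing a file first by piping the script in on standard input; the job name, partition, and account here are placeholders, and the job number printed back will differ:

[abc1234@sporcsubmit ~]$ sbatch <<'EOF'
#!/bin/bash -l
#SBATCH -J stdin_test
#SBATCH -p tier3 -A <account_name>
#SBATCH -t 0-0:5:0 -n 1
hostname
EOF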

See Using the Cluster - Advanced Usage for topics such as loops and dependent jobs. The documentation for some specific programs also provides example bash scripts.

srun

srun is used for jobs that require MPI. It schedules your job on the Slurm scheduler, similar to sbatch. To use it, create an sbatch file like the example above and add srun ./<mpi_program> below the #SBATCH directives, then submit the file with sbatch as you normally would.
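
As a minimal sketch, assuming your MPI executable is called mpi_hello (the job name, partition, and account here are placeholders as well), the whole file might look like:

#!/bin/bash -l
#SBATCH -J mpi_hello
#SBATCH -o mpi_hello.o
#SBATCH -e mpi_hello.e
#SBATCH -t 0-1:0:0
#SBATCH -p tier3 -A <account_name>
#SBATCH -n 4
#SBATCH --mem=2g
# srun launches one MPI rank per task requested with -n above
srun ./mpi_hello

Submit the file with sbatch; srun inside the script takes care of starting the MPI ranks on the allocated cores.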

sinteractive

If you need user interaction, or are only running something once, then run sinteractive. This will ask you for the resources you require and then connect you to the scheduled node. If you don't know what that entails, just try it. Be sure to exit from your sinteractive session by running exit when you're done; otherwise you're tying up resources you aren't using. For the full process, see our documentation.

squeue

Lists the state of all jobs being run or scheduled to run. 

[abc1234@sporcsubmit ~]$ squeue
 JOBID PARTITION      NAME    USER ST       TIME   NODES NODELIST(REASON)
2714_1     tier3     myjob abc1234 PD       0:00       1 (JobHeldAdmin)
2714_2     tier3     myjob abc1234 PD       0:00       1 (JobHeldAdmin)
...
   384     tier1   new_job def5678  R 2-09:14:40       1 skl-a-18
  1492 interacti  _interac aaa0000  R    1:24:23       1 theocho


  • JOBID: number id associated with the job
  • PARTITION: name of partition running the job
  • NAME: name of the job run with sbatch or sinteractive
  • USER: the user who submitted the job
  • ST: State of the job, PD for pending, R for running
  • TIME: how long the job has been running in the format Days-Hours:Minutes:Seconds 
  • NODES: number of nodes allocated to the job
  • NODELIST(REASON): either the name of the node(s) running the job or the reason the job is not running, such as JobHeldAdmin (the job is prevented from running by an administrator). Other reasons and their explanations can be found in the official Slurm documentation for squeue.
  • Use squeue -u <username> to view only the jobs from a specific user, as shown below
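
For example, to watch only your own jobs (abc1234 is the placeholder username used throughout this page, and the rows shown are simply that user's lines from the output above):

[abc1234@sporcsubmit ~]$ squeue -u abc1234
 JOBID PARTITION      NAME    USER ST       TIME   NODES NODELIST(REASON)
2714_1     tier3     myjob abc1234 PD       0:00       1 (JobHeldAdmin)
2714_2     tier3     myjob abc1234 PD       0:00       1 (JobHeldAdmin)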

scancel

Signals or cancels a job. One or more job IDs, separated by spaces, may be specified.

[abc1234@sporcsubmit ~]$ scancel job_id[_array_id]
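
For example, to cancel the job submitted earlier on this page, or to cancel every job you own using the -u filter:

[abc1234@sporcsubmit ~]$ scancel 2914
[abc1234@sporcsubmit ~]$ scancel -u abc1234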

sacct

Lists the jobs that are running or have been run.

[abc1234@sporcsubmit ~]$ sacct
      JobID    JobName  Partition     Account  AllocCPUS      State  ExitCode
-----------  ---------  ---------  ----------  ---------  ---------  --------
2912         job_tests      tier3  job_tester          2  COMPLETED       0:0
2912.batch       batch             job_tester          2  COMPLETED       0:0
2912.extern     extern             job_tester          2  COMPLETED       0:0
2913             jobs2      tier3  job_tester          1     FAILED       1:0
2913.batch       batch             job_tester          1     FAILED       1:0
2913.extern     extern             job_tester          1  COMPLETED       0:0


  • sacct -j <jobID> will display only the one or more comma-separated job IDs listed (see the example below)
  • sacct -A <accountName> will display only the jobs run under the one or more comma-separated accounts
  • Failed jobs will have an exit code other than 0. 1 is used for general failures; some exit codes have special meanings, which can be looked up online
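
When a job fails, it is often useful to pull out specific fields with --format; the field list below is just one reasonable choice (MaxRSS is the peak memory used), applied to the failed job from the output above:

[abc1234@sporcsubmit ~]$ sacct -j 2913 --format=JobID,JobName,State,ExitCode,Elapsed,MaxRSS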

my-accounts

Although not a part of Slurm, my-accounts lets you see all of the accounts associated with your username, which is helpful when you want to charge resource allocation to a particular account.

[abc1234@sporcsubmit ~]$ my-accounts
  Account Name      Expired  QOS          Allowed Partitions
- ------------      -------  ---          ------------------
* my_acct           false    qos_tier3    tier3,debug,interactive


If there are any further questions, or there is an issue with the documentation, please contact rc-help@rit.edu for additional assistance.