Revision as of 14:36, 18 June 2019
Some of the CAC's Private Clusters are managed with OpenHPC, which includes the Slurm Workload Manager (Slurm for short). Slurm (originally the Simple Linux Utility for Resource Management) is a group of utilities used for managing workloads on compute clusters.
This page is intended to give users an overview of Slurm. Some of the information on this page has been adapted from the Cornell Virtual Workshop topics on the Stampede2 Environment and Advanced Slurm. For a more in-depth tutorial, please review these topics directly.
One important feature of CAC's typical Slurm configuration is that scheduling is done by CPU, not by node. This means that by default, a node may be shared among multiple users.
Overview
Some clusters use Slurm as both the batch queuing system and the scheduling mechanism. This means that jobs are submitted to Slurm from a login node, and Slurm handles scheduling these jobs on nodes as resources become available. Users submit jobs to the batch component, which is responsible for maintaining one or more queues (also known as "partitions"). These jobs include information about themselves as well as a set of resource requests. Resource requests include anything from the number of CPUs or nodes to specific node requirements (e.g. only use nodes with > 2GB RAM). A separate component, called the scheduler, is responsible for figuring out when and where these jobs can be run on the cluster. The scheduler needs to take into account the priority of the job, any reservations that may exist, when currently running jobs are likely to end, and so on. Once informed of scheduling information, the batch system will handle starting your job at the appropriate time and place. Slurm handles both of these components, so you don't have to think of them as separate processes; you just need to know how to submit jobs to the batch queue(s).
Note: Refer to the documentation for your cluster to determine what queues/partitions are available.
Running Jobs
This section covers general job submission and job script composition; for more specific details on how to run jobs or job scripts and use queues on your particular system, see the documentation for the Private Cluster you are working on. Also note that many of the following commands have several options. For full details, see the man page for the command, or the Slurm Docs.
Display Info
Common commands used to display information:
- `sinfo` displays information about nodes and partitions/queues. Use `-l` for more detailed information.
- `scontrol show nodes` views the state of the nodes.
- `scontrol show partition` views the state of the partition/queue.
Job Control
Here are some common Job Control commands:
- `sbatch testjob.sh` submits a job where testjob.sh is the script you want to run. Also see the Job Scripts section and the sbatch documentation.
- `srun -p <partition> --pty /bin/bash -l` starts an interactive job and opens a login shell for you to enter commands into. Also see the srun documentation.
  - Note: remember to exit the session once you are done to free resources for other users.
- `squeue -u my_userid` shows the state of jobs for user my_userid. Also see the squeue documentation.
- `scontrol show job <job id>` views the state of a job. Also see the scontrol documentation.
- `scancel <job id>` cancels a job. Also see the scancel documentation.
- `squeue` with no arguments retrieves summary information on all scheduled jobs.
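As a sketch of a typical interactive session (the partition name `normal` and the walltime are assumptions; substitute a queue that exists on your cluster):

```bash
# Request an interactive login shell on the (hypothetical) "normal" partition.
srun -p normal -t 00:30:00 --pty /bin/bash -l

# You are now on a compute node; run commands directly, e.g.:
hostname

# When finished, exit the shell to end the job and free the resources.
exit
```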
Once the job has completed, the stdout and stderr streams will be written to files in your $HOME directory, named with the job id. To verify that the job ran successfully, examine these output files.
Useful Arguments
The following table shows key directives that may be provided with each job. Some systems may require one or more of these arguments to be supplied.
| Meaning | Flag | Allowed Value | Example | Default |
|---|---|---|---|---|
| Submission queue | `-p` | Queue/partition name *(valid names are cluster dependent)* | `-p normal` | (cluster dependent) |
| Job walltime | `-t` | hh:mm:ss *(not to exceed time limit of queue)* | `-t 00:05:00` | time limit of queue |
| Number of tasks | `-n` | 1 ... number of CPUs on *N* nodes *(Slurm calculates N, if `-N` is not present)* | `-n 16` | 2 × *N* *(2, if `-N` is also not present)* |
| Number of nodes | `-N` | 1 ... *NP* = number of nodes in partition *(if N > NP, job is queued but never runs)* | `-N 2` | enough to satisfy `-n` *(1, if `-n` is also not present)* |
Note that the maximum number of tasks that can be accommodated in a given queue is hardware dependent. In the absence of any other information, Slurm allocates the minimum number of nodes necessary to accommodate the `-n` number of tasks, with each task occupying one CPU.
It is important to understand that Slurm counts each physical core of a multi-core processor as two CPUs. This is due to Intel's hyperthreading technology, which makes each physical core appear to be two hardware threads to the OS. Accordingly, Slurm calculates the total number of CPUs per node as follows:
```
CPUs/node = (boards/node) * (sockets/board) * (cores/socket) * (hardware threads/core)
          = 1 * 2 * (cores/socket) * 2
```
Most of the above factors are fixed, because nearly all CAC clusters consist of dual-socket nodes: each of the 2 sockets in the node holds one Intel multi-core processor. The number of cores per processor can vary quite a lot, even within a single queue. Check your cluster's documentation for details.
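To make the arithmetic concrete, the sketch below computes CPUs/node for a hypothetical node with 12 cores per socket. The hardware figures here are assumptions; on a real cluster, `sinfo -o "%n %X %Y %Z %c"` reports each node's actual sockets, cores per socket, threads per core, and CPUs per node.

```bash
#!/bin/bash
# Hypothetical hardware for one dual-socket node; the cores/socket
# value (12 here) is an assumption and varies by cluster.
boards=1
sockets_per_board=2
cores_per_socket=12
threads_per_core=2

# Slurm's CPUs/node calculation from the formula above:
cpus_per_node=$(( boards * sockets_per_board * cores_per_socket * threads_per_core ))
echo "CPUs/node = $cpus_per_node"    # prints "CPUs/node = 48"
```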
It is possible to assign fewer processes to each node via the `-N` argument (see the Optional Arguments section), which specifies the desired number of nodes.
Example Command-line Job Submission
All of the required options can be specified on the command line with `sbatch`. For example, say you had the following script "simple_cmd.sh" to run:
```bash
#!/bin/bash
#Ensures that the node can sleep
#print date
date
#verify that sleep 5 works
time sleep 5
```
In order to run this on the command-line, you could issue (where development is an available queue on the system):
$ sbatch -p development -t 00:01:00 -n 1 simple_cmd.sh
There is also an easier way, as demonstrated in Job Scripts.
Optional Arguments
| Meaning | Flag | Value | Example |
|---|---|---|---|
| Name of Job | `-J` | any string | `-J SimpleJob` |
| Stdout | `-o` | absolute path | `-o $HOME/project1/%j.out` |
| Stderr | `-e` | absolute path | `-e $HOME/project1/%j.err` |
| Job Dependency | `-d` | type:job_id | `-d=after:1234` |
| Email address | `--mail-user` | email@domain | `--mail-user=genius@gmail.com` |
| Email notification type | `--mail-type` | BEGIN, END, FAIL, REQUEUE, or ALL | `--mail-type=ALL` |
| Specify Environment Variable | `--export` | varname=varvalue, ALL (default), NONE | `--export LOC=$SCRATCH/x/foo.cdf` |
Of particular importance to those who run MPI applications, the `-N` and `-n` arguments may be used together in order to determine the number of processes running on each node. Running fewer processes on a given node means that each process can safely consume a greater percentage of a node's memory.

If `-N` is specified along with `-n`, Slurm will allocate the number of nodes implied by `-N`, then evenly divide the total number of processes specified with `-n` among the nodes. For example, `-N 8 -n 32` will specify 32 processes to be launched, divided evenly among the 8 allocated nodes (i.e. 32 processes ÷ 8 nodes = 4 processes per node).
Job Scripts
For more specific examples on how to write job scripts and use queues on your particular system, see the documentation for the Private Cluster you are working on.
Simple Job Script
Continuing the example from above, the same commands can be put into the batch script itself. This makes it easy to copy and paste to new scripts, and it ensures that a job is submitted the same way every time. We'll modify the previous script so that it includes all of the required directives.
All that is required is to place the command line options in the batch script and prepend them with #SBATCH. They appear as comments to the shell, but Slurm parses them for you and applies them. Here is the end result:
```bash
#!/bin/bash
#Ensures that the node can sleep
#SBATCH -t 00:05:00
#SBATCH -n 1
#SBATCH -p development
#print date
date
#verify that sleep 5 works
time sleep 5
```
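Assuming the script is saved as simple_cmd.sh, submission then needs no extra flags. A sketch of the workflow follows; the job id 12345 is illustrative only, and the exact output file name and location depend on your cluster's configuration:

```bash
sbatch simple_cmd.sh
# Slurm replies with the assigned id, e.g. "Submitted batch job 12345"

squeue -u $USER          # check the job's state while it waits or runs

# Afterwards, inspect the output captured for the job id
# (e.g. slurm-12345.out in $HOME; the name is cluster dependent):
cat $HOME/slurm-12345.out
```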
References
- Slurm Docs
- Quick Start User Guide - this page lists all of the available Slurm commands
- Command/Option Summary (two page PDF)
- Frequently Asked Questions includes FAQs for Management, Users and Administrators
- Convenient Slurm Commands has examples for getting information on jobs and controlling jobs