SLURM

Stampede uses SLURM as both its batch queuing system and its scheduling mechanism. Jobs are submitted to SLURM from a login node, and SLURM schedules them onto compute nodes as resources become available.

[Figure: diagram of a batch queuing system]

The above image shows how a batch queuing system works. Users submit jobs to the batch component, which maintains one or more queues (known as "partitions" in SLURM parlance). Each job carries information about itself along with a set of resource requests. Resource requests can range from the number of CPUs or nodes to specific node requirements (e.g. only use nodes with more than 2 GB of RAM). A separate component, called the scheduler, works out when and where these jobs can run on the cluster. The scheduler has to take into account the priority of the job, any reservations that exist, when currently running jobs are likely to end, and so on. Once the scheduler has made its decision, the batch system starts your job at the appropriate time and place. SLURM handles both of these components, so you don't have to think of them as separate processes. You just need to know how to submit jobs to the batch queue(s).
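To make the idea of "job information plus resource requests" concrete, here is a minimal sketch in Python that renders a SLURM batch script from a small description of a job. The `JobRequest` class and `render_batch_script` function are hypothetical helpers written for this illustration, and the job name, the partition name `normal`, the node and task counts, and the time limit are all assumed values; check your site's documentation for the partitions and limits that actually apply. The `#SBATCH` options used (`--job-name`, `--partition`, `--nodes`, `--ntasks`, `--time`) are standard `sbatch` options.

```python
from dataclasses import dataclass


@dataclass
class JobRequest:
    """Hypothetical container for job information and resource requests."""
    name: str        # label shown in the queue
    partition: str   # queue ("partition") to submit to; site-specific
    nodes: int       # number of nodes requested
    ntasks: int      # total number of tasks requested
    time_limit: str  # wall-clock limit, HH:MM:SS
    command: str     # what to run once the job starts


def render_batch_script(job: JobRequest) -> str:
    """Return the text of a batch script with one #SBATCH line per request."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job.name}",
        f"#SBATCH --partition={job.partition}",
        f"#SBATCH --nodes={job.nodes}",
        f"#SBATCH --ntasks={job.ntasks}",
        f"#SBATCH --time={job.time_limit}",
        "",
        job.command,
    ]
    return "\n".join(lines) + "\n"


# Assumed example values; adjust them for your own allocation and site.
job = JobRequest(
    name="hello",
    partition="normal",
    nodes=1,
    ntasks=16,
    time_limit="00:10:00",
    command="echo 'Hello from a compute node'",
)
script = render_batch_script(job)
print(script)
```

If the printed text is saved to a file such as `job.sh` (a name chosen here for illustration), it would typically be submitted from a login node with `sbatch job.sh`, and `squeue -u $USER` would list its state while it waits in the queue or runs. Some sites require further options, such as an account or project name, so treat this as a starting point rather than a complete template.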