Running Jobs on the ASTRA cluster

From CAC Documentation wiki
Revision as of 16:56, 7 November 2018 by Jhs43

This page assumes you have already familiarized yourself with the 'Getting Started' page.

Maui Scheduler and Job submission/monitoring commands

Jobs are scheduled by the Maui scheduler with the Torque resource manager. We suggest you use a job submission batch file utilizing PBS Directives ('Options' section).

Common Maui Commands

If you have any experience with PBS/Torque or SGE, the Maui commands may be recognizable. The most used are:

qsub - Job submission (jobid will be displayed for the job submitted)

  • $ qsub job_script

showq - Display queue information.

  • $ showq (dump everything)
  • $ showq -r (show running jobs)
  • $ showq -u foo42 (shows foo42's jobs)

checkjob - Display job information. (You can only checkjob your own jobs.)

  • $ checkjob -A 42 (get dense key-value pair information on job 42)
  • $ checkjob -v 42 (get verbose information on job 42)

canceljob - Cancel Job. (You can only cancel your own jobs.)

  • $ canceljob jobid
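
The commands above are typically used together. The following is a minimal sketch, not an official workflow: it assumes qsub prints a job id of the form number.servername (typical for Torque, but check your own output), and "job_script" and the helper job_number are hypothetical names.

```shell
#!/bin/bash
# Sketch only: submit a batch script, then inspect the resulting job.
# "job_script" is a hypothetical file name; the "<number>.<server>" job id
# format is an assumption based on typical Torque behaviour.

# Strip the server suffix from a job id such as 42.headnode, leaving 42.
job_number() {
    echo "${1%%.*}"
}

# Only attempt a submission where the scheduler commands are installed.
if command -v qsub >/dev/null 2>&1 && [ -f job_script ]; then
    jobid=$(qsub job_script)              # qsub prints the id of the new job
    showq -u "$USER"                      # list your queued and running jobs
    checkjob -v "$(job_number "$jobid")"  # verbose details for that job
fi
```

The guard around qsub simply lets the snippet be sourced harmlessly on a machine without the scheduler installed.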

Setting Up your Job Submission Batch File

Commands can be run on the command line with the qsub command. However, we suggest running your jobs from a batch script. PBS Directives are command line arguments inserted at the top of the batch script, each directive prepended with '#PBS' (no space between '#' and 'PBS'). For the full list, see the PBS Directives reference.

The following example script requests 1 node, a maximum run time of 5 minutes, 30 seconds, output and error joined into the same file, the descriptive job name 'defaulttest', and the 'default' queue:

#PBS -l walltime=00:05:30,nodes=1
#PBS -j oe
#PBS -N defaulttest
#PBS -q default

# Turn on echo of shell commands
set -x
# Jobs normally start in the HOME directory.
# Copy the binary and data files to a local directory on the node the job is executing on.
cp "$HOME/binaryfile" "$TMPDIR/binaryfile"
cp "$HOME/values" "$TMPDIR/values"
# Run your executable from the local disk on the node the job was placed on.
cd "$TMPDIR"
./binaryfile >& binary.stdout
# Copy output files to your output folder.
cp -f "$TMPDIR/binary.stdout" "$HOME/outputdir"

A job with heavy I/O (use /tmp as noted in the above example)

  • Use /tmp to avoid heavy I/O over NFS to your home directory!
  • Ignoring this message could bring down the ASTRA CLUSTER HEAD NODE!
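
One way to follow this advice is to wrap the stage, run, and copy-back steps in a small function. This is a sketch under assumptions: run_in_scratch is a hypothetical helper (not something the cluster provides), and the scratch path in the usage comment is only an illustration.

```shell
#!/bin/bash
# Sketch only: keep heavy I/O on node-local disk, then copy results home once.
# run_in_scratch is a hypothetical helper, not a cluster-provided command.

# Usage: run_in_scratch SCRATCH_DIR OUTPUT_DIR COMMAND [ARGS...]
run_in_scratch() {
    local scratch=$1 outdir=$2 status
    shift 2
    mkdir -p "$scratch" "$outdir" || return 1
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$scratch" && "$@" > run.stdout 2>&1 )
    status=$?
    # One copy over NFS at the end instead of many small writes during the run.
    cp -f "$scratch/run.stdout" "$outdir/"
    return "$status"
}

# Example (inside a batch script):
#   run_in_scratch "/tmp/$USER.$PBS_JOBID" "$HOME/outputdir" "$HOME/binaryfile"
```

The function returns the exit status of the command it ran, so a batch script can still detect a failed run after the output has been copied back.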

Request job to run on 5 nodes, 4 processes per node, 20 total processes

#PBS -l walltime=00:05:30,nodes=5:ppn=4

Additional jobs submitted by you or others can run on the unused slots on the same nodes (each node has 12 slots, so this request leaves 8 free per node). This may not be good if your job uses lots of memory.
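
To confirm what a request like this actually received, a job can inspect its node file. A minimal sketch, relying on the fact that Torque lists each node once per allocated slot in $PBS_NODEFILE; allocation_summary is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: summarise the allocation described by a Torque node file.
# Torque lists each node once per requested slot (ppn) in $PBS_NODEFILE.

# Print "<nodes> <slots>" for the given node file.
allocation_summary() {
    local nodefile=$1 nodes slots
    nodes=$(sort -u "$nodefile" | wc -l)
    slots=$(wc -l < "$nodefile")
    # Arithmetic expansion normalises any padding that wc adds.
    echo "$((nodes)) $((slots))"
}

# Inside a job: allocation_summary "$PBS_NODEFILE"
```

For the nodes=5:ppn=4 request above, the node file should contain 20 lines naming 5 distinct hosts.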

Request job to run exclusively on 2 nodes, 3 processes per node, 6 total processes

#PBS -l nodes=2:ppn=12
#PBS -l walltime=00:02:00
#PBS -q test
#PBS -j oe
#PBS -N 2nodes6tasks


# Number of MPI tasks to launch on each node (3 per node x 2 nodes = 6 total).
TPN=3

# Construct a copy of the hostfile with only TPN entries per node.
# MPI can use this to run TPN tasks on each node.
uniq "$PBS_NODEFILE" | awk -v TPN="$TPN" '{for(i=0;i<TPN;i+=1) print}' > nodefile."$TPN"way

mpiexec --hostfile nodefile."$TPN"way --tag-output hostname

To have exclusive access to nodes, your batch script must request all 12 slots per node. (ASTRA nodes have 12 cores each.) But let's say you want your MPI job to launch only a few tasks on each node, perhaps because each task uses a lot of memory. In that case, you cannot use the default $PBS_NODEFILE, as it lists each node 12 times. The above script creates a replacement hostfile listing each node $TPN times instead.
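
The hostfile pipeline can be tried outside the cluster on a hand-made node file. The sketch below wraps the same uniq/awk pipeline in a function; make_hostfile and the fake node names are illustrative assumptions.

```shell
#!/bin/bash
# Sketch only: the same uniq/awk pipeline as in the script, wrapped in a
# function so it can be tried on a hand-made node file outside the cluster.
# make_hostfile is a hypothetical helper name.

# Usage: make_hostfile NODEFILE TPN  (prints each distinct node TPN times)
make_hostfile() {
    uniq "$1" | awk -v TPN="$2" '{for (i = 0; i < TPN; i += 1) print}'
}

# Build a fake node file shaped like $PBS_NODEFILE for nodes=2:ppn=12.
fake=$(mktemp)
for node in compute-1-1 compute-1-2; do
    for slot in $(seq 12); do echo "$node"; done
done > "$fake"

# Prints compute-1-1 three times, then compute-1-2 three times.
make_hostfile "$fake" 3
rm -f "$fake"
```

Note that uniq only collapses adjacent duplicates, which is sufficient here because the node file groups the entries for each node together.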

Request memory resources

To dedicate 2GB of memory for your job, add the 'mem=xx' to the '-l' option you should already have in your batch script file:

#PBS -l walltime=00:05:30,nodes=1,mem=2gb

Use 'checkjob -v [jobid]' to display your resources:
Total Tasks: 1
Dedicated Resources Per Task: PROCS: 1  MEM: 2048M

To run 2 tasks with 2GB dedicated to each, request 4GB in total, since the 'mem' value is divided among the tasks. Your PBS directive would be:

#PBS -l walltime=00:05:30,nodes=1:ppn=2,mem=4gb

checkjob -v [jobid]:
Total Tasks: 2
Dedicated Resources Per Task: PROCS: 1  MEM: 2048M
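
The arithmetic can be sketched as a small helper. It assumes, as the second example above suggests, that mem= is the total for the job and is split evenly among tasks; mem_request is a hypothetical name.

```shell
#!/bin/bash
# Sketch only: build a resource list in which every task gets the same memory.
# Assumes mem= is the job total and is split evenly among the tasks.
# mem_request is a hypothetical helper.

# Usage: mem_request PPN GB_PER_TASK
mem_request() {
    local ppn=$1 per_task_gb=$2
    echo "nodes=1:ppn=${ppn},mem=$((ppn * per_task_gb))gb"
}

mem_request 2 2    # prints nodes=1:ppn=2,mem=4gb
```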

Running a job on specific nodes


#PBS -l walltime=00:03:00,host=compute-1-1+compute-1-2+compute-1-3,nodes=3
#PBS -N testnodes
#PBS -j oe
#PBS -q default

set -x
echo "Run my job."
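
The host list and the node count in the directive above have to agree. A minimal sketch that builds both from one list of names; host_request is a hypothetical helper and the node names are the ones used in the example.

```shell
#!/bin/bash
# Sketch only: build the host=...,nodes=N resource string from a list of names.
# host_request is a hypothetical helper.

# Usage: host_request NODE [NODE...]
host_request() {
    local count=$#
    local IFS=+
    # "$*" joins the arguments with the first character of IFS, here "+".
    echo "host=$*,nodes=${count}"
}

host_request compute-1-1 compute-1-2 compute-1-3
# prints host=compute-1-1+compute-1-2+compute-1-3,nodes=3

# Possible use on the command line (job_script is a hypothetical file name):
#   qsub -l "walltime=00:03:00,$(host_request compute-1-1 compute-1-2)" job_script
```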

Running an interactive job

  • Be sure to log in with X11 forwarding enabled (if using ssh, use ssh -X or ssh -Y; if using PuTTY, be sure to check the X11 forwarding box).
  • You do not have to request a specific node, as the example below does.

Running an interactive job on a specific node

To run on a specific node, use the '-l' option with 'host='; to run interactively, use the '-I' (capital I) option. The example below requests the node for just over 5 days (120 hours, 10 minutes):

#PBS -l host=compute-1-5,walltime=120:10:00
#PBS -N test
#PBS -j oe
#PBS -q default

set -x
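
Walltime is given as hours:minutes:seconds, so multi-day requests need converting. A minimal sketch of that conversion, followed by a command-line form of the interactive request in a comment; days_to_walltime is a hypothetical helper.

```shell
#!/bin/bash
# Sketch only: convert a number of days to the HH:MM:SS walltime format.
# days_to_walltime is a hypothetical helper.

days_to_walltime() {
    echo "$(( $1 * 24 )):00:00"
}

days_to_walltime 5    # prints 120:00:00

# Interactive request for one specific node (run from the login node):
#   qsub -I -l host=compute-1-5,walltime="$(days_to_walltime 5)" -q default
```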

Running multiple copies of your job

In order to run separate instances of the same program, you can use the scheduler's task array feature through the "-t" option, whose argument is called the array_request.

  • On ASTRA, the "nodes" parameter in a batch job's resource list refers to cores, not nodes. Thus, separate tasks specified by -t will pile onto a single batch machine, one-per-core, then move on to the next machine, etc. (This may or may not be what you want.)
  • WARNING: We have discovered that running a job with more than 3500 tasks will bring down the Maui open source scheduler.
  • BE NICE TO YOUR PEERS who are also trying to run on this cluster. It operates on an "honor system". A good way to be nice is to submit your array request with the optional slot limit, which caps the number of jobs in the array that can run concurrently. The default is unlimited. The slot limit must be the last thing specified in the array_request and is separated from the array by a percent sign (%).
  • The example below requests 30 tasks, yet it runs only 12 at a time (enough to fill just one batch machine). You will see the other 18 go into the 'IDLE JOBS' category when using 'showq'. These jobs will start as soon as one or more of the first 12 have completed, assuming CPUs are available.
# Note: the option below is -l (lowercase L).
#PBS -l nodes=1,walltime=10:00
#PBS -t 1-30%12
#PBS -N test
#PBS -j oe
#PBS -S /bin/bash

set -x
echo Run my job.
  • Otherwise, if you do not have a need to limit the number of jobs, you simply ask for 30 tasks by putting the #PBS -t directive in your script, or by using the command line option. The numeric argument to -t is an integer id or a range of integers. The range does not have to start with 1, and multiple ranges and individual ids can be specified.
  • #PBS -t 30-60
  • qsub -t 1-20,25,50-60 job_script
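
Given the 3500-task warning, it can be worth counting the tasks in an array_request before submitting. The following is a sketch only: array_task_count is a hypothetical helper, and it handles just the comma-separated ids and ranges (with an optional %slot_limit suffix) described above.

```shell
#!/bin/bash
# Sketch only: count the tasks named by an array_request such as 1-20,25,50-60
# so an oversized submission can be caught before it reaches the scheduler.
# array_task_count is a hypothetical helper; 3500 is the limit quoted above.

array_task_count() {
    local request="${1%%[%]*}"   # drop an optional %slot_limit suffix
    local total=0 part lo hi
    local IFS=,
    for part in $request; do     # split on commas into ids and ranges
        lo=${part%%-*}           # for a single id, lo and hi are both the id
        hi=${part##*-}
        total=$((total + hi - lo + 1))
    done
    echo "$total"
}

count=$(array_task_count "1-30%12")
if [ "$count" -gt 3500 ]; then
    echo "array of $count tasks is too large for the scheduler" >&2
fi
```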

For more ideas on how to make use of job arrays, see Multiple Batch Tasks.

Using a job array is recommended over submitting jobs from a for/while loop within a bash script. We have seen such loops "confuse the scheduler".
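
In place of a loop, each array task can select its own work from its array index. A minimal sketch: Torque exports the index as PBS_ARRAYID, while the input_N.dat naming scheme and the input_for_task helper are hypothetical.

```shell
#!/bin/bash
# Sketch only: each array task picks its own input from its array index
# instead of looping over inputs inside one job.
# Torque exports the index as PBS_ARRAYID; the input_N.dat naming scheme and
# the input_for_task helper are hypothetical.

input_for_task() {
    # Fails with a message if the script is not running as part of an array.
    echo "input_${PBS_ARRAYID:?not running inside a job array}.dat"
}

# Inside a job submitted with, e.g., #PBS -t 1-30%12:
#   ./binaryfile < "$(input_for_task)" > "output_${PBS_ARRAYID}.txt"
```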

Further examples:

The scheduler on ASTRA is similar to the CAC v4 scheduler, with a few distinct differences:

  • There is no need to specify an account number in batch scripts.
  • The command to submit jobs is qsub, not nsub.
  • The v4 scheduler allows only one job per node.

Examples from the CAC v4 scheduler

Astra Documentation Main Page