Running Jobs on the ASTRA cluster
After you have familiarized yourself with the 'Getting Started' guide, continue with:
Maui Scheduler and Job submission/monitoring commands
Common Maui Commands
If you have any experience with PBS/Torque or SGE, the Maui commands will look familiar. The most used are:
qsub - Submit a job (the jobid is displayed for the submitted job).
- $ qsub jobscript.sh
showq - Display queue information.
- $ showq (show everything)
- $ showq -r (show running jobs)
- $ showq -u foo42 (show foo42's jobs)
checkjob - Display job information. (You can only checkjob your own jobs.)
- $ checkjob -A jobid (get dense key-value pair information on the job)
- $ checkjob -v jobid (get verbose information on the job)
canceljob - Cancel a job. (You can only cancel your own jobs.)
- $ canceljob jobid
Setting Up your Job Submission Batch File
Commands can be submitted directly on the command line with qsub; however, we suggest running your jobs from a batch script. PBS directives are command-line arguments inserted at the top of the batch script, each prefixed with '#PBS' (no spaces). Reference PBS Directives.
The following example script requests: 1 node; a maximum walltime of 5 minutes, 30 seconds; output and error joined into the same file; a descriptive job name of 'defaulttest'; and the 'default' queue:
#!/bin/bash
#PBS -l walltime=00:05:30,nodes=1
#PBS -j oe
#PBS -N defaulttest
#PBS -q default

# Turn on echo of shell commands
set -x
# Jobs normally start in the HOME directory; cd to where you submitted.
cd $PBS_O_WORKDIR
# Copy the binary and data files to a local directory on the node the job is executing on.
cp $HOME/binaryfile $TMPDIR/binaryfile
cp $HOME/values $TMPDIR/values
cd $TMPDIR
# Run your executable from the local disk on the node the job was placed on.
./binaryfile >& binary.stdout
# Copy output files to your output folder.
cp -f $TMPDIR/binary.stdout $HOME/outputdir
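The stage-in/stage-out pattern in the script above can be exercised outside the scheduler by substituting temporary directories for $TMPDIR and the output folder (all file names here are stand-ins for illustration):

```shell
#!/bin/bash
# Simulate the job's staging pattern locally. On the cluster, the scheduler
# sets TMPDIR to node-local scratch; here we create temporary directories
# to stand in for it and for $HOME/outputdir.
TMPDIR=$(mktemp -d)
OUTPUTDIR=$(mktemp -d)

# Stand-ins for $HOME/binaryfile and $HOME/values from the example.
echo "input data" > values
printf '#!/bin/bash\ncat values\n' > binaryfile
chmod +x binaryfile

# Stage in, run from local disk, stage out -- same shape as the job script.
cp binaryfile "$TMPDIR/binaryfile"
cp values "$TMPDIR/values"
cd "$TMPDIR"
./binaryfile > binary.stdout 2>&1
cp -f "$TMPDIR/binary.stdout" "$OUTPUTDIR"

cat "$OUTPUTDIR/binary.stdout"
```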
Jobs with heavy I/O
- Use /tmp to avoid heavy I/O over NFS to your home directory!
- Ignoring this message could bring down the ASTRA CLUSTER HEAD NODE!
Running Many Copies of a Serial Job
To run 30 separate instances of the same program, use the scheduler's task array feature via the "-t" option. The "nodes" parameter here refers to a core.
#!/bin/sh
#PBS -l nodes=1,walltime=10:00
#PBS -t 1-30
#PBS -N test
#PBS -j oe
#PBS -S /bin/bash

set -x
cd "$PBS_O_WORKDIR"
echo "Run my job."

(Note: the resource option above is -l, a lowercase L. '-t 1-30' requests array task IDs 1 through 30.)
When you start jobs this way, separate jobs will pile one-per-core onto nodes like a box of hamsters.
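Inside each array task, Torque sets $PBS_ARRAYID to that task's index, which is how the otherwise identical copies pick up different work. A minimal sketch (the input-file naming scheme is made up for illustration; under the scheduler the variable is preset):

```shell
#!/bin/bash
# Each task of a '#PBS -t 1-30' array sees its own PBS_ARRAYID (1..30).
# Simulated here with a default value; on the cluster it is already set.
PBS_ARRAYID=${PBS_ARRAYID:-7}

# Use the index to select this task's input file (hypothetical naming).
INPUT="values.$PBS_ARRAYID"
echo "task $PBS_ARRAYID would process $INPUT"
```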
Request a job to run on 4 processors on each of 5 nodes
#PBS -l walltime=00:05:30,nodes=5:ppn=4
Request Memory resources
To dedicate 2GB of memory to your job, add 'mem=2gb' to the '-l' option you should already have in your batch script file:
#PBS -l walltime=00:05:30,nodes=1,mem=2gb
Use 'checkjob -v [jobid]' to display your dedicated resources:

Total Tasks: 1
Dedicated Resources Per Task: PROCS: 1 MEM: 2048M
To have 2 tasks, with 2GB dedicated to each processor, you would need this PBS directive:
#PBS -l walltime=00:05:30,nodes=1:ppn=2,mem=4gb

checkjob -v [jobid] then shows:

Total Tasks: 2
Dedicated Resources Per Task: PROCS: 1 MEM: 2048M
Running a job on specific nodes
#!/bin/bash
#PBS -l walltime=00:03:00,host=compute-1-1+compute-1-2+compute-1-3,nodes=3
#PBS -N testnodes
#PBS -j oe
#PBS -q default

set -x
cd "$PBS_O_WORKDIR"
echo "Run my job."
job.sh
Running an interactive job
- Be sure to have logged in with X11 forwarding enabled (if using ssh, use ssh -X or ssh -Y; if using PuTTY, be sure to check the X11 forwarding box)
- You do not have to specify a specific node (see the next section if you need one)
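An interactive session can also be requested directly from the command line rather than through a batch file; a minimal sketch (walltime and node count are illustrative):

```shell
# Request one node interactively for 30 minutes; the -I flag drops you
# into a shell on the allocated node once the job starts.
qsub -I -l walltime=00:30:00,nodes=1 -q default
```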
Running an interactive job on a specific node
To run on a specific node, use the '-l' option with 'host='; to run interactively, use the '-I' (capital i) option. The example below requests the node for 5 days:
#!/bin/bash
#PBS -l host=compute-1-5,walltime=120:10:00
#PBS -I
#PBS -N test
#PBS -j oe
#PBS -q default

set -x
Running multiple copies of your job
- This method is recommended over using a for/while loop within a bash script; we have seen such loops confuse the scheduler.
The following example creates two jobs at once.
#!/bin/bash
#PBS -q default
#PBS -l walltime=00:05:00,nodes=1
#PBS -t 1-2
#PBS -j oe
#PBS -N intro
#PBS -V

set -x
cd "$PBS_O_WORKDIR"
echo "Run my job."
jobscript.sh
The scheduler on ASTRA is similar to the CAC v4 scheduler, with a few distinct changes:
- There is no need to specify an account number in batch scripts.
- The command to submit jobs is qsub, not nsub.
- The v4 scheduler allows only one job per node.