BatchLinuxMultipleCopies
From Cornell CAC Documentation
Latest revision as of 17:26, 15 July 2010
Batch Script to Run Multiple Copies
- Linux cluster
- Many copies of the same program with different arguments
Each separate computer in the cluster is called a node. A node can have more than one processor, and each processor usually has multiple cores. On the V4 cluster, each node has 8 cores in total, so you can run 8 tasks, that is, 8 copies of your program, per node.
While tasks run and write output files, those files should be stored in the local temporary directory on each node, /tmp. At the end of the job, copy the files back to your home directory on the shared filesystem.
The batch scripts set up the computing environment, run the tasks, and tear the environment down:
- batch.sh: The user submits this script to the batch system. It runs the next three.
- to_node.sh: Run once for each node, this sets up the /tmp directory.
- task.sh: Run once for each task (8 per node), this runs the main program.
- from_node.sh: Run once for each node, this copies files back.
You run "nsub batch.sh" to submit the job, and the scheduler later runs batch.sh on one of the nodes. batch.sh, in turn, calls the other three scripts: to_node.sh, task.sh, and from_node.sh. Here are a couple of friendly tips:
- If you want to copy-and-paste these scripts, first switch to the "view source" tab at the top of this page.
- Remember to set the execute bit on the scripts by typing "chmod +x *.sh" at the command line.
batch.sh:
#!/bin/sh
#PBS -l walltime=1:00:00,nodes=2
#PBS -A dal16_0001
#PBS -j oe
#PBS -N insects
#PBS -q v4dev
# Turn on echo of shell commands
set -x
# Counts the number of cores on this host.
# Assume this doesn't change across the cluster.
CORESPERNODE=`grep processor /proc/cpuinfo | wc -l`
# Pull standard stuff from the environment variables
# If running under batch, as defined by the presence of
# a PBS_NODEFILE, then pull info from environment variables.
if [ -n "$PBS_NODEFILE" ]
then
    NODECNT=$(wc -l < "$PBS_NODEFILE")
    TASKCNT=`expr $CORESPERNODE \* $NODECNT`
    RUNDIR=$PBS_O_WORKDIR
    # The job id is something like 613.scheduler.v4linux.
    # This deletes everything after the first dot.
    JOBNUMBER=${PBS_JOBID%%.*}
    echo '============================'
    echo $0
    echo '============================'
else
    # These variables are used running an interactive debugging job
    # on v4dev.
    NODECNT=1
    TASKCNT=4
    RUNDIR=/home/gfs01/ajd27/dev/working
    PBS_NODEFILE=$RUNDIR/nodefile
    echo localhost>$PBS_NODEFILE
    JOBNUMBER=01
fi
# Set up our job
EXT=$JOBNUMBER
cd $RUNDIR
cat $PBS_NODEFILE
if mpdboot -n $NODECNT -r /usr/bin/ssh -f $PBS_NODEFILE
then
    mpiexec -ppn 1 -np $NODECNT $RUNDIR/to_node.sh $EXT $RUNDIR
    mpiexec -ppn $CORESPERNODE -np $TASKCNT $RUNDIR/task.sh $EXT $RUNDIR
    mpiexec -ppn 1 -np $NODECNT $RUNDIR/from_node.sh $EXT $RUNDIR
    mpdallexit
fi
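To see what the batch branch of batch.sh computes, the following minimal sketch reproduces the node count, task count, and job number outside the scheduler. The node file contents, the value of 8 cores per node, and the job id are stand-ins chosen for illustration; in a real job they come from PBS_NODEFILE, /proc/cpuinfo, and PBS_JOBID.

```shell
#!/bin/sh
# Hypothetical stand-ins for values the scheduler normally provides.
NODEFILE=$(mktemp)
printf 'node01\nnode02\n' > "$NODEFILE"   # assumed: one line per node, as batch.sh expects
PBS_JOBID=613.scheduler.v4linux           # example id taken from the comment in batch.sh
CORESPERNODE=8                            # assumed: 8 cores per V4 node

# Same arithmetic as batch.sh: count the nodes, then one task per core.
NODECNT=$(wc -l < "$NODEFILE")
TASKCNT=$(expr $CORESPERNODE \* $NODECNT)

# %%.* removes the longest suffix that starts at a dot,
# which leaves only the text before the first dot.
JOBNUMBER=${PBS_JOBID%%.*}

echo "nodes=$NODECNT tasks=$TASKCNT job=$JOBNUMBER"
rm -f "$NODEFILE"
```

Running a sketch like this on a login node is a cheap way to check the arithmetic before spending queue time on a real submission.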
to_node.sh:
#!/bin/bash
EXT=$1-${HOSTNAME%%.*}
RUNDIR=$2
SCRATCH=/tmp/$USER
# -p tells mkdir not to worry if the directory already exists.
# If it matters, you could delete everything in the directory before starting.
mkdir -p $SCRATCH
cp $RUNDIR/*.R $SCRATCH/
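The comment in to_node.sh mentions that you could empty the scratch directory before starting. A minimal sketch of that step follows, assuming any files left by a previous job are safe to delete. So that it can be tried anywhere, SCRATCH here points into a throwaway directory made with mktemp rather than /tmp/$USER.

```shell
#!/bin/bash
# Assumption for the demo: a throwaway parent directory stands in for /tmp.
PARENT=$(mktemp -d)
SCRATCH=$PARENT/scratch

# Simulate leftovers from an earlier job.
mkdir -p "$SCRATCH"
touch "$SCRATCH/out0.txt" "$SCRATCH/out1.txt"

# Remove the old directory and recreate it empty.
# The ${SCRATCH:?} guard aborts the script if SCRATCH is unset or empty,
# so the rm can never be handed an empty path.
rm -rf "${SCRATCH:?}"
mkdir -p "$SCRATCH"

echo "entries left in scratch: $(ls -A "$SCRATCH" | wc -l)"
```

Whether to clean up first is a judgment call: it avoids copying back stale out* files from an earlier run, but it also destroys anything another of your jobs on the same node may have put there.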
When the task.sh script runs, MPI defines a variable called PMI_RANK, which holds the zero-based index of this script among the tasks. For instance, if you started four tasks, PMI_RANK would be 0, 1, 2, or 3, one value for each running copy of the script. We use this variable to parameterize our program so that each copy does different work and saves its results to a different output file.
task.sh:
#!/bin/bash
# Create a file extension unique to each hostname (if you want)
# The %%.* turns v4linuxlogin1.cac.cornell.edu into v4linuxlogin1.
EXT=$1-${HOSTNAME%%.*}
RUNDIR=$2
SCRATCH=/tmp/$USER
cd $SCRATCH
R --no-save --args ${PMI_RANK} < main.R > out${PMI_RANK}.txt
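When you test task.sh by hand, outside mpiexec, PMI_RANK is not set, and the output file name collapses to out.txt. A minimal sketch of a fallback follows, assuming rank 0 is an acceptable default for interactive runs. The variable RANK and the helper output_name are names introduced here for illustration, not part of the original scripts.

```shell
#!/bin/bash
# Hypothetical helper: build the output file name for a given rank.
output_name() {
    echo "out$1.txt"
}

# Use PMI_RANK when mpiexec provides it; otherwise fall back to 0
# (an assumed default for interactive testing).
RANK=${PMI_RANK:-0}
OUTFILE=$(output_name "$RANK")
echo "rank $RANK writes to $OUTFILE"
```

With this in place, the R command line in task.sh could use $RANK and $OUTFILE instead of referring to PMI_RANK directly.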
Copy results back to the shared drive.
from_node.sh:
#!/bin/bash
RUNDIR=$2
SCRATCH=/tmp/$USER
cd $SCRATCH
cp out* $RUNDIR
The R code to get the command-line argument is as follows.
takeOnlyArgumentsAfterDashArgs=TRUE
args=commandArgs(takeOnlyArgumentsAfterDashArgs)
if (length(args)<1) {
print("R --no-save --args arg1 arg2... < script > out.dat");
stop();
}
whichParameter = as.integer(args[1])
cat("which parameter", whichParameter, "\n");
This enables you to use the PMI_RANK, now accessible as the integer variable whichParameter, to decide what this particular R process should do.
Running Lines From a Script File
If it feels easier, you could make a script file, todo.txt:
myprogram -size 100
myprogram -size 200
myprogram -size 300
myotherprogram -size 100
myotherprogram -size 100
and then have the batch script run individual lines from this file. Change task.sh:
#!/bin/bash
# Create a file extension unique to each hostname (if you want)
# The %%.* turns v4linuxlogin1.cac.cornell.edu into v4linuxlogin1.
EXT=$1-${HOSTNAME%%.*}
RUNDIR=$2
SCRATCH=/tmp/$USER
cd $SCRATCH
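# PMI_RANK is zero-based, but line numbers start at 1.
linenumber=$((PMI_RANK + 1))
# Take the first $linenumber lines, keep the last of them, and run it.
eval "$(head -n "$linenumber" todo.txt | tail -n 1)"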
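One thing to watch with this approach: if there are more tasks than lines in todo.txt, the head and tail pipeline returns the last line for every extra task, so the final command runs more than once. A minimal sketch of a guarded variant follows, assuming it is acceptable for extra tasks to do nothing. It uses sed -n to print exactly one line; the helper command_for_rank and the echo commands in the stand-in todo file are made up for illustration.

```shell
#!/bin/bash
# Stand-in todo file; real entries would be program invocations.
TODO=$(mktemp)
printf 'echo size 100\necho size 200\necho size 300\n' > "$TODO"

# Print the command on the line matching a zero-based rank,
# or print nothing if the rank is past the end of the file.
command_for_rank() {
    local linenumber=$(( $1 + 1 ))
    sed -n "${linenumber}p" "$2"
}

RANK=${PMI_RANK:-1}    # assumed default when run by hand, outside mpiexec
CMD=$(command_for_rank "$RANK" "$TODO")
if [ -n "$CMD" ]
then
    eval "$CMD"
else
    echo "rank $RANK: no matching line in the todo file, nothing to do"
fi
```

The alternative is to make sure the task count never exceeds the number of lines in todo.txt, in which case the original head and tail form is sufficient.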