Farm-out-work

From CAC Documentation wiki


Introduction

Farm-out-work is a technique for running many separate programs as tasks on the cluster. It is implemented as a simple MPI program written at CAC. Submit a help ticket and we will send you the source code, a compiled executable, or both. You specify a list of tasks to run, such as:

# These are the command lines to run. Comments begin with a hash.
$HOME/calc 2.1
$HOME/calc 2.2
$HOME/calc 2.3
$HOME/calc 2.4
$HOME/calc 2.5
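
A long list is easier to generate than to type. The following is a minimal sketch, assuming the calc program and numeric arguments shown in the list above. The helper name make_tasks is hypothetical and is not part of farm-out-work.

```shell
# make_tasks is a hypothetical helper: it writes one command line per
# parameter value to the task file named by its first argument.
make_tasks() {
    out=$1
    shift
    # A leading comment is harmless because farm ignores lines starting with '#'.
    echo "# Generated task list" > "$out"
    for value in "$@"
    do
        # Single quotes keep $HOME literal, matching the list above.
        echo '$HOME/calc' "$value" >> "$out"
    done
}

make_tasks tasks.txt 2.1 2.2 2.3 2.4 2.5
```

The same loop could read its values from a file or from seq instead of the command line.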

The list can be as long as you like. You choose, in your batch script, how many nodes of the cluster to allocate to your job:

#PBS -l walltime=12:00:00,nodes=4

The farm program reads the list of tasks and executes them on the nodes. With four nodes of eight cores each, it runs up to 4*8 = 32 tasks at a time, one per core; you set this count when you start farm with mpiexec. Whenever a task finishes, farm starts the next one from the list, until all are complete.
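
The arithmetic can be written out as a short sketch. It assumes eight cores per node, as in the example, and a hypothetical list of 100 tasks. The number of rounds is only a rough estimate that holds when all tasks take about the same time.

```shell
NODES=4              # from the #PBS -l nodes= request
CORES_PER_NODE=8     # assumption: eight cores per node, as in the text
TASKS=100            # hypothetical length of the task list

# Number of tasks that can run at the same time.
SLOTS=$((NODES * CORES_PER_NODE))

# Rounds needed if every task takes about the same time (rounded up).
ROUNDS=$(( (TASKS + SLOTS - 1) / SLOTS ))

echo "slots: $SLOTS, rounds: $ROUNDS"
```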

Usage

./farm-out-work -v -h -t tasks.txt
-v or --verbose prints messages about progress. Use more than once for more messages.
-h              prints the help message
-t tasks.txt    specifies a file from which to read command lines. Blank lines and those
                starting with hash # are ignored.
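
To check how many tasks a file holds before submitting, the same filtering rule can be approximated with grep. This is a sketch: list_tasks is a hypothetical helper, and it assumes that whitespace-only lines count as blank and that a comment hash must be in the first column. The farm program's own parser may differ in those details.

```shell
# list_tasks is a hypothetical helper that prints only the lines
# that would be treated as tasks: not blank, and not starting with '#'.
list_tasks() {
    grep -v -e '^[[:space:]]*$' -e '^#' "$1"
}

# Demonstration on a throwaway file with one comment and one blank line.
sample=$(mktemp)
printf '%s\n' '# comment line' '' '$HOME/calc 2.1' '$HOME/calc 2.2' > "$sample"
TASK_TOTAL=$(list_tasks "$sample" | wc -l)
echo "runnable tasks: $TASK_TOTAL"
rm -f "$sample"
```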

To build farm, execute "make" in the same directory as the source. It uses mpicc to compile the executable.

Simple Example

You write a file called tasks.txt as shown above. The following script runs the farm program on the number of nodes requested in its "#PBS -l nodes=" line. Submit the script to the scheduler.

# batch.sh - Submit this with 'nsub batch.sh'
#PBS -l nodes=2,walltime=4:00:00
#your account number
#PBS -A account_0001
#Join stdout and stderr
#PBS -j oe
# This shortens the name of the output file to just the job id.
#PBS -o ${PBS_O_WORKDIR}/${PBS_JOBID%%.*}.out
#Name the job something useful
#PBS -N Clans1Linux
#PBS -q v4

set -x

FARM=/usr/local/bin/farm-out-work
TASKFILE=tasks.txt

NODECNT=$(wc -l < "$PBS_NODEFILE")
# One task slot per core, assuming 8 cores per node.
TASKCNT=$((8 * NODECNT))
if mpdboot -n $NODECNT --verbose -r /usr/bin/ssh -f $PBS_NODEFILE
then
    mpiexec -ppn 8 -np $TASKCNT $FARM -v -t $TASKFILE
    mpdallexit
fi
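
The ${PBS_JOBID%%.*} expression in the script is ordinary shell parameter expansion: %% removes the longest suffix that matches the pattern, here everything from the first dot onward. A small sketch, using a made-up job id value:

```shell
# Hypothetical job id; the real value is set by the scheduler.
PBS_JOBID=12345.scheduler.example.edu

# %% strips the longest suffix matching ".*", leaving only the number.
JOBNUMBER=${PBS_JOBID%%.*}

# A single % strips only the shortest matching suffix, for comparison.
SHORTEST=${PBS_JOBID%.*}

echo "$JOBNUMBER"    # prints 12345
echo "$SHORTEST"     # prints 12345.scheduler.example
```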

What should be in a task file? Each line of the file will likely call another shell script responsible for copying data to the node and copying results back. Arguments to the shell script specify which piece of data to process.

# task.sh, The farm program runs this once for each task.
set -x
CASE=$1
JOBINDEX=$2
echo rank $PMI_RANK ${CASE} $JOBINDEX

# The scheduler automatically creates a $TMPDIR on the local drive of each node,
# but farm will run 8 separate processes, one for each core of the node, so each
# one likely needs its own subdirectory.
TEMPDIR=$TMPDIR/${JOBINDEX}
mkdir -p $TEMPDIR

# Your data files and directories will differ
OUTFILE=${CASE}.txt
# Name of your program in ~/bin. "calc" is a placeholder; use your own.
EXECBASE=calc

cp ~/data/${CASE}* $TEMPDIR/
cp ~/bin/* $TEMPDIR/

cd $TEMPDIR

# A single > overwrites any previous OUTFILE; >> appends to it.
date
date > $OUTFILE
./${EXECBASE} ${CASE} >> $OUTFILE
date >> $OUTFILE
date

cp ${CASE}.* ~/results

# Get out of the working directory before you delete it.
cd
rm -rf "$TEMPDIR"
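
Before submitting a long job, it can help to try a few task lines serially on one machine. The sketch below is a testing aid only: run_tasks_serially is a hypothetical helper, it assumes each task line is a complete shell command, and it does not reproduce how farm distributes work over MPI ranks.

```shell
# run_tasks_serially is a hypothetical helper: it runs each task line
# of the given file in turn, skipping blank lines and '#' comments.
run_tasks_serially() {
    while IFS= read -r line || [ -n "$line" ]
    do
        case $line in
            ''|'#'*) continue ;;
        esac
        # Redirect stdin so a task cannot consume the rest of the task file.
        sh -c "$line" < /dev/null || echo "task failed: $line" >&2
    done < "$1"
}

# Demonstration with two harmless tasks.
demo=$(mktemp)
printf '%s\n' '# demo tasks' 'echo task-a' '' 'echo task-b' > "$demo"
run_tasks_serially "$demo"
rm -f "$demo"
```

Note that $PMI_RANK and the scheduler's $TMPDIR are not set in such a dry run, so a task script that depends on them needs a fallback.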

Fancy Example

This next example was written by a batch-script-happy consultant. A single batch script:

  • Reads a data directory for tasks, then writes the tasks.txt file.
  • Writes the script that is listed in the tasks file.

It is also complicated because there is code to make it easy to run in three different environments:

  • In batch on the standard queue of v4
  • On the development nodes where it can be run interactively and skips running the real code.
  • On the login node where it skips MPI submission and skips running the real code.

What follows is the more thorough batch script.

# batch.sh - Submit this with 'nsub batch.sh'
#PBS -l nodes=1,walltime=4:00:00
#your account number
#PBS -A rgh1_0001
#Join stdout and stderr
#PBS -j oe
#PBS -o cl1_${PBS_JOBID%%.*}.out
#Name the job something useful
#PBS -N Clans1Linux
#PBS -q v4

set -x

# Define directories for executable.
EXECBASE=b_clg.sh

BASE=${PBS_O_WORKDIR}
BIN=${BASE}/cl1_bin
FARM=/usr/local/bin/farm-out-work
DATA=${BASE}/cl1_job
RESULTDIR=${BASE}/cl1_results

# Set MPI information depending on where executed.
if [ -n "$PBS_NODEFILE" ]
then
  # Running in batch
  NODECNT=$(wc -l < "$PBS_NODEFILE")
  TASKCNT=$((8 * NODECNT))
  RUNDIR=$PBS_O_WORKDIR
  JOBNUMBER=${PBS_JOBID%%.*}
elif [ "$HOST" == "v4linuxlogin1.cac.cornell.edu" ]
then
  # For quick testing on login node.
  RUNDIR=$PWD
  JOBNUMBER="01"
else
  # For interactive testing on development nodes
  NODECNT=1
  TASKCNT=4
  RUNDIR=$PWD
  PBS_NODEFILE=$RUNDIR/nodefile
  echo localhost>$PBS_NODEFILE
  JOBNUMBER=01
fi

TASKFILE=$RUNDIR/tasks${JOBNUMBER}.txt
TASKBATCH=$RUNDIR/task${JOBNUMBER}.sh
rm -f $TASKFILE
rm -f $TASKBATCH

# Make the list of tasks
# Use full path to the task.
# This loops through the input data, writing one line per file to the tasks file.
# The taskind is just a counter so that each line has a unique index.
cd ${DATA}
taskind=0
for geofile in *.geo
do
  echo $TASKBATCH ${geofile%%.geo} ${taskind} >> $TASKFILE
  ((taskind+=1))
done

echo Wrote tasks.txt.

# Write the batch file for each task.
# This shell script writes a task.sh shell script.
# We want the created shell script to use some variables, but bash
# would normally substitute values for them, so we escape the 
# dollar sign with a backslash on those we want to keep.
# What is below would be written to look like the example above.
cat>${TASKBATCH}<<EOF
# task.sh - This is generated by batch.sh and run once for each task.
set -x
CASE=\$1
JOBINDEX=\$2
echo rank \$PMI_RANK \${CASE} \$JOBINDEX

if [ -n "\$TMPDIR" ]
then
  TEMPDIR=\$TMPDIR/${JOBNUMBER}_\${JOBINDEX}
else
  # When testing, use /tmp to create a temporary directory.
  TEMPDIR=/tmp/${JOBNUMBER}_\${JOBINDEX}
fi

if [ -d "\$TEMPDIR" ]
then
  echo Directory \$TEMPDIR already exists
else
  mkdir -p \$TEMPDIR
fi

OUTFILE=\${CASE}.txt

cp $DATA/\${CASE}* \$TEMPDIR/
cp $BIN/* \$TEMPDIR/

cd \$TEMPDIR

date
date > \$OUTFILE
if [ "$PBS_O_QUEUE" == "v4" ]
then
  ./${EXECBASE} \${CASE} >> \$OUTFILE
else
  # When running on v4dev or not running in batch, just echo hostname.
  hostname >> \$OUTFILE
  hostname >> \${CASE}.mtl
fi
date >> \$OUTFILE
date

cp \${CASE}.geo $RESULTDIR
cp \${CASE}.em1 $RESULTDIR
cp \${CASE}.dtr $RESULTDIR
cp \${CASE}.mtl $RESULTDIR

cd
rm -rf "\$TEMPDIR"
EOF
chmod a+x ${TASKBATCH}

echo Wrote task.sh.

# Don't mpdboot on the login node
if [ -n "$PBS_NODEFILE" ]
then
  if mpdboot -n $NODECNT --verbose -r /usr/bin/ssh -f $PBS_NODEFILE
  then
      mpiexec -ppn 8 -np $TASKCNT $FARM -v -t $TASKFILE
      mpdallexit
  fi
else
  echo No nodefile so no mpi to run.
fi

rm -f ${TASKFILE}
rm -f ${TASKBATCH}
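
The escaping rule used when batch.sh writes task.sh can be seen in isolation. In an unquoted here-document, a plain $VAR is expanded while the generator runs, and \$VAR is written out literally so that it is expanded later, when the generated script runs. A minimal sketch, with a temporary file and the case name "wing" as made-up values:

```shell
JOBNUMBER=01            # expanded now, while the generator runs
GENERATED=$(mktemp)     # hypothetical output path for this demo

cat > "$GENERATED" <<EOF
CASE=\$1
echo "job ${JOBNUMBER} case \$CASE"
EOF

# The generated file contains a literal $1 and $CASE, but 01 in place
# of ${JOBNUMBER}.
cat "$GENERATED"

# Run the generated script with "wing" as a made-up case name.
RESULT=$(sh "$GENERATED" wing)
echo "$RESULT"          # prints: job 01 case wing
rm -f "$GENERATED"
```

If nothing needs to be expanded at generation time, quoting the delimiter (<<'EOF') disables expansion for the whole here-document and removes the need for backslashes.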