Using IPython to Farm Work

From CAC Documentation wiki

Running an IPython job

As an example of a more involved script, the following job starts a cluster of IPython engines and then executes a script using IPython. To allow testing outside the batch system, the script checks whether the PBS variables are defined. If they are not, it defines substitutes so that it can be run interactively. The script also tests whether mpdboot and mpiexec succeeded and stops early if either one failed. Without these checks, the job could start but never finish, which wastes compute time. You need to install IPython under your own account to run farm.py. Media:Farm_example.zip

 #!/bin/sh
 ## Although this says ppn=1 we can still specify a large task
 ## count to mpiexec below.
 #PBS -l walltime=1:00:00,nodes=2:ppn=1
 #PBS -A yourAcct
 #PBS -j oe
 #PBS -N ipjob
 #PBS -q v4dev
 
 # Turn on echo of shell commands
 set -x
 
 # Pull standard stuff from the environment variables
 if [ -n "$PBS_NODEFILE" ]
 then
   NODECNT=$(wc -l < "$PBS_NODEFILE")
   # Request 8 tasks for every node in the node file.
   TASKCNT=$((8 * NODECNT))
   RUNDIR=$PBS_O_WORKDIR
   # The job id is something like 613.scheduler.v4linux.
   # This deletes everything after the first dot.
   JOBNUMBER=${PBS_JOBID%%.*}
   # Print this script to the output.
   cat "$0"
 else
   # For interactive testing, create your own node file with "localhost"
   NODECNT=1
   TASKCNT=4
   RUNDIR=/home/gfs01/ajd27/dev/ipy
   PBS_NODEFILE=$RUNDIR/nodefile
   echo localhost > "$PBS_NODEFILE"
   JOBNUMBER=01
 fi
 
 # Set up our job
 EXT=$JOBNUMBER
 # IPython uses these so distributed clients can find each other.
 ENGFURL=$RUNDIR/engine${EXT}.furl
 MECFURL=$RUNDIR/mec${EXT}.furl
 TASKFURL=$RUNDIR/task${EXT}.furl
 
 # Stop here if the run directory is missing rather than run in the wrong place.
 cd "$RUNDIR" || exit 1
 
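 # Start the controller in the background and give it a few seconds to come up.
 ~/bin/ipcontroller --engine-furl-file=$ENGFURL --multiengine-furl-file=$MECFURL --task-furl-file=$TASKFURL &
 sleep 5
 # grep may count itself in the ps listing, so a count of 1 or less
 # means no controller is running.
 if [ $(ps augx | grep ipcontroller | wc -l) -le 1 ]
 then
   echo "Controller did not start."
   # Exit with a nonzero status so the failure is visible to the caller.
   exit 1
 fi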
 
 cat $PBS_NODEFILE
 if mpdboot -n $NODECNT -r /usr/bin/ssh -f $PBS_NODEFILE
 then
   mpiexec -n $TASKCNT ~/bin/ipengine --furl-file=$ENGFURL &
   sleep 20
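   # grep may count itself in the ps listing, so a count of 1 or less
   # means no engine is running.
   if [ $(ps augx | grep ipengine | wc -l) -le 1 ]
   then
     echo 'mpiexec failed.'
     mpdallexit
     exit 1
   fi
 
   python2.5 runtest.py runs.yaml $MECFURL $TASKFURL
   mpdallexit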
 fi
 
 rm $ENGFURL $MECFURL $TASKFURL
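
The script above waits a fixed number of seconds and then counts processes to decide whether the controller and engines started. One possible alternative is to poll for a file that a process writes once it is ready. The sketch below assumes that ipcontroller creates the engine FURL file only after it has started successfully; that behavior should be confirmed for the installed IPython version. The function name wait_for_file and its arguments are hypothetical and are not part of the original example. A stale file left by an earlier run would also satisfy the check, so the FURL files should be removed before the controller is launched.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original job script.
# Usage: wait_for_file PATH TIMEOUT_SECONDS
# Returns 0 as soon as PATH exists and is non-empty, or 1 if that has not
# happened after roughly TIMEOUT_SECONDS seconds.
wait_for_file() {
  file=$1
  timeout=$2
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]
  do
    # -s is true once the file exists and has a size greater than zero.
    if [ -s "$file" ]
    then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  # Final check once the timeout has been reached.
  [ -s "$file" ]
}

# Possible use in place of "sleep 5" and the ps/grep test, assuming the
# controller writes $ENGFURL when it is ready:
#
#   if ! wait_for_file "$ENGFURL" 30
#   then
#     echo "Controller did not start."
#     exit 1
#   fi
```

Polling returns as soon as the file appears, so a fast start does not pay the full delay, and a slow start gets the whole timeout before the job gives up.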