Batch Jobs for Apps Started by Graphical Interfaces

From CAC Documentation wiki

Let's say you have a program you want to run on the batch nodes but you have to run a graphical interface to get it started. For instance, gsAssembler is a Java program with dialogs that configure and then start the Newbler program, which can then run for a long time.

Note: gsAssembler is not currently able to display a window from a compute node. The technique below is how you would do it, though.

To make this work, you need to get hold of a compute node, log in to it, start your program, and somehow make the job stop when your program is done. Here is a recipe to do just that.

1. Make sure you have X-Windows running on your machine and that you have enabled X11 forwarding on your ssh connection to the cluster. Don't know what that is? Then look at Connect to Linux.
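A quick way to confirm that forwarding is active after you connect is to check the DISPLAY variable, which ssh sets on the remote side when X11 forwarding is enabled. The following is a minimal sketch; the function name x11_forwarding_status is hypothetical and not part of any cluster tooling.

```shell
#!/bin/bash
# Hypothetical helper: report whether this session appears to have an
# X11 display available (ssh sets DISPLAY when forwarding is enabled).
x11_forwarding_status() {
  if [ -n "${DISPLAY:-}" ]; then
    echo "DISPLAY is set to ${DISPLAY}"
  else
    echo "DISPLAY is not set; reconnect with ssh -X or ssh -Y"
    return 1
  fi
}

# Usage, after logging in to the cluster:
#   x11_forwarding_status
```

A set DISPLAY only shows that a display was configured for the session. Starting a small X client is still the definitive check that a window can actually be drawn.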

2. Submit a batch job. It shouldn't be an interactive batch job, because those end automatically as soon as you log out of the compute node. The code that executes on the node does nothing but wait for you to log in, wait for you to log out again, and then wait until your program has finished.
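The login check that such a job relies on can be sketched in isolation. `who -q` prints the names of the logged-in users on its first line, so the job only has to test whether your user name is on that line. The helper name user_listed below is hypothetical, and it matches the name as a whole word rather than with a regular expression.

```shell
#!/bin/bash
# Hypothetical helper: read `who -q` output on stdin and succeed only if
# the given user name appears as a whole word on the first line.
user_listed() {
  head -n 1 | tr -s ' \t' '\n' | grep -qxF -- "$1"
}

# Usage on a compute node: poll until the user has logged in.
#   until who -q | user_listed "$USER"; do sleep 10; done
```

Because the helper reads from standard input, it can be tried with canned text before it is used in a job.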

Create this file, called wait.sh. You can set it to use any queue. Change the account to the project ID you use:

#!/bin/bash
# wait.sh
#PBS -l walltime=00:30:00,nodes=1
#PBS -A dal16_0003
#PBS -j oe
#PBS -o ${PBS_O_WORKDIR}/${PBS_JOBID%%.*}.out
#PBS -N batchtest
#PBS -q v4dev

# Turn on echo of shell commands
set -x

# This must be set to the name of the program that will run.
# When this program stops, the job will end.
PROG=newbler

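echo "Wait until user logs in $(date)"
# grep -v succeeds while ${USER} is absent from the list of logged-in users.
while who -q | head -1 | grep -v "\b${USER}\b"
do
  sleep 10
done

echo "Wait until the user logs off $(date)"
# grep succeeds while ${USER} is still logged in.
while who -q | head -1 | grep "\b${USER}\b"
do
  sleep 30
done

echo "Wait until the ${PROG} finishes $(date)"
# grep succeeds while a process whose name contains ${PROG} is running.
while ps -u "${USER}" -o comm | grep "${PROG}"
do
  sleep 30
done

echo "${PROG} finished $(date)"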

Then submit the job with "nsub wait.sh". It will return a job id, such as 1020252.

3. When the job is running, log in to the node on which it is running.

Check whether the job is running with "showq|grep $USER" or "checkjob 1020252|grep State". If it doesn't start immediately, look at showqlease to see how many jobs are waiting in the queue.

When the job is running, use checkjob to find which node it is using, and then ssh to that node, being sure to forward X11 connections with the -Y switch:

$ checkjob 1020252|grep v4linux
[compute-3-27.v4linux:1]
$ ssh -Y compute-3-27

This will take you to the compute node.
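If you prefer not to copy the node name by hand, it can be pulled out of the checkjob output. The sketch below assumes the allocated node appears in the bracketed form shown above, [compute-3-27.v4linux:1]; the function name node_from_checkjob is hypothetical, and the pattern would need adjusting if the output looks different on your cluster.

```shell
#!/bin/bash
# Hypothetical helper: read checkjob output on stdin and print the short
# host name of the first node listed as [<host>.v4linux:<count>].
node_from_checkjob() {
  sed -n 's/^.*\[\([^].:]*\)\.v4linux:[0-9]*].*$/\1/p' | head -n 1
}

# Usage, with the job ID returned at submission:
#   ssh -Y "$(checkjob 1020252 | node_from_checkjob)"
```

The helper prints nothing when no matching line is found, so check for an empty result before passing it to ssh.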

4. Start the GUI and configure the job. In this example, run "gsAssembler&" to open the GUI, which then starts the long-running program, called newbler. Once newbler is running, you can quit gsAssembler and log out of the compute node. The newbler program will continue to run.

5. The shell script will wait until the newbler program is no longer running before releasing the node. If you look above at the script, the PROG variable is set to newbler. The script works by looking at the list of running processes to see whether the word "newbler" is in that list.
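Note that a plain grep matches any process name that merely contains the word, so a differently named process with "newbler" in its name would also keep the job alive. If that matters, an exact-match variant can be sketched as follows. The helper name has_exact_process is hypothetical, and this is an alternative to the check in wait.sh, not what the script currently does.

```shell
#!/bin/bash
# Hypothetical helper: read one process name per line on stdin and succeed
# only if some line is exactly equal to the given name.
has_exact_process() {
  grep -qxF -- "$1"
}

# Usage inside the job script ("comm=" suppresses the ps header line):
#   while ps -u "$USER" -o comm= | has_exact_process "$PROG"; do sleep 30; done
```

Unlike the loop in wait.sh, this variant is quiet: grep -q prints nothing, so matching process names no longer appear in the job's output file.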