Interactive IPython


(Last revised: 2015-09-18)

Interactive Cluster Computing with IPython

Goal: Use several nodes of a Linux cluster, running ROCKS with the Moab/Torque scheduler, to perform data analysis from a Python command line.

IPython extends Python into a highly programmable command-line environment. Its parallel computing extensions let you start a single controller and many engines; you then connect the command-line environment to the controller and send commands to the engines in a streamlined fashion. The engines can't talk to each other, but they will reliably work in parallel on trivially parallelizable tasks.

Why is this so cool? Using an example from the IPython documentation, you can have a whole slew of computers execute Python functions in parallel fairly conveniently:

> from IPython.kernel import client
> mec=client.MultiEngineClient('mec.furl')
> mec.get_ids()
(0, 1, 2, 3)
> mec.activate()
> import numpy
> %px import numpy
<Results List>
[0] In [8]: import numpy
[1] In [7]: import numpy
[2] In [8]: import numpy
[3] In [7]: import numpy
> %px a = numpy.random.rand(2,2)
<Results List>
[0] In [9]: a = numpy.random.rand(2,2)
[1] In [8]: a = numpy.random.rand(2,2)
[2] In [9]: a = numpy.random.rand(2,2)
[3] In [8]: a = numpy.random.rand(2,2)
> %px print numpy.linalg.eigvals(a)
Executing command on Controller
Out[28]:
<Results List>
[0] In [10]: print numpy.linalg.eigvals(a)
[0] Out[10]: [ 1.28167017  0.14197338]

[1] In [9]: print numpy.linalg.eigvals(a)
[1] Out[9]: [-0.14093616  1.27877273]

[2] In [10]: print numpy.linalg.eigvals(a)
[2] Out[10]: [-0.37023573  1.06779409]

[3] In [9]: print numpy.linalg.eigvals(a)
[3] Out[9]: [ 0.83664764 -0.25602658]

The pieces that make this work are the controller, engines, and client.

  • ipcontroller - One of these listens to the client and sends work to the engines.
  • ipengine - These run on the cluster.
  • IPython.kernel.client - You create an instance of this to talk to the ipcontroller.

They communicate using SSL-encrypted TCP streams, but they negotiate the first connection using files called furls. ipcontroller creates the furls, and ipengines and the client read them to find the controller.
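
To make the startup order concrete, here is a minimal sketch for a single machine (the furl file names are arbitrary examples; the batch script below does the same thing across cluster nodes):

# Start the controller first; it writes out the furl files.
ipcontroller --engine-furl-file=eng.furl --multiengine-furl-file=mec.furl &
# Give it a moment to create the furls before anything tries to read them.
sleep 5
# Start one or more engines; each reads the engine furl to register with the controller.
ipengine --furl-file=eng.furl &
ipengine --furl-file=eng.furl &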

Installation of IPython

Install IPython in your home directory on linuxlogin. Your home directory is shared among the cluster nodes, so a single install there is available on all of them. Installing the Python package setuptools first gives you easy_install, which makes setup a bit easier. After the install, make sure PYTHONPATH is set in your .bashrc (or .cshrc).
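
A minimal sketch of a home-directory install, assuming easy_install is already available (the prefix and Python version below are examples; substitute the paths for your own account):

# Make the target directory visible first; put these two export lines in ~/.bashrc as well.
export PYTHONPATH=$HOME/lib/python2.5/site-packages:$PYTHONPATH
export PATH=$HOME/bin:$PATH
mkdir -p $HOME/lib/python2.5/site-packages
# Install IPython under your home directory.
easy_install --prefix=$HOME ipython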

Once it is installed, allow IPython to create a ~/.ipython directory by running ipcluster once. This will set up the proper directories for the following steps.

Batch Scripts to Start the Engines

The IPython engines will run on the cluster. The following batch script starts engines on each core of two development nodes. Each engine discovers the controller by reading a furl file (engine3337.furl for job id 3337); the client you start later reads the matching mec3337.furl.

#!/bin/sh
#PBS -l walltime=30:00,nodes=2:ppn=1
#PBS -A user_0001
#PBS -j oe
#PBS -N ipy
#PBS -q v4dev
#PBS -S /bin/sh

# Turn on echo of shell commands
set -x
# Ensure the expected version of python is available
which python
python -V

# If the path and pythonpath aren't set, set them.
PATH=/home/gfs01/ajd27/bin:$PATH
PYTHONPATH=/home/gfs01/ajd27/lib/python2.5/site-packages
export PYTHONPATH
export PATH

CORECNT=`grep processor /proc/cpuinfo | wc -l`

# Pull standard stuff from the environment variables
if [ -n "$PBS_NODEFILE" ]
then
  NODECNT=$(wc -l < "$PBS_NODEFILE")
  TASKCNT=`expr $CORECNT '*' $NODECNT`
  RUNDIR=$PBS_O_WORKDIR
  # The job id is something like 613.scheduler.v4linux.
  # This deletes everything after the first dot.
  JOBNUMBER=${PBS_JOBID%%.*}
  echo '============================'
  echo $0
  echo '============================'
else
  # For interactive testing, create your own node file with "localhost"
  NODECNT=1
  TASKCNT=$CORECNT
  RUNDIR=/home/gfs01/ajd27/dev/TACC/ipython
  PBS_NODEFILE=$RUNDIR/nodefile
  echo localhost>$PBS_NODEFILE
  JOBNUMBER=01
fi

# Set up our job
EXT=$JOBNUMBER
# IPython uses these so distributed clients can find each other.
ENGFURL=$RUNDIR/engine${EXT}.furl
MECFURL=$RUNDIR/mec${EXT}.furl
TASKFURL=$RUNDIR/task${EXT}.furl

cd $RUNDIR
~/bin/ipcontroller --engine-furl-file=$ENGFURL --multiengine-furl-file=$MECFURL \
  --task-furl-file=$TASKFURL &
# Give the controller some time to get started before the engines ask it to respond.
sleep 5
# The ipcontroller dissociates from the terminal, so look for it among running processes.
if [ `ps augx | grep ipcontroller | wc -l` -le 1 ]
then
  echo "Controller did not start."
  exit
fi

cat $PBS_NODEFILE
if mpdboot -n $NODECNT -r /usr/bin/ssh -f $PBS_NODEFILE
then
  mpiexec -n $TASKCNT ~/bin/ipengine --furl-file=$ENGFURL
  mpdallexit
fi

rm $ENGFURL $MECFURL $TASKFURL
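
Submit the script with qsub (the script file name here is just an example). The job id that qsub prints determines the furl file names used in the next step; for job id 3337 the client furl is mec3337.furl in the directory from which you submitted the job.

 $ qsub ipy.sh
 3337.scheduler.v4linux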

Run IPython Client

Now, from the head node, start ipython and follow the instructions for its multiengine interface.

In [1]: from IPython.kernel import client
In [2]: mec = client.MultiEngineClient('mec3337.furl')
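
With the connection made, you can send work out to the engines. Here is a minimal sketch, assuming the 0.x multiengine interface used throughout this page (method names differ in later IPython releases):

In [3]: mec.execute('import socket; host = socket.gethostname()')
In [4]: mec.pull('host')   # returns one hostname per engine, showing where each runs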

The engines will stay alive until the job runs out of time or until you execute:

In []: mec.kill(controller=True)

at which point each ipengine will exit, which in turn lets the batch script finish. You have to pass controller=True to make sure the controller is killed as well. Once the job ends, no more time is charged to it.

Using the Cluster Interactively from Another Machine

You can start IPython on a computer elsewhere on campus, or even at home, and use it to run commands on the cluster.

The batch script above starts the controller and engines. If you add the --client-port argument, you can specify the TCP port on which ipcontroller will listen for the client to connect:

 $ ipcontroller --client-port=10000 --engine-furl-file=eng.furl --multiengine-furl-file=mec.furl

It isn't possible to connect directly to port 10000 because of the firewall. If you set up an SSH tunnel to the cluster, as described in SecureShell, you can have IPython on your local machine connect to a local port that is forwarded to linuxlogin.cac.cornell.edu:10000 on the server.

  • Set up the tunneling ssh. In Linux, it would be done with:
$ ssh -L 10000:linuxlogin.cac.cornell.edu:10000 username@linuxlogin.cac.cornell.edu
  • Copy the mec.furl to your local machine.
$ scp username@linuxlogin.cac.cornell.edu:dev/ipython/mec.furl .
  • If the local tunnel port is the same as the remote port, you can just connect as you would have on linuxlogin. From IPython, type:
 from IPython.kernel import client
 mec = client.MultiEngineClient('mec.furl')
  • If the local tunnel port is not the same as the remote port, because you typed something like:
$ ssh -L 10001:linuxlogin.cac.cornell.edu:10000 username@linuxlogin.cac.cornell.edu

then edit the mec.furl. A typical mec.furl looks like:

 pb://zzeaecabcdefg6yjaxo@127.0.0.1:10000,128.84.3.50:10000/ayt6ijb3tsblheav4bsu5e2pd6x77tl6

Note the two IP addresses, 127.0.0.1 and 128.84.3.50: these are the loopback and public addresses of linuxlogin. We need to tell IPython on the client machine to connect to localhost, but on the port we chose for the tunnel, so change the port on the 127.0.0.1 entry to the local end of the tunnel:

 pb://zzeaecabcdefg6yjaxo@127.0.0.1:10001,128.84.3.50:10000/ayt6ijb3tsblheav4bsu5e2pd6x77tl6

Now start IPython on your local machine using this edited mec.furl, and everything will work.