ASTRA Cluster

ASTRA General Information

  • ASTRA is a private cluster with restricted access to the na346_0001 group.
  • Rocks 6.0 with CentOS 6.2
  • Cluster Status: Ganglia.
  • Submit HELP requests by sending email to cac-help@cornell.edu.
  • ASTRA has one head node (astra.cac.cornell.edu) and 40 compute nodes (compute-1-[1-40]).
    • Each compute node:
      • 32GB of RAM, 883GB /tmp
      • 12 core, Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
      • As of 12/5/12, hyperthreading has been turned off on all compute nodes.
    • The headnode, astra, contains:
      • the Maui cluster scheduler combined with Torque as the resource manager.
      • Rocks cluster server deployment software & database
      • /home (11TB) directory server (nfs exported to all cluster nodes)
      • the /home/fs01/xxxx/backmeup folder is backed up nightly; all other files and folders are NOT backed up
      • /tmp (39GB)


Getting Started on the astra cluster

How to log in and create your first job

  • Log into astra.cac.cornell.edu via ssh
  • Password change at FIRST LOGIN:
    • You will be prompted to change your password upon initial login.
    • You will also be asked for an ssh passphrase. You can leave this blank; just hit the Enter key.
Example of changing from an old password of '0ldpassw0rd!!' to a new password of 'newpassw0rd!!' (note that you are prompted twice for the old password):
$ ssh your_username@astra.cac.cornell.edu
Password: (ENTER 0ldpassw0rd!!) 
WARNING: Your password has expired. 
You must change your password now and login again! 
Changing password for user your_username. 
Kerberos 5 Password: (ENTER 0ldpassw0rd!!) 
New UNIX password: (ENTER newpassw0rd!!) 
Retype new UNIX password: (ENTER newpassw0rd!!) 
passwd: all authentication tokens updated successfully. 
Connection to astra closed.

If you get a token error, it most likely means that the password is not complex enough. Your password must contain at least three of the following four elements: (1) uppercase letters, (2) lowercase letters, (3) special characters, (4) digits, and be a minimum of 8 characters.

  • home directory: /home/fs01/userid (referenced by either: ~ or $HOME)
  • Familiarize yourself with your pre-set environment variables; type: env
  • Please run through the examples to familiarize yourself with the ASTRA cluster
  • Review the Maui commands!

(Submit a job: qsub, Check on a job: checkjob, View all jobs: showq, Cancel a job: canceljob)

Astra Documentation Main Page

Software

Installed Software

The module system is implemented for some software. To list the environments you can put yourself in, type:

module avail 

EXAMPLE: To be sure you are using the environment setup for python2.7.15, you would type:

module avail
module load python2.7.15

When done, either log out and log back in, or type:

module unload python2.7.15
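
Modules can also be loaded inside a batch script so the environment is set up on the compute node where the job runs. A minimal sketch, assuming the python2.7.15 module above and a hypothetical script named myscript.py:

#!/bin/bash
#PBS -l walltime=00:10:00,nodes=1
#PBS -j oe
#PBS -N moduletest
#PBS -q default

# load the environment before running the program (myscript.py is a placeholder)
module load python2.7.15
cd $PBS_O_WORKDIR
python2.7 myscript.py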

Package and Version   | Location                        | Module                     | On astra | On computes | Notes
Python 2.7            | /usr/local/bin/python2.7        | module load python2.7      | yes      | yes         | /opt/epd-7.1-2-rh5-x86_64
Python 2.7.15         | /opt/python2.7.15/bin/python2.7 | module load python2.7.15   | yes      | yes         |
Intel Compilers 11.1  | /opt/intel                      | no                         | yes      | yes         |
Gromacs 5.1.5         | /opt/gromacs                    | module load gromacs/5.1.5  | yes      | no          |
Gaussian C.01         | /opt/g09                        | no                         | yes      | yes         |
Gauss View 508        | /opt/gv                         | no                         | yes      | yes         |
amber12               | /opt/amber12                    | no                         | yes      | yes         |
Gamess                | /opt/gamess                     | no                         | yes      | yes         |
gnuplot               | /usr/share/gnuplot              | no                         | yes      | yes         |
orca 3.0.2            | /opt/orca_3_0_2_linux_x86-64/   | no                         | yes      | yes         | /usr/local/bin/orca
mpich 3.1-5           | /usr/lib64/mpich                | module load mpich3-x86_64  | yes      | yes         |
julia 1.1.1           | /opt/julia-1.1.1                | module load julia-1.1.1    | yes      | yes         |
  • It is usually possible to install software in your home directory.
  • List installed software via rpms: 'rpm -qa'. Use grep to search for specific software: rpm -qa | grep sw_name [e.g. rpm -qa | grep perl]
  • Send email to cac-help@cornell.edu to request installation or update of any software. You are not limited to what appears in the list above, though installation may require the permission of the cluster PI.

Astra Documentation Main Page

GROMACS and Gaussian

These software packages can be found in /opt, but to run them, you should copy (cp -r) the relevant main directory from /opt to your home directory. You can then modify any auxiliary files as you see fit.

To get the best performance, arrange for your batch job to do all its I/O to the local /tmp of a compute node. Thus, have your batch script copy all the input files to $TMPDIR (or to another directory in /tmp that is created by your script) at the beginning of your job. Then, at the end of your job, copy the output files back to your home directory.

For GROMACS:

  1. A set of test files can be found in /opt/gromacs/share/gromacs/tutor/water.
  2. Remember to put this line in your batch script (or in .profile) prior to running GROMACS:
export LD_LIBRARY_PATH=~/gromacs/lib:$LD_LIBRARY_PATH
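
Putting these pieces together, here is a minimal batch-script sketch for a GROMACS run that stages its files on the node-local /tmp. The input file names (grompp.mdp, conf.gro, topol.top), the output directory, and the ~/gromacs/bin path are assumptions based on a typical water tutorial layout and may differ from what is actually in /opt/gromacs/share/gromacs/tutor/water:

#!/bin/bash
#PBS -l walltime=01:00:00,nodes=1:ppn=12
#PBS -j oe
#PBS -N gromacs_water
#PBS -q default

# use the private copy of GROMACS in your home directory
export PATH=~/gromacs/bin:$PATH
export LD_LIBRARY_PATH=~/gromacs/lib:$LD_LIBRARY_PATH

# stage inputs to the node-local disk to avoid heavy NFS I/O
cp /opt/gromacs/share/gromacs/tutor/water/* $TMPDIR
cd $TMPDIR

# preprocess and run (file names are placeholders; adjust to the actual tutorial files)
gmx grompp -f grompp.mdp -c conf.gro -p topol.top -o water.tpr
gmx mdrun -nt 12 -deffnm water

# copy results back to the home directory at the end of the job
mkdir -p $HOME/gromacs_output
cp -f $TMPDIR/water.* $HOME/gromacs_output/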

For Gaussian:

  1. Your private copy must not be accessible to others (by the terms of the license), so fix your permissions like this to avoid an error: chmod -R o-rwx ~/g09-C.01
  2. In your batch job, change the location of Gaussian to point to your private copy:
export GAUSS_EXEDIR=/home/fs01/myuserid/g09-C.01     (substitute your own user id for myuserid)
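
Likewise, a minimal batch-script sketch for a Gaussian run that keeps its scratch files on the node-local /tmp; 'myinput.com' is a placeholder input file and 'myuserid' stands in for your own user id:

#!/bin/bash
#PBS -l walltime=04:00:00,nodes=1:ppn=12
#PBS -j oe
#PBS -N g09test
#PBS -q default

# point Gaussian at your private copy and keep scratch files on the local disk
export GAUSS_EXEDIR=/home/fs01/myuserid/g09-C.01
export PATH=$GAUSS_EXEDIR:$PATH
export GAUSS_SCRDIR=$TMPDIR

cd $PBS_O_WORKDIR
cp myinput.com $TMPDIR
cd $TMPDIR

# run Gaussian (input/output names are placeholders)
g09 < myinput.com > myinput.log

# copy the log back to the submission directory
cp -f myinput.log $PBS_O_WORKDIR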

How to run jobs

After you have familiarized yourself with the 'Getting Started':

Maui Scheduler and Job submission/monitoring commands

Jobs are scheduled by the Maui scheduler with the Torque resource manager. We suggest you use a job submission batch file utilizing PBS Directives ('Options' section).

Common Maui Commands

If you have any experience with PBS/Torque or SGE, Maui commands may be recognizable. The most commonly used are:

qsub - Job submission (jobid will be displayed for the job submitted)

  • $ qsub jobscript.sh

showq - Display queue information.

  • $ showq (dump everything)
  • $ showq -r (show running jobs)
  • $ showq -u foo42 (shows foo42's jobs)

checkjob - Display job information. (You can only checkjob your own jobs.)

  • $ checkjob -A jobid (get dense key-value pair information on the job)
  • $ checkjob -v jobid (get verbose information on the job)

canceljob - Cancel Job. (You can only cancel your own jobs.)

  • $ canceljob jobid

Setting Up your Job Submission Batch File

Jobs can be submitted directly on the command line with the qsub command; however, we suggest running your jobs from a batch script. PBS Directives are command line arguments inserted at the top of the batch script, each directive line beginning with '#PBS' (no spaces). Reference: PBS Directives.

The following example script requests 1 node, a maximum walltime of 5 minutes 30 seconds, output and error joined into the same file, a descriptive job name of 'defaulttest', and the 'default' queue:

#!/bin/bash
#PBS -l walltime=00:05:30,nodes=1
#PBS -j oe
#PBS -N defaulttest
#PBS -q default

# Turn on echo of shell commands
set -x
# jobs normally start in the HOME directory; cd to where you submitted.
cd $PBS_O_WORKDIR
# copy the binary and data files to a local directory on the node job is executing on
cp $HOME/binaryfile $TMPDIR/binaryfile
cp $HOME/values $TMPDIR/values
cd $TMPDIR
# run your executable from the local disk on the node the job was placed on
./binaryfile >&binary.stdout
# Copy output files to your output folder	
cp -f $TMPDIR/binary.stdout $HOME/outputdir

A job with heavy I/O (use /tmp as noted in the above example)

  • Use /tmp to avoid heavy I/O over NFS to your home directory!
  • Ignoring this message could bring down the ASTRA CLUSTER HEAD NODE!

Request job to run on 5 nodes, 4 processes per node, 20 total processes

#PBS -l walltime=00:05:30,nodes=5:ppn=4

Additional jobs submitted by you or others can run on the unused slots (up to 12) on the same nodes. This may not be good if your job uses lots of memory.
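
As a sketch, an MPI job under this request could load the mpich module from the table above and launch 20 processes; './my_mpi_prog' is a placeholder binary, and the mpiexec flags follow the form used in the exclusive-node example below:

#!/bin/bash
#PBS -l walltime=00:05:30,nodes=5:ppn=4
#PBS -j oe
#PBS -N mpi20
#PBS -q default

module load mpich3-x86_64
cd $PBS_O_WORKDIR
# $PBS_NODEFILE lists each assigned node once per requested slot (4 per node here)
mpiexec --hostfile $PBS_NODEFILE -np 20 ./my_mpi_prog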

Request job to run exclusively on 2 nodes, 3 processes per node, 6 total processes

#!/bin/bash
#PBS -l nodes=2:ppn=12
#PBS -l walltime=00:02:00
#PBS -q test
#PBS -j oe
#PBS -N 2nodes6tasks

cd ${PBS_O_WORKDIR}

# Construct a copy of the hostfile with only TPN entries per node.
# MPI can use this to run TPN tasks on each node.
TPN=3
uniq "$PBS_NODEFILE" | awk -v TPN="$TPN" '{for(i=0;i<TPN;i+=1) print}' > nodefile."$TPN"way

mpiexec --hostfile nodefile."$TPN"way --tag-output hostname

To have exclusive access to nodes, your batch script must request all 12 slots per node. (ASTRA nodes have 12 cores each.) But let's say you want your MPI job to launch only a few tasks on each node, perhaps because each task uses a lot of memory. In that case, you cannot use the default $PBS_NODEFILE, as it lists each node 12 times. The above script creates a replacement hostfile listing each node $TPN times instead.

Request memory resources

To dedicate 2GB of memory to your job, add 'mem=2gb' to the '-l' option you should already have in your batch script file:

#PBS -l walltime=00:05:30,nodes=1,mem=2gb


Use 'checkjob -v [jobid]' to display your resources:
Total Tasks: 2
Dedicated Resources Per Task: PROCS: 1  MEM: 2048M

To have 2 tasks with 2GB of memory dedicated to each process, you would need this PBS directive:

#PBS -l walltime=00:05:30,nodes=1:ppn=2,mem=4gb

checkjob -v [jobid]:
Total Tasks: 2
Dedicated Resources Per Task: PROCS: 1  MEM: 2048M

Running a job on specific nodes

#!/bin/bash

#PBS -l walltime=00:03:00,host=compute-1-1+compute-1-2+compute-1-3,nodes=3
#PBS -N testnodes
#PBS -j oe
#PBS -q default

set -x
cd "$PBS_O_WORKDIR"
echo "Run my job."
./job.sh

Running an interactive job

  • Be sure to have logged in with X11 forwarding enabled (if using ssh, use ssh -X or ssh -Y; if using PuTTY, be sure to check the X11 forwarding box)
  • You do not have to specify a specific node (as is done in the next section); a generic request is sketched just below
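
A generic interactive session can also be requested directly from the command line. A minimal sketch, assuming a 2-hour session on one node (the -X option asks for X11 forwarding):

$ qsub -I -X -q default -l walltime=02:00:00,nodes=1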

Running an interactive job on a specific node

To run on a specific node, use '-l' with the 'host=' option; to run interactively, use the '-I' (capital I) option. The example below requests the node for 5 days:

#!/bin/bash
#PBS -l host=compute-1-5,walltime=120:10:00
#PBS -I
#PBS -N test
#PBS -j oe
#PBS -q default

set -x

Running multiple copies of your job

In order to run separate instances of the same program, you can use the scheduler's task array feature through the "-t" (array_request) option.

  • On ASTRA, the "nodes" parameter in a batch job's resource list refers to cores, not nodes. Thus, separate tasks specified by -t will pile onto a single batch machine, one-per-core, then move on to the next machine, etc. (This may or may not be what you want.)
  • WARNING: We have discovered that running a job with more than 3500 tasks will bring down the Maui open source scheduler.
  • BE NICE TO YOUR PEERS who are also trying to run on this cluster. It has an "honor system" for the users. An excellent example of niceness is to submit your array request with an optional slot limit to restrict the number of jobs that can run concurrently in the job array. The default value is unlimited. The slot limit must be the last thing specified in the array_request and is delimited from the array by a percent sign (%).
  • The example below requests 30 tasks, yet it runs only 12 at a time (enough to fill just one batch machine). You will see the other 18 go into the 'IDLE JOBS' category when using 'showq'. These jobs will start as soon as one or more of the first 12 have completed, assuming CPUs are available.
#!/bin/bash
# Note: the option on the next line is -l (lowercase L)
#PBS -l nodes=1,walltime=10:00
#PBS -t 1-30%12
#PBS -N test
#PBS -j oe
#PBS -S /bin/bash

set -x
cd "$PBS_O_WORKDIR"
echo Run my job.
  • Otherwise, if you do not need to limit the number of concurrent jobs, you simply ask for 30 tasks by putting the #PBS -t directive in your script, or by using the command line option. The numeric argument to -t is an integer id or a range of integers. The range does not have to start with 1, and multiple ranges and individual ids can be specified. (A sketch of how each task can pick its own input follows this list.)
  • #PBS -t 30-60
  • qsub -t 1-20,25,50-60 job_script
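
A minimal sketch of an array job in which each task selects its own input by its index; './myprogram' and the input/output file names are placeholders, and $PBS_ARRAYID is the index Torque assigns to each task:

#!/bin/bash
#PBS -l nodes=1,walltime=10:00
#PBS -t 1-30%12
#PBS -N arrayinputs
#PBS -j oe

cd $PBS_O_WORKDIR
# $PBS_ARRAYID holds this task's index within the array (1..30 here)
./myprogram input_${PBS_ARRAYID}.dat > output_${PBS_ARRAYID}.log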

For more ideas on how to make use of job arrays, see Multiple Batch Tasks.

This method is recommended over submitting jobs from a for/while loop within a bash script; we have seen the latter "confuse the scheduler".

Further examples:

The scheduler on astra is similar to the CAC v4 scheduler, with a few distinct changes:

  • There is no need to specify an account number in batch scripts.
  • The command to submit jobs is qsub, not nsub.
  • The v4 scheduler allows only one job per node.

Examples from the CAC v4 scheduler


Astra Documentation Main Page

Queues

At this time astra has only 2 queues. Each node has 32GB RAM and 883GB /tmp:

  • default (the queue used if no queue is specified)
Nodes: compute-1-[3-40]
Limits: walltime limit: 336 hours
  • test
Nodes: compute-1-[1-2]
Limits: walltime limit: 12 hours

Cluster Tips

  • Monitor the cluster with Ganglia
  • For individual job/node monitoring, use Ganglia or type: top [press '1' to view all CPUs]
  • Use the local /tmp rather than the NFS-mounted home directory!
  • There is no /tmp cleanup policy outside the /tmp/$PBS_JOBID deletion at job end.