TARDIS General Information
- Tardis is a private cluster with restricted access to Craig Fennie's cjf76_0001 group.
- Tardis has one head node and 101 compute nodes (1404 cores in total - compute node details can be found below under the 'Queues' section)
- Head node: tardis.cac.cornell.edu (access via ssh only) serves as:
- the cluster scheduler
- Rocks Cluster server for compute node deployment
- Compute nodes: (see the 'Queues' section below for node names and further information by queue)
Getting Started
Log in via ssh to tardis.cac.cornell.edu:
ssh USERID@tardis.cac.cornell.edu
General information about running jobs and logging in can be found in the Getting started documentation
Reminder: that documentation describes the v4 cluster, not Tardis; the commands and batch-job syntax are similar (use qsub, not nsub).
showq displays running jobs.
Once you have created a batch job file, submit it with 'qsub jobname' (not nsub, as written in the v4 documentation).
Use checkjob or canceljob to check on or cancel your job.
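For concreteness, here is a minimal sketch of a Torque batch script; the job name, resource requests, and my_program are placeholders to adjust for your own work:
#!/bin/bash
#PBS -N myjob                 # job name (placeholder)
#PBS -l nodes=1:ppn=8         # request one node with 8 cores
#PBS -l walltime=01:00:00     # one-hour wall-clock limit
#PBS -j oe                    # merge stdout and stderr into one output file

cd $PBS_O_WORKDIR             # run from the directory the job was submitted from
./my_program                  # placeholder executable
Save it as, say, job.sh, then submit and monitor it:
qsub job.sh          # returns a job ID
showq                # list running and queued jobs
checkjob <jobid>     # details for one job
canceljob <jobid>    # cancel it if needed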
Scheduler
Maui 3.3.1/Torque 3.0.5; no project ID required.
Queues
qstat -q shows that Tardis has 7 separate queues. Descriptions of the nodes in each queue follow; a brief example of requesting a specific queue appears right after the list.
- c1: compute-1-[1-16].tardis
- c2: compute-2-[1-16].tardis
- c3: compute-3-[1-16].tardis
- c4: compute-4-[1-16].tardis
- c5: compute-5-[1-16].tardis
- c6: compute-6-[1-5].tardis
- c7: compute-7-[1-16].tardis
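To direct a job to a particular queue, either name the queue on the qsub command line or add a #PBS directive to the batch script (c3 here is only an example; use a queue you have access to):
qsub -q c3 job.sh
or, inside the batch script:
#PBS -q c3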
c1: (original queue, slowest processors in the cluster)
Number of nodes: 16 servers (8 cores per server, 128 total cores)
Node names: compute-1-[1-16].tardis
HW: Intel Xeon E5430, 2.66GHz
Memory: 16GB ram/server
/tmp: 77GB
c2:
Number of nodes: 16 servers (8 cores per server, 128 total cores)
Node names: compute-2-[1-16].tardis
HW: Intel Xeon E5530, 2.40GHz
Memory: 16GB ram/server
/tmp: 113GB
c3: (fastest processors)
Number of nodes: 16 servers (20 cores per server, 320 total cores)
Node names: compute-3-[1-16].tardis
HW: Intel Xeon E5-2670 v2, 2.50GHz
Memory: 48GB ram/server
/tmp: 242GB
c4:
Number of nodes: 16 servers (12 cores per server, 192 total cores)
Node names: compute-4-[1-16].tardis
HW: Intel Xeon E5-2640, 2.50GHz
Memory: 32GB ram/server
/tmp: 242GB
c5:
Number of nodes: 16 servers (16 cores per server, 256 total cores)
Node names: compute-5-[1-16].tardis
HW: Intel Xeon E5-2680, 2.70GHz
Memory: 32GB ram/server
/tmp: 242GB
c6: (for collaborators)
Number of nodes: 5 servers (12 cores per server, 60 total cores)
Node names: compute-6-[1-5].tardis
HW: Intel Xeon E5-2643 v2, 3.50GHz
Memory: 32GB ram/server
/tmp: 242GB
c7: (for members of group nab83_0001 only)
Number of nodes: 16 servers (20 cores per server, 320 total cores)
Node names: compute-7-[1-16].tardis
HW: Intel Xeon E5-2660 v3, 2.6GHz
Memory: 64GB ram/server
/tmp: 242GB
Bear in mind that the login node may have different CPUs than some of the older compute nodes (c1, c2). Therefore, programs compiled on the login node may not run on the older hardware in queues c1-c2, unless you supply the correct compiler options for those architectures. Options affecting the Intel SSE level can be especially sensitive.
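As a hedged illustration (the exact flags depend on your compiler version, and myprog.f90 is a placeholder), you could compile on the login node with a conservative instruction-set target that the c1/c2 processors also support:
ifort -O2 -xSSE4.1 -o myprog myprog.f90      # Intel compiler: limit generated code to SSE4.1, which both E5430 (c1) and E5530 (c2) support
gfortran -O2 -march=core2 -o myprog myprog.f90   # GNU compiler: pick an older architecture baseline instead of -march=native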
Module System
The "module load" command helps you set up the software environment (e.g., $PATH) correctly for certain applications such as ABINIT and Phonopy.
But mainly, the module system helps you set up the proper environment for your preferred compiler, especially when you want to build or run MPI-parallel software. Two choices need to be made: (a) compiler family, and (b) MPI implementation. On Tardis, 4 total combinations are possible. Taking the choices for Fortran as an example, (a) can be either gfortran or ifort, and (b) can be either OpenMPI or Intel MPI. Program names like mpif90 and mpiifort are merely wrappers to assist you in using a particular compiler with a particular MPI implementation. Here is how you achieve the 4 combinations on Tardis:
Compiler and MPI          Module to load                Compile command       Run command
gcc/gfortran, OpenMPI     rocks-openmpi                 mpicc or mpif90       mpiexec --mca btl openib,self,sm
icc/ifort, OpenMPI        openmpi-1.6.2-intel-x86_64    mpicc or mpif90       mpiexec
gcc/gfortran, Intel MPI   intel-mpi-4.0.3-x86_64        mpicc or mpif90       mpiexec
icc/ifort, Intel MPI      intel-mpi-4.0.3-x86_64        mpiicc or mpiifort    mpiexec
In all cases, mpirun and mpiexec are equivalent. The rocks-openmpi module is loaded by default; however, some extra options to mpiexec (or mpirun) are needed to tell it to use the Infiniband interconnect. If you are unsure what you're getting in cases 1-3, try doing (a) "mpif90 --version" and (b) "which mpiexec": (a) will tell you the compiler, and (b) will tell you the MPI you are using, as indicated by the path.
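For example, here is a minimal sketch using the icc/ifort + OpenMPI combination from the table; hello.f90 and the core count are placeholders:
module unload rocks-openmpi                  # the default MPI; clear it before switching (see Other tips below)
module load openmpi-1.6.2-intel-x86_64       # OpenMPI built for the Intel compilers
mpif90 --version                             # reports the underlying compiler (ifort in this case)
which mpiexec                                # the path shows which MPI installation is in use
mpif90 -O2 -o hello hello.f90                # hello.f90 is a placeholder source file
mpiexec -n 8 ./hello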
When you submit your batch job, don't assume the environment is automatically the same on the batch nodes, even if you submitted the job with "qsub -V". You will again want to use module commands to set up the run environment in batch, so it matches the one that was in place when the application was built. However, be aware that in batch, the shell's built-in module command is not automatically recognized. The module system must first be initialized with a special command:
. /usr/share/Modules/init/sh
module load my-favorite-MPI-and-compiler
Don't omit the dot! You can run the above commands in two ways:
- Insert them into your batch script, somewhere near the top (see the sketch after this list).
- Put them in your $HOME/.profile. This is handy if you always use the same set of modules. It is the only way that works with rocks-openmpi.
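A minimal sketch of option 1, combining the batch directives shown earlier with the module setup; the module name and run command follow the table above, while the node counts and my_mpi_program are placeholders:
#!/bin/bash
#PBS -l nodes=2:ppn=8          # two nodes, 8 cores each (placeholder sizes)
#PBS -l walltime=02:00:00
#PBS -j oe

# Initialize the module system in the batch shell, then load the same
# module that was in place when the application was built.
. /usr/share/Modules/init/sh
module load intel-mpi-4.0.3-x86_64

cd $PBS_O_WORKDIR
mpiexec -n 16 ./my_mpi_program   # placeholder executable; 16 = 2 nodes x 8 cores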
Other tips:
- Use "module list" to see what modules are currently loaded.
- Use "module avail" to see what modules are available.
- If you want to switch compilers, use "module switch", or at least do "module unload" first, to reset the current compiler-specific environment (see the example after this list).
- You can use "module purge" if you want to be absolutely sure that you're starting with a clean slate.
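For instance, a sketch of moving from the default OpenMPI stack to the Intel MPI stack (module names as in the table above):
module list                                        # see what is currently loaded
module unload rocks-openmpi                        # clear the compiler/MPI-specific settings
module load intel-mpi-4.0.3-x86_64                 # load the replacement
# or do both steps at once:
module switch rocks-openmpi intel-mpi-4.0.3-x86_64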
Cluster Tips
Cluster Status: Ganglia: http://tardis.cac.cornell.edu/ganglia/
Submit help requests at https://www.cac.cornell.edu/help or by sending email to help@cac.cornell.edu