TARDIS Cluster


For Current TARDIS Information see TARDIS3 (https://www.cac.cornell.edu/wiki/index.php?title=TARDIS3_Cluster)



  • Tardis is a private cluster with restricted access to Craig Fennie's cjf76_0001 group.
  • Tardis has one head node and 101 compute nodes (1420 cores in total - compute node details can be found below under the 'Queues' section)
  • Tardis queue C5 is restricted to collaborators.
  • Tardis queue C7 is restricted to Prof. Benedek's nab83_0001 group.
  • Head node: tardis.cac.cornell.edu (access via ssh only) serves as:
    • the cluster scheduler
    • Rocks Cluster server for compute node deployment
  • Compute nodes: see below for node names and further information, organized by queue
  • Rocks 6.1 with CentOS 6.3
  • Cluster Status: Ganglia (http://tardis.cac.cornell.edu/ganglia/)
  • Submit HELP requests at https://www.cac.cornell.edu/help or by sending email to help@cac.cornell.edu

Getting Started

Log in via ssh to tardis.cac.cornell.edu:

   ssh USERID@tardis.cac.cornell.edu

General information about running jobs and logging in can be found in the Getting Started documentation.

Keep in mind that the Getting Started documentation describes the v4 cluster, not Tardis; the commands and batch-job syntax are similar, but use qsub, not nsub.

showq displays running jobs.

Once you have created a job batch file, submit it with 'qsub jobname' (not nsub, as written in the v4 documentation).

Use checkjob or canceljob to check on or cancel your job.


The scheduler is Maui 3.3.1 with Torque 3.0.5; no project ID is required.
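A minimal Torque batch script looks something like the following sketch; the job name, queue, resource request, and executable are placeholders to adapt:

   #!/bin/bash
   #PBS -N myjob                # job name (placeholder)
   #PBS -q c4                   # target queue (see the queue list below)
   #PBS -l nodes=1:ppn=12       # request one node with 12 cores (a c4 node)
   #PBS -l walltime=01:00:00    # wall-clock limit, hh:mm:ss
   #PBS -j oe                   # merge stdout and stderr into one file

   # Torque starts the job in $HOME; move to the directory you submitted from
   cd $PBS_O_WORKDIR
   ./my_program                 # placeholder executable

Submit with "qsub myjob.sh", monitor with "showq" or "checkjob <jobid>", and cancel with "canceljob <jobid>".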


qstat -q shows that Tardis has seven separate queues. Descriptions of the nodes in each queue follow.

  • c1: compute-1-[1-14,16].tardis
  • c2: compute-2-[1-16].tardis
  • c3: compute-3-[1,4,6-7,10,13-16].tardis
  • c4: compute-4-[1-16].tardis
  • c5: compute-5-[1-9,11].tardis
  • c6: compute-6-[1-5].tardis
  • c7: compute-7-[1-16].tardis
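To direct a job to one of these queues, name it at submission time (standard Torque usage):

   qsub -q c3 myjob.sh    # submit to the c3 queue
   qstat -q               # list all queues and their limits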

c1: (original queue, slowest processors in the cluster).

 Number of nodes: 15 servers (8 cores per server, 120 total cores)
 Node names: compute-1-[1-14,16].tardis
 HW: Intel Xeon E5430 2.66GHz
 Memory: 16GB ram/server
 /tmp: 77GB


c2:

 Number of nodes: 16 servers (8 cores per server, 128 total cores)
 Node names: compute-2-[1-16].tardis
 HW: Intel Xeon E5530 2.40GHz
 Memory: 16GB ram/server
 /tmp: 113GB

c3: (fastest processors)

 Number of nodes: 10 servers (20 cores per server, 200 total cores)
 Node names: compute-3-[1,4,6-7,10,13-16].tardis
 HW: Intel Xeon E5-2670v2 2.50GHz
 Memory: 48GB ram/server
 /tmp: 242GB


c4:

 Number of nodes: 16 servers (12 cores per server, 192 total cores)
 Node names: compute-4-[1-16].tardis
 HW: Intel Xeon E5-2640 2.50GHz
 Memory: 32GB ram/server
 /tmp: 242GB


c5:

 Number of nodes: 10 servers (16 cores per server, 160 total cores)
 Node names: compute-5-[1-9,11].tardis
 HW: Intel Xeon E5-2680 2.70GHz
 Memory: 32GB ram/server
 /tmp: 242GB

c6: (for collaborators)

 Number of nodes: 5 servers (12 cores per server, 60 total cores)
 Node names: compute-6-[1-5].tardis
 HW: Intel Xeon E5-2643v2 3.50GHz
 Memory: 32GB ram/server
 /tmp: 242GB

c7: (for members of group nab83_0001 only)

 Number of nodes: 16 servers (20 cores per server, 320 total cores)
 Node names: compute-7-[1-16].tardis
 HW: Intel Xeon E5-2660 v3, 2.6GHz
 Memory: 64GB ram/server
 /tmp: 242GB

Bear in mind that the login node may have different CPUs than some of the older compute nodes (c1, c2). Therefore, programs compiled on the login node may not run on the older hardware in queues c1-c2 unless you supply the correct compiler options for those architectures. Options affecting the Intel SSE level can be especially sensitive.
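For example (a sketch only; exact flags depend on your compiler version, so consult its documentation), you can compile on the login node while targeting the oldest hardware you intend to run on:

   # GNU: emit code safe for the Core2-era c1 nodes
   gcc -O2 -march=core2 -o mycode mycode.c

   # Intel: build a baseline SSE4.1 code path
   icc -O2 -xSSE4.1 -o mycode mycode.c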

Module System

The "module load" command helps you set up the software environment (e.g., $PATH) correctly for certain applications such as ABINIT and Phonopy.

But mainly, the module system helps you set up the proper environment for your preferred compiler, especially when you want to build or run MPI-parallel software. Two choices need to be made: (a) compiler family, and (b) MPI implementation. On Tardis, 4 total combinations are possible. Taking the choices for Fortran as an example, (a) can be either gfortran or ifort, and (b) can be either OpenMPI or Intel MPI. Program names like mpif90 and mpiifort are merely wrappers to assist you in using a particular compiler with a particular MPI implementation. Here is how you achieve the 4 combinations on Tardis:

Compiler and MPI         Module to load               Compile command      Run command
gcc/gfortran/OpenMPI     rocks-openmpi                mpicc or mpif90      mpiexec --mca btl openib,self,sm
icc/ifort/OpenMPI        openmpi-1.6.2-intel-x86_64   mpicc or mpif90      mpiexec
gcc/gfortran/Intel MPI   intel-mpi-4.0.3-x86_64       mpicc or mpif90      mpiexec
icc/ifort/Intel MPI      intel-mpi-4.0.3-x86_64       mpiicc or mpiifort   mpiexec

In all cases, the commands mpirun and mpiexec are equivalent. The rocks-openmpi module is loaded by default; however, some extra options to mpiexec (or mpirun) are needed to tell it to use the InfiniBand interconnect. (Alternatively, you can do "module swap rocks-openmpi rocks-openmpi_ib" and skip the --mca options to mpiexec. Do not try to use IB in the c6 queue.) If you are unsure what you're getting in cases 1-3, try doing (a) "mpif90 --version" and (b) "which mpiexec": (a) will tell you the compiler, and (b) will tell you the MPI you are using, as indicated by the path.
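A quick check might look like the following illustrative session (output will vary):

   module swap rocks-openmpi rocks-openmpi_ib   # InfiniBand-aware OpenMPI; no --mca options needed
   mpif90 --version                             # reports the underlying compiler
   which mpiexec                                # the path reveals which MPI is active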

When you submit your batch job, don't assume the environment is automatically the same on the batch nodes, even if you submitted the job with "qsub -V". You will again want to use module commands to set up the run environment in batch, so it matches the one that was in place when the application was built. However, be aware that in batch, the shell's built-in module command is not automatically recognized (unless your job is interactive). The module system must first be initialized with a special command:

. /usr/share/Modules/init/sh
module load my-favorite-MPI-and-compiler

Don't omit the dot! You can run the above commands in two ways:

  • Insert them into your batch script, somewhere near the top (see the sketch after this list).
  • Put them in your $HOME/.profile. This is handy if you always use the same set of modules. It is the only way that works with rocks-openmpi.
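As a sketch, the top of such a batch script could look like this; the module name comes from the table above, and the executable is a placeholder:

   #!/bin/bash
   #PBS -N mpi-job
   #PBS -l nodes=2:ppn=8
   #PBS -l walltime=00:30:00

   # Initialize the module system (note the leading dot), then recreate the build environment
   . /usr/share/Modules/init/sh
   module load intel-mpi-4.0.3-x86_64

   cd $PBS_O_WORKDIR
   mpiexec -n 16 ./my_mpi_program   # placeholder MPI executable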

Other tips:

  • Use "module list" to see what modules are currently loaded.
  • Use "module avail" to see what modules are available.
  • If you want to switch compilers, use "module switch", or at least do "module unload" first, to reset the current compiler-specific environment.
  • You can use "module purge" if you want to be absolutely sure that you're starting with a clean slate.

Additional Software

The following software is available via loading appropriate modules ("module load <module name>"):

Software                                  Module Name
abinit                                    abinit-8.4.2
python 2.7.11 with phonopy and phono3py   python-2.7.11
xcrysden                                  xcrysden-1.5.60
DFT + embedded DMFT Functional            EDMFTF-Jan2020
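For example, to use ABINIT you would first load its module (module name from the table above); "which abinit" should then report the executable's location:

   module load abinit-8.4.2
   which abinit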

Cluster Tips

Cluster Status: Ganglia (http://tardis.cac.cornell.edu/ganglia/)

Submit HELP requests at https://www.cac.cornell.edu/help or by sending email to help@cac.cornell.edu