TARDIS Cluster
TARDIS General Information
- Tardis is a private cluster with restricted access to Craig Fennie's cjf76_0001 group.
- Tardis has one head node and 87 compute nodes (1160 cores in total; per-queue details can be found below under the 'Queues' section)
- Tardis queue c5 is restricted to collaborators.
- Tardis queue c7 is restricted to Prof. Benedek's nab83_0001 group.
- Head node: tardis.cac.cornell.edu (access via ssh only) serves as:
- the cluster scheduler
- Rocks Cluster server for compute node deployment
- Compute nodes: (see below for node names and further information by queue)
Getting Started
Log in via ssh to tardis.cac.cornell.edu:
ssh USERID@tardis.cac.cornell.edu
General information about running jobs and logging in can be found in the Getting Started documentation.
Reminder: that documentation describes the v4 cluster, not Tardis; the commands and batch-job syntax are similar, but use qsub, not nsub.
showq displays running and queued jobs.
Once you have created a batch job file, submit it with 'qsub jobname' (not nsub as written in the v4 documentation).
Use checkjob or canceljob to check on or cancel your job.
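For reference, here is a minimal sketch of a Torque batch script; the job name, queue choice, and program are illustrative assumptions, not site requirements:

 #!/bin/bash
 # Minimal Torque batch script (job name, queue, and program are hypothetical)
 #PBS -N myjob
 #PBS -q c4
 #PBS -l nodes=1:ppn=12
 #PBS -j oe

 # Run from the directory where qsub was invoked
 cd $PBS_O_WORKDIR
 ./myprog

Submit it with "qsub myjob.sh", then monitor it with "showq" or "checkjob <jobid>".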
Scheduler
Maui 3.3.1/Torque 3.0.5; no project ID required.
Queues
qstat -q shows that Tardis has 7 separate queues. Descriptions of the nodes in each queue follow.
- c1: compute-1-[1-14,16].tardis
- c2: compute-2-[1-16].tardis
- c3: compute-3-[1,4,6-7,10,13-16].tardis
- c4: compute-4-[1-16].tardis
- c5: compute-5-[1-9,11].tardis
- c6: compute-6-[1-5].tardis
- c7: compute-7-[1-16].tardis
c1: (original queue, slowest processors in the cluster).
Number of nodes: 15 servers (8 cores per server, 120 total cores)
Node names: compute-1-[1-14,16].tardis
HW: Intel Xeon E5430 2.66GHz
Memory: 16GB RAM/server
/tmp: 77GB
c2:
Number of nodes: 16 servers (8 cores per server, 128 total cores)
Node names: compute-2-[1-16].tardis
HW: Intel Xeon E5530 2.40GHz
Memory: 16GB RAM/server
/tmp: 113GB
c3: (fastest processors)
Number of nodes: 9 servers (20 cores per server, 180 total cores)
Node names: compute-3-[1,4,6-7,10,13-16].tardis
HW: Intel Xeon E5-2670 v2 2.50GHz
Memory: 48GB RAM/server
/tmp: 242GB
c4:
Number of nodes: 16 servers (12 cores per server, 192 total cores)
Node names: compute-4-[1-16].tardis
HW: Intel Xeon E5-2640 2.50GHz
Memory: 32GB RAM/server
/tmp: 242GB
c5:
Number of nodes: 10 servers (16 cores per server, 160 total cores)
Node names: compute-5-[1-9,11].tardis
HW: Intel Xeon E5-2680 2.70GHz
Memory: 32GB RAM/server
/tmp: 242GB
c6: (for collaborators)
Number of nodes: 5 servers (12 cores per server, 60 total cores)
Node names: compute-6-[1-5].tardis
HW: Intel Xeon E5-2643 v2 3.50GHz
Memory: 32GB RAM/server
/tmp: 242GB
c7: (for members of group nab83_0001 only)
Number of nodes: 16 servers (20 cores per server, 320 total cores)
Node names: compute-7-[1-16].tardis
HW: Intel Xeon E5-2660 v3 2.60GHz
Memory: 64GB RAM/server
/tmp: 242GB
Bear in mind that the login node may have different CPUs than some of the older compute nodes (c1, c2). Therefore, programs compiled on the login node may not run on the older hardware in queues c1-c2 unless you supply the correct compiler options for those architectures. Options affecting the Intel SSE level can be especially sensitive.
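For example (these flag choices are a hedged suggestion, not a site standard), conservative instruction-set options let a binary built on the login node run on the older c1 and c2 hardware as well:

 # Cap the instruction set so binaries also run on c1 (Xeon E5430, SSE4.1)
 # and c2 (Xeon E5530, SSE4.2) nodes; Intel compilers:
 ifort -O2 -xSSE4.1 -o myprog myprog.f90
 # GNU compilers:
 gfortran -O2 -march=core2 -msse4.1 -o myprog myprog.f90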
Module System
The "module load" command helps you set up the software environment (e.g., $PATH) correctly for certain applications such as ABINIT and Phonopy.
But mainly, the module system helps you set up the proper environment for your preferred compiler, especially when you want to build or run MPI-parallel software. Two choices need to be made: (a) compiler family, and (b) MPI implementation. On Tardis, 4 total combinations are possible. Taking the choices for Fortran as an example, (a) can be either gfortran or ifort, and (b) can be either OpenMPI or Intel MPI. Program names like mpif90 and mpiifort are merely wrappers to assist you in using a particular compiler with a particular MPI implementation. Here is how you achieve the 4 combinations on Tardis:
Compiler and MPI        | Module to load             | Compile command    | Run command
gcc/gfortran/OpenMPI    | rocks-openmpi              | mpicc or mpif90    | mpiexec --mca btl openib,self,sm
icc/ifort/OpenMPI       | openmpi-1.6.2-intel-x86_64 | mpicc or mpif90    | mpiexec
gcc/gfortran/Intel MPI  | intel-mpi-4.0.3-x86_64     | mpicc or mpif90    | mpiexec
icc/ifort/Intel MPI     | intel-mpi-4.0.3-x86_64     | mpiicc or mpiifort | mpiexec
In all cases, the commands mpirun and mpiexec are equivalent. The rocks-openmpi module is loaded by default; however, some extra options to mpiexec (or mpirun) are needed to tell it to use the InfiniBand interconnect. (Alternatively, you can do "module swap rocks-openmpi rocks-openmpi_ib" and skip the --mca options to mpiexec. Do not try to use IB in the c6 queue.) If you are unsure what you're getting in cases 1-3, try doing (a) "mpif90 --version" and (b) "which mpiexec": (a) will tell you the compiler, and (b) will tell you the MPI you are using, as indicated by the path.
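To make the choices concrete, here is a short sketch (process counts and program name are illustrative):

 # Option A: keep the default rocks-openmpi module and pass the MCA options
 mpiexec --mca btl openib,self,sm -n 40 ./myprog

 # Option B: swap in the IB-enabled module, then the --mca options can be skipped
 module swap rocks-openmpi rocks-openmpi_ib
 mpiexec -n 40 ./myprog

 # Check which compiler and which MPI you are actually using
 mpif90 --version
 which mpiexec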
When you submit your batch job, don't assume the environment is automatically the same on the batch nodes, even if you submitted the job with "qsub -V". You will again want to use module commands to set up the run environment in batch, so it matches the one that was in place when the application was built. However, be aware that in batch, the shell's built-in module command is not automatically recognized (unless your job is interactive). The module system must first be initialized with a special command:
. /usr/share/Modules/init/sh
module load my-favorite-MPI-and-compiler
Don't omit the dot! You can run the above commands in two ways:
- Insert them into your batch script, somewhere near the top (a full sketch follows this list).
- Put them in your $HOME/.profile. This is handy if you always use the same set of modules. It is the only way that works with rocks-openmpi.
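Putting these pieces together, here is a sketch of an MPI batch script; the queue, node counts, module, and program name are assumptions for illustration:

 #!/bin/bash
 #PBS -q c4
 #PBS -l nodes=2:ppn=12

 # Initialize the module system in batch -- don't omit the dot!
 . /usr/share/Modules/init/sh
 module load openmpi-1.6.2-intel-x86_64

 cd $PBS_O_WORKDIR
 mpiexec -n 24 ./myprog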
Other tips:
- Use "module list" to see what modules are currently loaded.
- Use "module avail" to see what modules are available.
- If you want to switch compilers, use "module switch" (see the example after this list), or at least do "module unload" first, to reset the current compiler-specific environment.
- You can use "module purge" if you want to be absolutely sure that you're starting with a clean slate.
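For example, to move from the default GNU OpenMPI stack to the Intel-compiled one (module names as in the table above):

 module switch rocks-openmpi openmpi-1.6.2-intel-x86_64
 module list    # confirm what is now loaded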
Additional Software
The following software is available by loading the appropriate module ("module load <module name>"):
Software                                | Module Name
abinit                                  | abinit-8.4.2
python 2.7.11 with phonopy and phono3py | python-2.7.11
xcrysden                                | xcrysden-1.5.60
DFT + embedded DMFT Functional          | EDMFTF-Jan2020
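A quick usage sketch (the import check is an illustrative assumption, not a required step):

 module load python-2.7.11
 python -c "import phonopy"    # verify the phonopy package is on the path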
Cluster Tips
Cluster Status: Ganglia (http://tardis.cac.cornell.edu/ganglia/)
Submit HELP requests: https://www.cac.cornell.edu/help or send email to help@cac.cornell.edu