ALTAS Cluster

This is a private cluster, accessible only to users in the dm24_0001 project.

Hardware

  • Head node: altas.cac.cornell.edu.
  • Access modes: ssh
  • OpenHPC 2.3 with Rocky Linux 8.4
  • 4 compute nodes (c0001-c0004). Each node has dual 64-core AMD EPYC 7713 processors, 1 TB of RAM, and 4 NVidia A100 GPUs.
  • Hyperthreading is enabled on all nodes, i.e., each physical core is considered to consist of two logical CPUs
  • Interconnect is 100 Gbps ethernet
  • Submit help requests at https://www.cac.cornell.edu/help or by sending an email to CAC support (help@cac.cornell.edu); please include ALTAS in the subject area.

File Systems

Home Directories

  • Path: ~

User home directories are located on an NFS export from the head node. Use your home directory (~) for archiving the data you wish to keep. Data in users' home directories is NOT backed up.

Scheduler/Queues

  • The cluster scheduler is Slurm. All nodes are configured to be in the "normal" partition with no time limits. See the Slurm documentation page for details, and the Requesting GPUs section for information on how to request GPUs on the compute nodes for your jobs:
    1. --gres=gpu:2g.20gb:<number of MIG devices> or --gres=gpu:1g.10gb:1 to request MIG devices. The job will land on one of c0002, c0003, or c0004.
    2. --gres=gpu:a100:<number of GPUs> to request entire A100 GPUs. The job will land on node c0001. (An example batch script is shown below the partition table.)
  • Remember, hyperthreading is enabled on the cluster, so Slurm considers each physical core to consist of two logical CPUs.
  • Partitions (queues):
Name     Description                                      Time Limit
normal   all nodes, each node with 4 Nvidia A100 GPUs     no limit
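
As an illustration, here is a minimal batch script requesting one full A100 on the normal partition. This is only a sketch: the job name, CPU count, and program name (my_gpu_program) are placeholders, and cuda/11.5 is loaded only as an example of a module your code might need.

#!/bin/bash
#SBATCH --job-name=gpu-test        # placeholder job name
#SBATCH --partition=normal         # the only partition on this cluster
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8          # logical CPUs; hyperthreading gives 2 per physical core
#SBATCH --gres=gpu:a100:1          # one full A100 (lands on c0001); use gpu:2g.20gb:1 for a MIG slice

module load cuda/11.5              # example: CUDA toolkit module from the software list

nvidia-smi                         # show the GPU(s) allocated to this job
srun ./my_gpu_program              # placeholder executable

Submit the script with sbatch <script name> and check its status with squeue.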

Software

Work with Environment Modules

Set up the working environment for each package using the module command. The module command will activate dependent modules if there are any.

To show currently loaded modules (these are loaded by the default system configuration):

-bash-4.2$ module list

Currently Loaded Modules:
  1) autotools   3) gnu9/9.3.0   5) libfabric/1.12.1   7) ohpc
  2) prun/2.1    4) ucx/1.9.0    6) openmpi4/4.0.5

To show all available modules (as of August 5, 2021):

-bash-4.2$ module avail

-------------------- /opt/ohpc/pub/moduledeps/gnu9-openmpi4 --------------------
   adios/1.13.1        netcdf-fortran/4.5.2    py3-mpi4py/3.0.3
   boost/1.75.0        netcdf/4.7.3            py3-scipy/1.5.1
   fftw/3.3.8          opencoarrays/2.9.2      quantum-espresso/6.8
   hypre/2.18.1        petsc/3.14.4            scalapack/2.1.0
   mfem/4.2            phdf5/1.10.6            slepc/3.14.2
   mumps/5.2.1         pnetcdf/1.12.1          superlu_dist/6.1.1
   netcdf-cxx/4.3.1    ptscotch/6.0.6          trilinos/13.0.0

------------------------ /opt/ohpc/pub/moduledeps/gnu9 -------------------------
   gsl/2.6        mpich/3.3.2-ofi    openmpi4/4.0.5   (L)
   hdf5/1.10.6    mvapich2/2.3.4     py3-numpy/1.19.0
   metis/5.1.0    openblas/0.3.7     superlu/5.2.1

-------------------------- /opt/ohpc/pub/modulefiles ---------------------------
   autotools    (L)    libfabric/1.12.1 (L)    os
   cmake/3.19.4        matlab/R2021a           prun/2.1        (L)
   cuda/11.5           nvhpc/21.9              ucx/1.9.0       (L)
   gnu9/9.3.0   (L)    ohpc             (L)    valgrind/3.16.1

  Where:
   L:  Module is loaded

Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching
any of the "keys".

To load a module and verify:

-bash-4.2$ module load matlab/R2021a
-bash-4.2$ module list

Currently Loaded Modules:
  1) autotools   3) gnu9/9.3.0   5) libfabric/1.12.1   7) ohpc
  2) prun/2.1    4) ucx/1.9.0    6) openmpi4/4.0.5     8) matlab/R2021a

To unload a module and verify:

-bash-4.2$ module unload matlab/R2021a
-bash-4.2$ module list

Currently Loaded Modules:
  1) autotools   3) gnu9/9.3.0   5) libfabric/1.12.1   7) ohpc
  2) prun/2.1    4) ucx/1.9.0    6) openmpi4/4.0.5

Manage Modules in Your Python Virtual Environment

python3 (3.6) is installed. Users can manage their own Python environment (including installing needed modules) using virtual environments. See the documentation on virtual environments at python.org for details.

Create Virtual Environment

You can create as many virtual environments as needed, each in its own directory.

  • python3: python3 -m venv <your virtual environment directory>

Activate Virtual Environment

You need to activate a virtual environment before using it:

source <your virtual environment directory>/bin/activate

Install Python Modules Using pip

After activating your virtual environment, you can install Python modules into it:

  • It's always a good idea to update pip first:
pip install --upgrade pip
  • Install the module:
pip install <module name>
  • List installed Python modules in the environment:
pip list
  • Examples: Install tensorflow and keras like this:
-bash-4.2$ python3 -m venv tensorflow
-bash-4.2$ source tensorflow/bin/activate
(tensorflow) -bash-4.2$ pip install --upgrade pip
Collecting pip
  Using cached https://files.pythonhosted.org/packages/30/db/9e38760b32e3e7f40cce46dd5fb107b8c73840df38f0046d8e6514e675a1/pip-19.2.3-py2.py3-none-any.whl
Installing collected packages: pip
  Found existing installation: pip 18.1
    Uninstalling pip-18.1:
      Successfully uninstalled pip-18.1
Successfully installed pip-19.2.3
(tensorflow) -bash-4.2$ pip install tensorflow keras
Collecting tensorflow
  Using cached https://files.pythonhosted.org/packages/de/f0/96fb2e0412ae9692dbf400e5b04432885f677ad6241c088ccc5fe7724d69/tensorflow-1.14.0-cp36-cp36m-manylinux1_x86_64.whl
:
:
:
Successfully installed absl-py-0.8.0 astor-0.8.0 gast-0.2.2 google-pasta-0.1.7 grpcio-1.23.0 h5py-2.9.0 keras-2.2.5 keras-applications-1.0.8  [...]
(tensorflow) -bash-4.2$ pip list modules
Package              Version
-------------------- -------
absl-py              0.8.0  
astor                0.8.0  
gast                 0.2.2  
google-pasta         0.1.7  
grpcio               1.23.0 
h5py                 2.9.0  
Keras                2.2.5  
Keras-Applications   1.0.8  
Keras-Preprocessing  1.1.0  
Markdown             3.1.1  
numpy                1.17.1 
pip                  19.2.3 
protobuf             3.9.1  
PyYAML               5.1.2  
scipy                1.3.1  
setuptools           40.6.2 
six                  1.12.0 
tensorboard          1.14.0 
tensorflow           1.14.0 
tensorflow-estimator 1.14.0 
termcolor            1.1.0  
Werkzeug             0.15.5 
wheel                0.33.6 
wrapt                1.11.2 
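
To use a virtual environment inside a batch job, activate it in your Slurm script after loading any modules your code needs. The following is only a sketch, assuming the tensorflow environment created above and a placeholder script named train.py:

#!/bin/bash
#SBATCH --job-name=tf-train        # placeholder job name
#SBATCH --partition=normal
#SBATCH --cpus-per-task=4          # adjust for your workload

source ~/tensorflow/bin/activate   # activate the virtual environment created above

python train.py                    # placeholder script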

Software List

Software               Path                                        Notes
GCC 9.3                /opt/ohpc/pub/compiler/gcc/9.3.0/           module load gnu9/9.3.0 (Loaded by default)
Open MPI 4.0.5         /opt/ohpc/pub/mpi/openmpi4-gnu9/4.0.5       module load openmpi4/4.0.5 (Loaded by default)
Matlab R2021a          /opt/ohpc/pub/apps/matlab/R2021a            module load matlab/R2021a
Nvidia HPC SDK 21.9    /opt/ohpc/pub/compiler/nvhpc/21.9           module load nvhpc/21.9
                                                                   (includes nvfortran for compiling CUDA-enabled Fortran code)
Quantum Espresso 6.8   /opt/ohpc/pub/apps/quantum-espresso/6.8     module load quantum-espresso/6.8
WINE 6.0.2             /opt/ohpc/pub/apps/wine/6.0.2               module load wine/6.0.2
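
As a quick sketch of how the compiler toolchains above are used (the source file names here are placeholders, not files that exist on the system): an MPI program can be built with the mpicc wrapper from the default GNU 9 + Open MPI toolchain, and CUDA Fortran code can be built with nvfortran after loading the Nvidia HPC SDK module.

# Default toolchain: GNU 9 + Open MPI (loaded automatically at login)
mpicc -O2 -o hello_mpi hello_mpi.c   # hello_mpi.c is a placeholder source file

# Nvidia HPC SDK: CUDA Fortran
module load nvhpc/21.9
nvfortran -o saxpy saxpy.cuf         # .cuf files are compiled as CUDA Fortran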

Help

  • Submit questions or requests at https://www.cac.cornell.edu/help or by sending email to help@cac.cornell.edu. Please include ALTAS in the subject area.