AIDA Cluster

From CAC Documentation wiki

Revision as of 07:31, 22 September 2022

AIDA General Information

  • AIDA is a private cluster with access restricted to members of the bs54_0001 and rab38_0001 groups.
  • Head node: aida.cac.cornell.edu (access via ssh)
  • 12 GPU nodes
    • 6 with V100 GPUs (c0017-c0022)
    • 6 with A100 GPUs (c0071-c0076)
  • Many nodes from the former Atlas2 cluster

Hardware

  • All GPU nodes support vector extensions up to AVX-512.
  • All nodes have hyperthreading turned on.

c00[17-22]:

    CPUs: 2x 18-core Intel Xeon Skylake 6154, base clock 3GHz (turbo up to 3.7GHz)

c0017:

    GPUs: 5x NVIDIA Tesla V100 16GB
    Memory: 754GB
    Swap: 187GB
    /tmp: 700GB

c00[18-21]:

    GPUs: 5x NVIDIA Tesla V100 16GB (per node)
    Memory: 376GB
    Swap: 187GB
    /tmp: 700GB

c0022:

    GPUs: 2x NVIDIA Tesla V100 16GB
    Memory: 1.5TB
    Swap: 187GB
    /tmp: 100GB
    /scratch: 1TB

c00[71-76]:

    CPUs: 2x 28-core Intel Xeon Ice Lake Gold 6348, base clock 2.6GHz
    GPUs: 4x NVIDIA Tesla A100 80GB
    Memory: 1TB
    Swap: 187GB
    /tmp: 3TB

Networking

  • All 12 GPU nodes have InfiniBand.
  • The older Atlas2 nodes have gigabit Ethernet.

File Systems

Home Directories

  • Path: ~
  • Users' home directories are located on an NFS export from the AIDA head node. Use your home directory (~) for archiving the data you wish to keep. Data in users' home directories are NOT backed up.

BeeGFS

  • Path: ???
  • All users have access. Users should copy active files and run their codes from BeeGFS directories.

Scheduler/Queues

  • The cluster scheduler is Slurm.
  • See the Slurm documentation page for details.
  • See the Requesting GPUs section for information on how to request GPUs on compute nodes for your jobs. On AIDA:
    1. Use --gres=gpu:2g.20gb:<number of MIG devices> or --gres=gpu:1g.10gb:1 to request MIG devices. The job will land on one of the A100 nodes with MIG configured.
    2. Use --gres=gpu:a100:<number of GPUs> to request entire A100 GPUs. The job will land on an A100 node without MIG.
  • Remember, hyperthreading is enabled on the cluster, so Slurm considers each physical core to consist of two logical CPUs.
  • You can ensure that each MPI task uses a full physical core by specifying -c 2 (--cpus-per-task=2).
  • Partitions (queues):

    Name      Description    Time Limit
    normal    xxxxxxxxxx     no limit
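The GPU and hyperthreading options above can be combined in a batch script. The sketch below assumes a job that needs one 1g.10gb MIG device and one full physical core (-c 2). The job name and output file name are hypothetical. The script only reports the resources it received and does no real work.

```shell
#!/bin/bash
# Hypothetical job name.
#SBATCH --job-name=mig-test
#SBATCH --ntasks=1
# 2 logical CPUs = 1 full physical core, because hyperthreading is on.
#SBATCH -c 2
# One 1g.10gb MIG device; the job lands on an A100 node with MIG configured.
#SBATCH --gres=gpu:1g.10gb:1
# %j is replaced by the Slurm job ID.
#SBATCH --output=mig-test-%j.out

# Report where the job landed and what Slurm handed to it
# (bash sets HOSTNAME automatically).
echo "Host: $HOSTNAME"
echo "CPUs per task: ${SLURM_CPUS_PER_TASK:-<unset>}"
echo "CUDA_VISIBLE_DEVICES: ${CUDA_VISIBLE_DEVICES:-<unset>}"

# List the GPU / MIG devices reported by nvidia-smi, if it is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L
fi
```

Submit it with sbatch, for example `sbatch mig-test.sh` (file name hypothetical). To request a whole A100 instead of a MIG device, change the --gres line to --gres=gpu:a100:1.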

Software

Work with Environment Modules

Set up the working environment for each package using the module command. The module command will activate dependent modules if there are any.

To show currently loaded modules:
-bash-4.2$ module list

To show all available modules:
-bash-4.2$ module avail

To load a module:
-bash-4.2$ module load <software>

To unload a module:
-bash-4.2$ module unload <software>

To swap compilers:
-bash-4.2$ module swap gnu intel

Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".
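The module command can activate dependent modules. For that reason, a job script may need to check what is already loaded before it loads or swaps anything. The helper below is a hypothetical sketch and is not part of the module system. It assumes the module system keeps the colon-separated LOADEDMODULES environment variable up to date, which Lmod and Environment Modules normally do.

```shell
#!/bin/bash
# Hypothetical helper: succeed if any version of module $1 appears in
# LOADEDMODULES (entries look like "name/version", separated by colons).
module_loaded() {
    local mods m
    IFS=: read -r -a mods <<< "${LOADEDMODULES:-}"
    for m in "${mods[@]}"; do
        case "$m" in
            "$1" | "$1"/*) return 0 ;;
        esac
    done
    return 1
}
```

For example, `module_loaded quantum-espresso || module load quantum-espresso/6.8` loads Quantum ESPRESSO only if no version of it is loaded yet.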

Manage modules in Your Python Virtual Environment

Python 3 (python3) is installed. Users can manage their own Python environments (including installing needed modules) using virtual environments. See the documentation on virtual environments on python.org (https://packaging.python.org/guides/installing-using-pip-and-virtual-environments) for details.

Create Virtual Environment

You can create as many virtual environments as needed, each in its own directory.

  • python3: python3 -m venv <your virtual environment directory>

Activate Virtual Environment

You need to activate a virtual environment before using it:

source <your virtual environment directory>/bin/activate
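The create and activate steps can be combined in a small script. This sketch assumes a hypothetical environment directory, $HOME/venvs/myproject. Activation only changes the shell that runs it, so source the script (for example `source setup_venv.sh`, file name hypothetical) rather than executing it. If you execute it, the activation is lost when the script exits.

```shell
#!/bin/bash
# Hypothetical environment location; any writable directory works.
venv_dir="$HOME/venvs/myproject"

# Create the environment only if it does not exist yet.
if [ ! -f "$venv_dir/bin/activate" ]; then
    python3 -m venv "$venv_dir"
fi

# Activate it: this sets VIRTUAL_ENV and puts $venv_dir/bin first on PATH.
source "$venv_dir/bin/activate"

# Confirm that "python" now resolves inside the environment.
echo "VIRTUAL_ENV=$VIRTUAL_ENV"
python -c 'import sys; print(sys.prefix)'
```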

Install Python Modules Using pip

After activating your virtual environment, you can install Python modules into it:

  • It's always a good idea to update pip first:
pip install --upgrade pip
  • Install the module:
pip install <module name>
  • List installed python modules in the environment:
pip list
  • Example: install tensorflow and keras like this:
-bash-4.2$ python3 -m venv tensorflow
-bash-4.2$ source tensorflow/bin/activate
(tensorflow) -bash-4.2$ pip install --upgrade pip
Collecting pip
  Using cached https://files.pythonhosted.org/packages/30/db/9e38760b32e3e7f40cce46dd5fb107b8c73840df38f0046d8e6514e675a1/pip-19.2.3-py2.py3-none-any.whl
Installing collected packages: pip
  Found existing installation: pip 18.1
    Uninstalling pip-18.1:
      Successfully uninstalled pip-18.1
Successfully installed pip-19.2.3
(tensorflow) -bash-4.2$ pip install tensorflow keras
Collecting tensorflow
  Using cached https://files.pythonhosted.org/packages/de/f0/96fb2e0412ae9692dbf400e5b04432885f677ad6241c088ccc5fe7724d69/tensorflow-1.14.0-cp36-cp36m-manylinux1_x86_64.whl
:
:
:
Successfully installed absl-py-0.8.0 astor-0.8.0 gast-0.2.2 google-pasta-0.1.7 grpcio-1.23.0 h5py-2.9.0 keras-2.2.5 keras-applications-1.0.8  [...]
(tensorflow) -bash-4.2$ pip list
Package              Version
-------------------- -------
absl-py              0.8.0  
astor                0.8.0  
gast                 0.2.2  
google-pasta         0.1.7  
grpcio               1.23.0 
h5py                 2.9.0  
Keras                2.2.5  
Keras-Applications   1.0.8  
Keras-Preprocessing  1.1.0  
Markdown             3.1.1  
numpy                1.17.1 
pip                  19.2.3 
protobuf             3.9.1  
PyYAML               5.1.2  
scipy                1.3.1  
setuptools           40.6.2 
six                  1.12.0 
tensorboard          1.14.0 
tensorflow           1.14.0 
tensorflow-estimator 1.14.0 
termcolor            1.1.0  
Werkzeug             0.15.5 
wheel                0.33.6 
wrapt                1.11.2 
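To rebuild the same environment later, for example in a new directory, pip can record the installed versions. This sketch assumes the environment is already activated and uses a hypothetical file name, requirements.txt.

```shell
#!/bin/bash
# Hypothetical output file for the package list.
reqs="requirements.txt"

# "pip freeze" writes the installed packages, typically as name==version lines.
python3 -m pip freeze > "$reqs"
echo "Recorded $(wc -l < "$reqs") lines in $reqs"

# Later, in a freshly created and activated environment, reinstall them with:
#   pip install -r requirements.txt
```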

Software List

Software               Path                                      Notes
GCC 9.3                /opt/ohpc/pub/compiler/gcc/9.3.0/         module load gnu9/9.3.0 (loaded by default)
Open MPI 4.0.5         /opt/ohpc/pub/mpi/openmpi4-gnu9/4.0.5     module load openmpi4/4.0.5 (loaded by default)
Quantum ESPRESSO 6.8   /opt/ohpc/pub/apps/quantum-espresso/6.8   module load quantum-espresso/6.8

Help

  • Submit questions or requests at the CAC help page (https://www.cac.cornell.edu/help) or by sending email to help@cac.cornell.edu. Please include AIDA in the subject line.