MARVIN Cluster

This is a private cluster.

=Hardware=

:* Head node: marvin.cac.cornell.edu.
:* access modes: ssh
:* OpenHPC v1.3.8 with CentOS 7.8
:* 86 compute nodes with Dual 6-core X5670 CPUs @ 3 GHz, Hyperthreaded, 24 GB of RAM; 3 high memory nodes with 96 GB of RAM
:* Cluster Status: [http://marvin.cac.cornell.edu/ganglia/ Ganglia].
:* Submit [https://www.cac.cornell.edu/help help requests] or send email to: [mailto:help@cac.cornell.edu help@cac.cornell.edu]

=File Systems=
==Home Directories==

:* Path: ~

User home directories are hosted on the head node and exported to the compute nodes via NFS. Unless special arrangements are made, data in user home directories are NOT backed up.
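Because home directories are not backed up, copy anything you cannot afford to lose to a system you control. A minimal sketch using rsync over ssh; the remote host name and destination path below are placeholders, not CAC systems:

<pre>
rsync -avz ~/important_project/ myuser@myworkstation.example.edu:marvin_backup/important_project/
</pre>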
  
==Globus Access==

User home directories can be accessed on [https://globus.org Globus]. Under the "File Manager" tab in the Globus web GUI:

# Access the "cac#marvin" endpoint.
# Authenticate using your CAC user name and password if prompted.
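The steps above use the web GUI. If you prefer to script transfers, the same endpoint can in principle be reached with the [https://docs.globus.org/cli/ Globus CLI]; this is a hedged sketch that assumes you have installed globus-cli yourself (only the web GUI is documented here):

<pre>
globus login                          # authenticate in a browser window
globus endpoint search "cac#marvin"   # look up the endpoint ID
globus ls <endpoint ID>:/~/           # list your home directory on that endpoint
</pre>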
  
=Scheduler/Queues=

:* The cluster scheduler is '''Slurm'''. See the [https://www.cac.cornell.edu/wiki/index.php?title=Slurm Slurm documentation] page for details.
:* Note: hyperthreading is enabled on the cluster, so Slurm considers each physical core to consist of two logical CPUs. See the [https://www.cac.cornell.edu/wiki/index.php?title=Slurm#Options_for_Submitting_Jobs slurm options] section for the correct options to use for your job; a minimal example script follows the partitions table below.
:* Partitions:

::{| border="1" cellspacing="0" cellpadding="10"
! Name
! Description
! Time Limit
|-
| viz
| 3 visualization Ensight Servers, each has 96GB RAM
| none
|-
| normal (default)
| all nodes except for those in viz queue
| none
|-
| all
| all cluster nodes
| none
|}
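As a minimal sketch (not a site-provided template), a batch script that runs one MPI task per physical core on a single node might look like the following; the job name and program name are placeholders, and the partition name comes from the table above:

<source lang="bash">
#!/bin/bash
#SBATCH --job-name=test              # placeholder job name
#SBATCH --partition=normal           # default partition (see table above)
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12         # the nodes have 12 physical cores
#SBATCH --cpus-per-task=1
#SBATCH --hint=nomultithread         # do not schedule tasks onto the extra hyperthreads

# "my_program" is a placeholder for your own executable.
mpiexec ./my_program
</source>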
  
=Software=

==Work with Environment Modules==

Set up the working environment for each package using the module command. The module command will activate dependent modules if there are any.

To show currently loaded modules (these modules are loaded by the default system configuration):
<pre>
-bash-4.2$ module list

Currently Loaded Modules:
  1) autotools   2) prun/1.3   3) gnu8/8.3.0   4) openmpi3/3.1.4   5) ohpc
</pre>

To show all available modules (as of Sept 30, 2013):
<pre>
-bash-4.2$ module avail

-------------------- /opt/ohpc/pub/moduledeps/gnu8-openmpi3 --------------------
   fftw/3.3.8    hypre/2.18.1

------------------------ /opt/ohpc/pub/moduledeps/gnu8 -------------------------
   impi/2019.7.217    mpich/3.3.1       openblas/0.3.7        pdtoolkit/3.25
   metis/5.1.0        mvapich2/2.3.2    openmpi3/3.1.4 (L)    superlu/5.2.1

-------------------------- /opt/ohpc/pub/modulefiles ---------------------------
   autotools          (L)    gnu8/8.3.0       (L)    prun/1.3          (L)
   charliecloud/0.11         intel/2020.1.217        python/3.8.3
   clustershell/1.8.2        ohpc             (L)    singularity/3.4.1
   cmake/3.15.4              papi/5.7.0              valgrind/3.15.0
   ensight/10.1.4a           pmix/2.2.2              visit/3.0.1

  Where:
   L:  Module is loaded

Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching
any of the "keys".
</pre>

To load a module and verify:
<pre>
-bash-4.2$ module load visit/3.0.1
-bash-4.2$ module list

Currently Loaded Modules:
  1) autotools   3) gnu8/8.3.0       5) ohpc
  2) prun/1.3    4) openmpi3/3.1.4   6) visit/3.0.1
</pre>
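Modules can also be removed or swapped; for example, the software list below switches from the GNU to the Intel toolchain with <code>module swap</code>. A short sketch, using version strings shown by <code>module avail</code> above:

<pre>
-bash-4.2$ module unload visit/3.0.1
-bash-4.2$ module swap gnu8 intel/2020.1.217
</pre>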
 
==Manage Modules in Your Python Virtual Environment==

python 3.8.3 is installed. Users can manage their own python environment (including installing needed modules) using virtual environments. Please see [https://packaging.python.org/guides/installing-using-pip-and-virtual-environments the documentation on virtual environments on python.org] for details.

===Load python/3.8.3 module===

First load the <code>python/3.8.3</code> module to select python 3.8.3:
<pre>
module load python/3.8.3
</pre>

===Create Virtual Environment===

You can '''create''' as many virtual environments, each in its own directory, as needed.

<pre>
python3 -m venv <your virtual environment directory>
</pre>

===Activate Virtual Environment===

You need to '''activate''' a virtual environment before using it:

<pre>source <your virtual environment directory>/bin/activate</pre>
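When you are finished working in a virtual environment, you can leave it with the standard venv command:

<pre>deactivate</pre>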
 
===Install Python Modules Using pip===

After activating your virtual environment, you can install python modules into the activated environment:

* It's always a good idea to update <code>pip</code> first:
<pre>pip install --upgrade pip</pre>

* Install the module:
<pre>pip install <module name></pre>

* List installed python modules in the environment:
<pre>pip list</pre>

* Example: install <code>tensorflow</code> and <code>keras</code> like this:

<pre>
-bash-4.2$ python3 -m venv tensorflow
-bash-4.2$ source tensorflow/bin/activate
(tensorflow) -bash-4.2$ pip install --upgrade pip
Collecting pip
  Using cached https://files.pythonhosted.org/packages/30/db/9e38760b32e3e7f40cce46dd5fb107b8c73840df38f0046d8e6514e675a1/pip-19.2.3-py2.py3-none-any.whl
Installing collected packages: pip
  Found existing installation: pip 18.1
    Uninstalling pip-18.1:
      Successfully uninstalled pip-18.1
Successfully installed pip-19.2.3
(tensorflow) -bash-4.2$ pip install tensorflow keras
Collecting tensorflow
  Using cached https://files.pythonhosted.org/packages/de/f0/96fb2e0412ae9692dbf400e5b04432885f677ad6241c088ccc5fe7724d69/tensorflow-1.14.0-cp36-cp36m-manylinux1_x86_64.whl
:
:
:
Successfully installed absl-py-0.8.0 astor-0.8.0 gast-0.2.2 google-pasta-0.1.7 grpcio-1.23.0 h5py-2.9.0 keras-2.2.5 keras-applications-1.0.8 keras-preprocessing-1.1.0 markdown-3.1.1 numpy-1.17.1 protobuf-3.9.1 pyyaml-5.1.2 scipy-1.3.1 six-1.12.0 tensorboard-1.14.0 tensorflow-1.14.0 tensorflow-estimator-1.14.0 termcolor-1.1.0 werkzeug-0.15.5 wheel-0.33.6 wrapt-1.11.2
(tensorflow) -bash-4.2$ pip list
Package              Version
-------------------- -------
absl-py              0.8.0
astor                0.8.0
gast                 0.2.2
google-pasta         0.1.7
grpcio               1.23.0
h5py                 2.9.0
Keras                2.2.5
Keras-Applications   1.0.8
Keras-Preprocessing  1.1.0
Markdown             3.1.1
numpy                1.17.1
pip                  19.2.3
protobuf             3.9.1
PyYAML               5.1.2
scipy                1.3.1
setuptools           40.6.2
six                  1.12.0
tensorboard          1.14.0
tensorflow           1.14.0
tensorflow-estimator 1.14.0
termcolor            1.1.0
Werkzeug             0.15.5
wheel                0.33.6
wrapt                1.11.2
</pre>
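A quick sanity check that the install worked (this check is not part of the original instructions) is to import the package from the still-activated environment:

<pre>
(tensorflow) -bash-4.2$ python3 -c "import tensorflow as tf; print(tf.__version__)"
1.14.0
</pre>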
 
==Software List==

::{| border="1" cellspacing="0" cellpadding="10"
! Software
! Path
! Notes
|-
| *GNU Compilers 8.3.0
| /opt/ohpc/pub/compiler/gcc/8.3.0
| module load gnu8/8.3.0
|-
| *openmpi 3.1.4
| /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4 or /opt/ohpc/pub/mpi/openmpi3-intel/3.1.4
| module load openmpi3/3.1.4
|-
| GNU Compilers 10.2.0
| /opt/ohpc/pub/compiler/gcc/10.2.0
| module load gnu10/10.2.0
|-
| openmpi 4.0.5
| /opt/ohpc/pub/mpi/openmpi4-gnu10/4.0.5
| module load openmpi4/4.0.5
|-
| Intel Parallel Studio XE 2020.1.217
| /opt/ohpc/pub/compiler/intel/2020/
| module swap gnu8 intel/2020.1.217
|-
| Intel MPI 2020.1.217
| /opt/ohpc/pub/compiler/intel/2020/compilers_and_libraries_2020.1.217/linux/mpi
| module load impi/2020.1.217
|-
| mvapich2 2.3.2
| /opt/ohpc/pub/mpi/mvapich2-gnu/2.3.2 or /opt/ohpc/pub/mpi/mvapich2-intel/2.3.2
| module load mvapich2/2.3.2
|-
| fftw 3.3.8
| /opt/ohpc/pub/libs/gnu8/openmpi3/fftw/3.3.8 or /opt/ohpc/pub/libs/gnu8/mvapich2/fftw/3.3.8
| module load fftw/3.3.8
|-
| hypre 2.18.1
| /opt/ohpc/pub/libs/gnu8/openmpi3/hypre/2.18.1, /opt/ohpc/pub/libs/gnu8/impi/hypre/2.18.1, /opt/ohpc/pub/libs/intel/openmpi3/hypre/2.18.1, or /opt/ohpc/pub/libs/intel/impi/hypre/2.18.1
| module load hypre/2.18.1
|-
| ensight 10.1.4a
| /opt/ohpc/pub/apps/ensight/10.1.4a
| module load ensight/10.1.4a
|-
| VisIt 3.0.1
| /opt/ohpc/pub/apps/visit/3.0.1/bin
| module load visit/3.0.1
|-
| python 3.8.3
| /opt/ohpc/pub/utils/python/3.8.3
|
* module load python/3.8.3
* See the [[#Manage_Modules_in_Your_Python_Virtual_Environment|Manage Modules in Your Python Virtual Environment]] section on installing python modules in your own environment.
|-
| gnuplot 5.4.0
| /opt/ohpc/pub/apps/gnuplot/5.4.0
|
* module load gnuplot/5.4.0
|}
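To tie the software list to the module commands above, here is a hedged sketch of compiling and running a small MPI program with the default gnu8/openmpi3 toolchain; <code>hello_mpi.c</code> is a placeholder source file, not something provided on the cluster:

<pre>
-bash-4.2$ module load gnu8/8.3.0 openmpi3/3.1.4
-bash-4.2$ mpicc -O2 -o hello_mpi hello_mpi.c
-bash-4.2$ mpiexec -n 4 ./hello_mpi      # run 4 ranks, inside a batch job or interactive session
</pre>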
 
=Quick Tutorial=
 
The batch system treats each core of a node as a "virtual processor." That means the nodes keyword in batch scripts refers to the number of cores that are scheduled.
 
 
==Select your default MPI==

There are several versions of MPI on the Marvin cluster. Use the following commands to modify your default MPI:

:* <code>mpi-selector --query</code> shows your default MPI
:* <code>mpi-selector --list</code> shows all available MPI installations
:* <code>mpi-selector --set <mpi installation></code> sets your default MPI; note that you will have to log out and back in for this to take effect
 
 
==Running an MPI Job on the Whole Cluster==

:* Assuming /opt/openmpi/ is the default; the mpiexec options may change depending on your selected MPI.
:* First use showq to see how many cores are available. It may be less than 1152 if a node is down.
 
 
<source lang="bash">
#!/bin/sh
# Note: the resource request below uses -l (lowercase L).
#PBS -l nodes=96:ppn=12
#PBS -N test
#PBS -j oe
#PBS -S /bin/bash

set -x
cd "$PBS_O_WORKDIR"

# Replace <executable> with the program you wish to run.
mpiexec --hostfile "$PBS_NODEFILE" <executable>
</source>
 
 
==Running an MPI Job using 12 Tasks Per Node==

Because the nodes have 12 physical cores, you may want to limit jobs to 12 tasks per node. The node file lists each node one time, so make a copy with each node listed 12 times, and hand that version to MPI.
 
 
<source lang="bash">
#!/bin/sh
# To run 12-way on 4 nodes, we request 48 cores to obtain 4 nodes.
#PBS -l nodes=48
#PBS -N test
#PBS -j oe
#PBS -S /bin/bash

set -x
cd "$PBS_O_WORKDIR"

# Construct a copy of the hostfile with only 12 entries per node.
# MPI can use this to run 12 tasks on each node.
uniq "$PBS_NODEFILE" | awk '{for(i=0;i<12;i+=1) print}' > nodefile.12way

# "ring" is an example executable; substitute your own program.
mpiexec --hostfile nodefile.12way ring -v
</source>
 
 
 
==Running Many Copies of a Serial Job==

In order to run 30 separate instances of the same program, use the scheduler's task array feature through the "-t" option. The "nodes" parameter here refers to a core.
 
 
<source lang="bash">
#!/bin/sh
# Note: the resource request below uses -l (lowercase L).
#PBS -l nodes=1
#PBS -t 1-30
#PBS -N test
#PBS -j oe
#PBS -S /bin/bash

set -x
cd "$PBS_O_WORKDIR"
echo Run my job.
</source>
 
 
When you start jobs this way, separate jobs will pile one-per-core onto nodes like a box of hamsters.
 
 
==Running on a specific node==

To run on a specific node, use the <code>host=</code> option:
 
 
<source lang="bash">
#!/bin/sh
# Note: the resource request below uses -l (lowercase L).
#PBS -l host=compute-3-16
#PBS -N test
#PBS -j oe
#PBS -S /bin/bash

set -x
cd "$PBS_O_WORKDIR"
echo Run my job.
</source>
 
==Running in the viz queue==

To run in the viz queue, use the <code>-q</code> option:
 
 
<source lang="bash">
#!/bin/sh
# Note: the resource request below uses -l (lowercase L).
#PBS -l nodes=1
#PBS -N test
#PBS -j oe
#PBS -S /bin/bash
#PBS -q viz

set -x
cd "$PBS_O_WORKDIR"
echo Run my job.
</source>
 
==Running an interactive job==

From the command line:

qsub -l nodes=1 -I
 
