THECUBE Cluster

This is a private cluster.

Hardware

  • Head node: thecube.cac.cornell.edu.
  • access modes: ssh (see the connection example below)
  • OpenHPC v1.3.8 with CentOS 7.6
  • 32 compute nodes with Dual 8-core E5-2680 CPUs @ 2.7 GHz, 128 GB of RAM
  • THECUBE Cluster Status: Ganglia.
  • Submit HELP requests: help OR by sending an email to CAC support; please include THECUBE in the subject line.
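
To connect to the head node over ssh (mycacid is a placeholder for your CAC account name):

ssh mycacid@thecube.cac.cornell.edu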

File Systems

Home Directories

  • Path: ~

User home directories are located on an NFS export from the head node. Use your home directory (~) for archiving the data you wish to keep. Do NOT use this file system for computation, as bandwidth to the compute nodes is very limited and is quickly overwhelmed by the file I/O of large jobs.

Unless special arrangements are made, data in users' home directories are NOT backed up.

Scratch File System

The Lustre scratch file system runs Intel Lustre 2.7:

  • Path: /scratch/<user name>

The scratch file system is a fast parallel file system. Use it as scratch space for your jobs, and copy the results you want to keep back to your home directory for safekeeping.
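
For example, to copy a (hypothetical) results directory from scratch back to your home directory:

-bash-4.2$ cp -r /scratch/$USER/results ~/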

Scheduler/Queues

  • Slurm scheduler
  • Queues:
Name      Description    Time Limit
default   all nodes      no limit

Software

Working with Environment Modules

Set up the working environment for each package using the module command. The module command will activate dependent modules if there are any.

To show the currently loaded modules (these modules are loaded by the default system configuration):

-bash-4.2$ module list

Currently Loaded Modules:
  1) autotools   2) prun/1.3   3) gnu8/8.3.0   4) openmpi3/3.1.4   5) ohpc

To show all available modules (as of Sept 30, 2013):

-bash-4.2$ module avail

-------------------- /opt/ohpc/pub/moduledeps/gnu8-openmpi3 --------------------
   boost/1.70.0    netcdf/4.6.3    pnetcdf/1.11.1
   fftw/3.3.8      phdf5/1.10.5    py3-scipy/1.2.1

------------------------ /opt/ohpc/pub/moduledeps/gnu8 -------------------------
   R/3.5.3        mpich/3.3.1       openblas/0.3.5        py3-numpy/1.15.3
   hdf5/1.10.5    mvapich2/2.3.1    openmpi3/3.1.4 (L)

-------------------------- /opt/ohpc/pub/modulefiles ---------------------------
   autotools          (L)    intel/19.0.2.187        prun/1.3        (L)
   clustershell/1.8.1        julia/1.2.0             valgrind/3.15.0
   cmake/3.14.3              octave/5.1.0            vim/8.1
   gnu8/8.3.0         (L)    ohpc             (L)    visit/3.0.1
   gurobi/8.1.1              pmix/2.2.2

  Where:
   L:  Module is loaded

To load a module and verify:

-bash-4.2$ module load R/3.5.3 
-bash-4.2$ module list

Currently Loaded Modules:
  1) autotools   3) gnu8/8.3.0       5) ohpc             7) R/3.5.3
  2) prun/1.3    4) openmpi3/3.1.4   6) openblas/0.3.5

To unload a module and verify:

-bash-4.2$ module unload R/3.5.3
-bash-4.2$ module list

Currently Loaded Modules:
  1) autotools   2) prun/1.3   3) gnu8/8.3.0   4) openmpi3/3.1.4   5) ohpc

Managing Modules in Your Python Virtual Environment
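
A minimal sketch of creating and using a virtual environment, assuming python3 is on your PATH (load an appropriate module first if it is not); the environment name myenv is hypothetical:

-bash-4.2$ python3 -m venv ~/myenv           # create the virtual environment in your home directory
-bash-4.2$ source ~/myenv/bin/activate       # activate it; the prompt gains a (myenv) prefix
(myenv) -bash-4.2$ pip install numpy         # packages now install into ~/myenv
(myenv) -bash-4.2$ deactivate                # leave the environment when finished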

Software List

For each package, the installation path and notes on loading and using it are listed below.

Intel Compilers (including MKL but NOT the IMPI compilers)
  Path: /opt/ohpc/pub/compiler/intel/2019/
  • module load intel/19.0.2.187
  • IMPI compilers (mpiicc, etc.) are not included
  • IMPI runtimes are included

gcc 8.3
  Path: /opt/ohpc/pub/compiler/gcc/8.3.0/
  • module load gnu8/8.3.0 (loaded by default)

Openmpi 3.1.4
  Path: /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4
  • module load openmpi3/3.1.4 (loaded by default)

Boost 1.70.0
  Path: /opt/ohpc/pub/libs/gnu8/openmpi3/boost/1.70.0
  • module load boost/1.70.0

cmake 3.14.3
  Path: /opt/ohpc/pub/utils/cmake/3.14.3
  • module load cmake/3.14.3

hdf5 1.10.5
  Path: /opt/ohpc/pub/libs/gnu8/hdf5/1.10.5
  • module load hdf5/1.10.5

octave 5.1.0
  Path: /opt/ohpc/pub/apps/octave/5.1.0
  • module load octave/5.1.0

netcdf 4.6.3
  Path: /opt/ohpc/pub/libs/gnu8/openmpi3/netcdf
  • module load netcdf/4.6.3

valgrind 3.15.0
  Path: /opt/ohpc/pub/utils/valgrind/3.15.0
  • module load valgrind/3.15.0

visit 3.0.1
  Path: /opt/ohpc/pub/apps/visit/3.0.1
  • module load visit/3.0.1

R 3.5.3
  Path: /opt/ohpc/pub/libs/gnu8/R/3.5.3
  • module load R/3.5.3

openblas 0.3.5
  Path: /opt/ohpc/pub/libs/gnu8/openblas/0.3.5
  • module load openblas/0.3.5

vim 8.1
  Path: /opt/ohpc/pub/apps/vim/8.1
  • module load vim/8.1

julia 1.2.0
  Path: /opt/ohpc/pub/compiler/julia/1.2.0
  • module load julia/1.2.0

gurobi 8.1.1
  Path: /opt/ohpc/pub/apps/gurobi/8.1.1
  • module load gurobi/8.1.1
  • Create a ~/gurobi.lic file containing the following line:
    TOKENSERVER=infrastructure2.tc.cornell.edu
  • To use gurobi in your python code (see the sketch after this list):
    1. module load gurobi/8.1.1
    2. Activate your python virtual environment.
    3. python /opt/ohpc/pub/apps/gurobi/8.1.1/setup.py install
    4. Now you can import the gurobipy module in your python code.
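
The gurobi steps above, collected as shell commands; the virtual environment path ~/myenv is hypothetical:

-bash-4.2$ echo "TOKENSERVER=infrastructure2.tc.cornell.edu" > ~/gurobi.lic   # one-time license setup
-bash-4.2$ module load gurobi/8.1.1
-bash-4.2$ source ~/myenv/bin/activate
(myenv) -bash-4.2$ python /opt/ohpc/pub/apps/gurobi/8.1.1/setup.py install
(myenv) -bash-4.2$ python -c "import gurobipy"                                # no error means the install worked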

Quick Tutorial

The batch system treats each core of a node as a "virtual processor." That means the nodes keyword in batch scripts refers to the number of cores that are scheduled.

Running an MPI Job on the Whole Cluster

  • We are assuming /opt/openmpi/ is the default MPI, which it is on the thecube cluster. The mpiexec options may change depending on your selected MPI.
  • First use showq to see how many cores are available. It may be fewer than 512 if a node is down.
-sh-4.1$ showq
ACTIVE JOBS--------------------
JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME


     0 Active Jobs       0 of  512 Processors Active (0.00%)
                         0 of   32 Nodes Active      (0.00%)

IDLE JOBS----------------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME


0 Idle Jobs

BLOCKED JOBS----------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME


Total Jobs: 0   Active Jobs: 0   Idle Jobs: 0   Blocked Jobs: 0
  • Next, create a script (using your favorite editor, e.g. vim) named runmyfile.sh that contains the following lines:
#!/bin/sh
#PBS -l nodes=32:ppn=16
#PBS -N test
#PBS -j oe
#PBS -S /bin/bash
# Note: the resource option is "-l" (a lowercase L).

set -x
cd "$PBS_O_WORKDIR"

# Substitute <executable> with the program you wish to run.
mpiexec --hostfile $PBS_NODEFILE <executable>
  • Submit the job to the cluster
-sh-4.1$ qsub runmyfile.sh
  • Look for the output in a file named test.o<jobid>, where <jobid> is the number qsub printed; the -j oe option joins standard output and error into this file.
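
To check on the job after submission, the same Torque/Maui-style tools used above apply; for example:

-sh-4.1$ qstat -u $USER     # list your jobs and their states
-sh-4.1$ showq              # cluster-wide view, as shown earlier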

Running an MPI Job using 16 Tasks Per Node

Because the nodes have 16 physical cores, you may want to limit jobs to 16 tasks per node. The node file lists each node once, so make a copy with each node listed 16 times and hand that version to MPI.

#!/bin/sh
#PBS -l nodes=4:ppn=16
#PBS -N test
#PBS -j oe
#PBS -S /bin/bash

set -x
cd "$PBS_O_WORKDIR"

# Construct a copy of the hostfile with only 16 entries per node.
# MPI can use this to run 16 tasks on each node.
uniq "$PBS_NODEFILE" | awk '{for (i = 0; i < 16; i += 1) print}' > nodefile.16way

# Run 16 tasks on each of the 4 nodes requested above (replace "ring -v" with your own program).
mpiexec --hostfile nodefile.16way ring -v

Running Many Copies of a Serial Job

To run 30 separate instances of the same program, use the scheduler's task array feature via the "-t" option. The "nodes" parameter here refers to a single core.

#!/bin/sh
#PBS -l nodes=1
#PBS -t 1-30
#PBS -N test
#PBS -j oe
#PBS -S /bin/bash
# Notes: the resource option is "-l" (a lowercase L), and "-t 1-30" requests
# task array indices 1 through 30, i.e. 30 instances of this script.

set -x
cd "$PBS_O_WORKDIR"
echo Run my job.

When you start jobs this way, separate jobs will pile one-per-core onto nodes like a box of hamsters.
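
Inside the script, each instance usually needs to know which task it is; a minimal sketch of a job body, assuming the Torque-style array index variable PBS_ARRAYID (myprogram and the input.N files are hypothetical):

set -x
cd "$PBS_O_WORKDIR"
# Each task picks its own input using its array index (1..30).
echo "Running task $PBS_ARRAYID"
./myprogram "input.$PBS_ARRAYID"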

Running on a specific node

To run on a specific node, use the host= option:

#!/bin/sh
#PBS -l host=compute-1-16
#PBS -N test
#PBS -j oe
#PBS -S /bin/bash
# Note: the resource option is "-l" (a lowercase L).


set -x
cd "$PBS_O_WORKDIR"
echo Run my job.

Running an interactive job

From the command line: qsub -l nodes=1 -I
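
A sketch of a typical interactive session (the commands run on the compute node are only examples):

-sh-4.1$ qsub -l nodes=1 -I
# ...once the job starts, you get a shell on a compute node...
-bash-4.2$ module load R/3.5.3      # set up whatever environment you need
-bash-4.2$ exit                     # end the interactive job when finished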

Running a Hybrid MPI/OpenMP Job

Suppose you want to run a simple MPI/OpenMP "Hello World" program called "hello.c" located in your home directory.

First compile the code: mpicc -fopenmp hello.c -o hello. Next, set up your job script with the number of nodes and processes you want. The following script will give exclusive access to 2 nodes because it specifies 16 ppn, which means it will use all 16 cores on each node. It also specifies using 8 OpenMP threads, 2 processes (or tasks) per node, and 4 total MPI processes. You can vary these numbers for your purposes.

#!/bin/sh
#PBS -l nodes=2:ppn=16
#PBS -N test
#PBS -j oe
#PBS -S /bin/bash

set -x
cd "$PBS_O_WORKDIR"

export OMP_NUM_THREADS=8

# Construct a copy of the hostfile with only 2 entries per node.
# MPI can use this to run 2 tasks on each node.
export TASKS_PER_NODE=2
uniq "$PBS_NODEFILE"|awk -v TASKS_PER_NODE="$TASKS_PER_NODE" '{for(i=0;i<TASKS_PER_NODE;i+=1) print}' > nodefile

cat nodefile

# Run 4 MPI ranks in total, 2 per node, each using $OMP_NUM_THREADS OpenMP threads.
mpiexec --hostfile nodefile -np 4 -x OMP_NUM_THREADS hello

HELP

  • THECUBE Cluster Status: Ganglia.
  • Submit HELP requests: help OR by sending an email to help@cac.cornell.edu; please include THECUBE in the subject line.