THECUBE Cluster

Revision as of 11:01, 30 September 2015

This is a private cluster.

Hardware

  • Head node: thecube.cac.cornell.edu.
  • Access modes: ssh (see the login example after this list)
  • Rocks 6.1 with CentOS 6.3
  • 32 compute nodes with Dual 8-core E5-2680 CPUs @ 2.7 GHz, 128 GB of RAM
  • THECUBE Cluster Status: Ganglia.
  • Submit HELP requests: help OR by sending an email to CAC support; please include THECUBE in the subject line.
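
To log in, ssh to the head node from your own machine (a minimal example; netid is a placeholder for your CAC user name):

ssh netid@thecube.cac.cornell.edu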

File Systems

Home Directories

  • Path: ~

User home directories are located on an NFS export from the head node. Use your home directory (~) to archive the data you wish to keep. Do NOT use this file system for computation: bandwidth to the compute nodes is very limited and will quickly be overwhelmed by the file I/O from large jobs.

Unless special arrangements are made, data in users' home directories are NOT backed up.

Scratch File System

Lustre file system provided by Terascala and Dell

  • Path: /scratch/<user name>

The scratch file system is a fast parallel file system. Use this file system for scratch space for your jobs. Copy the results you want to keep back to your home directory for safe keeping.
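
A typical pattern is to run from scratch and copy back only what you need. A short sketch (the directory, program, and file names are placeholders):

-sh-4.1$ cd /scratch/$USER/myrun          # fast Lustre scratch space (myrun is hypothetical)
-sh-4.1$ ./my_program > results.out       # compute and write output in scratch
-sh-4.1$ cp results.out ~/                # copy what you want to keep back to your home directory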

Scheduler/Queues

  • Maui/Torque scheduler
  • Queues:
Name      Description   Time Limit
default   all nodes     no limit
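
To see the queue configuration from the head node, or to name the queue explicitly when submitting (a short sketch; myjob.sh is a placeholder script name):

-sh-4.1$ qstat -q                    # list queues and their limits
-sh-4.1$ qsub -q default myjob.sh    # explicit queue; a plain "qsub myjob.sh" also goes to default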

Software

Set up the working environment for each package using the module command. The module command will activate dependent modules if there are any.

To show currently loaded modules:

-sh-4.1$ module list
Currently Loaded Modulefiles:
  1) openmpi-1.6.5-intel-x86_64

To show all available modules (as of Sept 30, 2013):

-sh-4.1$ module avail

----------------------------------- /usr/share/Modules/modulefiles -----------------------------------
dot              module-info      null             rocks-openmpi_ib
module-cvs       modules          rocks-openmpi    use.own

------------------------------------------ /etc/modulefiles ------------------------------------------
boost-1.54.0               mathematica-9.0            sas-9.3
cmake-2.8.11.2             matlab-r2013a              valgrind-3.8.1
eclipse-4.3                netcdf-4.3.0               visit-2.5.2
hdf5-1.8.11                openmpi-1.6.5-intel-x86_64 zlib-1.2.8

To load a module and verify:

-sh-4.1$ module load mathematica-9.0
-sh-4.1$ module list
Currently Loaded Modulefiles:
  1) openmpi-1.6.5-intel-x86_64   2) mathematica-9.0

To unload a module and verify:

-sh-4.1$ module unload mathematica-9.0
-sh-4.1$ module list
Currently Loaded Modulefiles:
  1) openmpi-1.6.5-intel-x86_64 
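
Modules can also be loaded inside a batch script so the environment is set up on the compute node where the job runs. A minimal sketch, assuming the module command is available to batch-job shells (the MATLAB command is just a trivial test):

#!/bin/sh
#PBS -l nodes=1
#PBS -N matlab-test
#PBS -j oe
#PBS -S /bin/bash

cd "$PBS_O_WORKDIR"
module load matlab-r2013a                       # set up MATLAB on the compute node
matlab -nodisplay -r "disp('hello'); exit"      # run a trivial command and exit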

SOFTWARE LIST

Software Path Notes
Intel Compilers (including MKL) /opt/intel
  • Included in user's default path.
Openmpi 1.6.5 /opt/openmpi
  • Included in user's default path.
Mathematica /opt/Mathematica
  • module load mathematica-9.0
Matlab /opt/MATLAB
  • module load matlab-r2013a
SAS /opt/SAS
  • module load sas-9.3
Boost /opt/boost
  • module load boost-1.54.0
cmake /opt/cmake
  • module load cmake-2.8.11.2
eclipse /opt/eclipse
  • module load eclipse-4.3
hdf5 /opt/hdf5
  • module load hdf5-1.8.11
netcdf /opt/netcdf
  • module load netcdf-4.3.0
valgrind /opt/valgrind
  • module load valgrind-3.8.1
visit /opt/visit
  • module load visit-2.5.2
zlib /opt/zlib
  • module load zlib-1.2.8
acml /opt/acml
  • AMD Core Math Library
  • no module file
  • not in default path
R, ffmpeg /usr/bin
  • in default path
BLAS, LAPACK libraries
  • in default path
Thrust
  • Coming soon

Quick Tutorial

The batch system treats each core of a node as a "virtual processor." That means the nodes keyword in batch scripts refers to the number of cores that are scheduled.
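
For example, the following requests are interpreted as shown (a sketch consistent with the scripts in the sections below):

#PBS -l nodes=16            # 16 cores ("virtual processors"), possibly spread over several nodes
#PBS -l nodes=32:ppn=16     # 32 nodes with 16 cores each, i.e. the whole cluster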

Running an MPI Job on the Whole Cluster

  • We are assuming /opt/openmpi/ is the default MPI, which it is on the thecube cluster. The mpiexec options may change depending on the MPI you have selected.
  • First use showq to see how many cores are available. It may be fewer than 512 if a node is down.
-sh-4.1$ showq
ACTIVE JOBS--------------------
JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME


     0 Active Jobs       0 of  512 Processors Active (0.00%)
                         0 of   32 Nodes Active      (0.00%)

IDLE JOBS----------------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME


0 Idle Jobs

BLOCKED JOBS----------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME


Total Jobs: 0   Active Jobs: 0   Idle Jobs: 0   Blocked Jobs: 0
  • Next, create a script (using your favorite editor, e.g. vim) named runmyfile.sh that contains the following lines:
#!/bin/sh
# Note: the option below is a lowercase L ("-l").
#PBS -l nodes=32:ppn=16
#PBS -N test
#PBS -j oe
#PBS -S /bin/bash

set -x
cd "$PBS_O_WORKDIR"

# Replace <executable> with the program you wish to run.
mpiexec --hostfile "$PBS_NODEFILE" <executable>
  • Submit the job to the cluster:
-sh-4.1$ qsub runmyfile.sh
  • Look for the output in a file named after the job, e.g. test.o<jobid>, in the directory from which you submitted the job.
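
To check on the job and look at its output once it finishes (a sketch; 12345 stands in for your actual job ID):

-sh-4.1$ qstat -u $USER      # list your jobs and their states
-sh-4.1$ cat test.o12345     # combined stdout/stderr, because of "#PBS -j oe"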

Running an MPI Job using 16 Tasks Per Node

Because the nodes have 16 physical cores, you may want to limit jobs to 16 tasks per node. The node file lists each node once, so make a copy with each node listed 16 times and hand that copy to MPI.

#!/bin/sh
#PBS -l nodes=64
#PBS -N test
#PBS -j oe
#PBS -S /bin/bash

set -x
cd "$PBS_O_WORKDIR"

# Construct a copy of the hostfile with only 16 entries per node.
# MPI can use this to run 16 tasks on each node.
uniq "$PBS_NODEFILE" | awk '{for (i = 0; i < 16; i += 1) print}' > nodefile.16way

# To run 16-way on 4 nodes, we requested 64 cores (nodes=64) to obtain 4 nodes.
mpiexec --hostfile nodefile.16way ring -v

Running Many Copies of a Serial Job

To run 30 separate instances of the same program, use the scheduler's task array feature via the "-t" option. The "nodes" parameter here refers to a single core.

#!/bin/sh
# Note: the option below is a lowercase L ("-l").
#PBS -l nodes=1
#PBS -t 1-30
#PBS -N test
#PBS -j oe
#PBS -S /bin/bash

set -x
cd "$PBS_O_WORKDIR"
echo Run my job.

When you start jobs this way, separate jobs will pile one-per-core onto nodes like a box of hamsters.
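
Each array task receives its own index in the PBS_ARRAYID environment variable (Torque's name for it), which it can use to select its own input. A minimal sketch with hypothetical input files input.1 through input.30 and a placeholder program:

#!/bin/sh
#PBS -l nodes=1
#PBS -t 1-30
#PBS -N test
#PBS -j oe
#PBS -S /bin/bash

cd "$PBS_O_WORKDIR"
# Each task reads its own input and writes its own output.
./my_program input.$PBS_ARRAYID > output.$PBS_ARRAYID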

Running on a specific node

To run on a specific node, use the host= option:

#!/bin/sh
# Note: the option below is a lowercase L ("-l").
#PBS -l host=compute-1-16
#PBS -N test
#PBS -j oe
#PBS -S /bin/bash


set -x
cd "$PBS_O_WORKDIR"
echo Run my job.

Running an interactive job

From the command line: qsub -l nodes=1 -I
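
A minimal sketch of an interactive session (the compute-node prompt will vary; compute-1-16 is just an example):

-sh-4.1$ qsub -l nodes=1 -I          # request one core and wait for an interactive shell
...the scheduler starts the job and opens a shell on a compute node...
[compute-1-16 ~]$ module load matlab-r2013a
[compute-1-16 ~]$ exit               # end the interactive job and release the core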

HELP

  • THECUBE Cluster Status: Ganglia.
  • Submit HELP requests: help OR by sending email to help@cac.cornell.edu; please include THECUBE in the subject line.