MARVIN Cluster

This is a private cluster.

Hardware

  • Head node: marvin.cac.cornell.edu.
  • access modes: ssh
  • Rocks 5.4.3 with CentOS 5.6
  • 92 compute nodes with Dual 6-core X5670 CPUs @ 3 GHz, Hyperthreaded, 48 GB of RAM; 4 high memory nodes with 96 GB of RAM
  • Cluster Status: Ganglia.
  • Submit HELP requests: http://www.cac.cornell.edu/help OR by sending email to: help@cac.cornell.edu

File Systems

Home Directories

  • Path: ~

User home directories are located on an NFS export from the head node. Use your home directory (~) for archiving the data you wish to keep. Do NOT use this file system for computation, as bandwidth to the compute nodes is very limited and will quickly be overwhelmed by file I/O from large jobs.

Unless special arrangements are made, data in user home directories are NOT backed up.

Scratch File System

Lustre file system provided by Terascala and Dell.

Path: /scratch/<user name>

The scratch file system is a fast parallel file system. Use it for scratch space for your jobs, and copy the results you want to keep back to your home directory for safekeeping.
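
As a minimal sketch (the directory layout and core count are illustrative, and the scratch path is assumed to match your login name), a batch script can work in scratch and copy results home at the end:

#!/bin/sh
#PBS -l nodes=12
#PBS -N scratch_example
#PBS -j oe
#PBS -S /bin/bash

set -x
# Work in a per-job directory on the fast parallel scratch file system.
SCRATCH_DIR=/scratch/$USER/$PBS_JOBID
mkdir -p "$SCRATCH_DIR"
cd "$SCRATCH_DIR"

# ... run your job here, writing its output into $SCRATCH_DIR ...

# Copy only the results you want to keep back to your home directory.
mkdir -p ~/results
cp -r "$SCRATCH_DIR" ~/results/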

Scheduler/Queues

  • Maui/Torque scheduler;
  • Queues:
    - viz: 4 visualization EnSight servers, each with 96 GB RAM (time limit: 24 hours)
    - default: all nodes except those in the viz queue (time limit: 24 hours)
    - long: all nodes except those in the viz queue (time limit: 72 hours)
    - all: all nodes (no time limit)
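
For example, a job that needs more than 24 hours can be routed to the long queue with the -q option (a minimal sketch; the walltime value and core count are illustrative):

#!/bin/sh
#PBS -l nodes=12,walltime=48:00:00
#PBS -q long
#PBS -N long_example
#PBS -j oe
#PBS -S /bin/bash

set -x
cd "$PBS_O_WORKDIR"
echo Run my job.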

Software

Software (installation path) and notes:

  • Intel Cluster Studio: /opt/intel
    - 30-day trial license.
    - Intel compilers are in the user default path.
    - Use mpi-selector to select Intel MPI.
  • gcc 4.6.2: /opt/gcc/4.6.2
    - Prepend /opt/gcc/4.6.2/bin to $PATH to use this gcc version.
  • openmpi 1.6.3 (gnu): /opt/openmpi/gnu/1.6.3
    - Compiled by gcc 4.6.2.
    - To select this MPI implementation, use the "mpi-selector --set openmpi-1.6.3-gcc-4.6.2" command, then log out and log back in. This also sets gcc 4.6.2 as the default compiler.
  • openmpi 1.6.3 (Intel): /opt/openmpi/intel/1.6.3
    - Compiled by Intel 12.1.
    - To select this MPI implementation, use the "mpi-selector --set openmpi-1.6.3-intel" command, then log out and log back in.
  • openmpi 1.4.4 (gnu): /opt/openmpi/gnu/1.4.4
    - Compiled by gcc 4.6.2.
    - To select this MPI implementation, use the "mpi-selector --set openmpi-1.4.4-gcc-4.6.2" command, then log out and log back in. This also sets gcc 4.6.2 as the default compiler.
  • openmpi 1.4.4 (Intel): /opt/openmpi/intel/1.4.4
    - Compiled by Intel 12.1.
    - To select this MPI implementation, use the "mpi-selector --set openmpi-1.4.4-intel" command, then log out and log back in.
  • mvapich 1.2 (gnu): /opt/mvapich/gnu/1.2
    - Compiled by gcc 4.6.2.
    - To select this MPI implementation, use the "mpi-selector --set mvapich-1.2-gcc-4.6.2" command, then log out and log back in. This also sets gcc 4.6.2 as the default compiler.
  • mvapich 1.2 (Intel): /opt/mvapich/intel/1.2
    - DO NOT USE: so far, the Intel 12.1 compiler has failed to produce a working build of mvapich.
    - (In the future) To select this MPI implementation, use the "mpi-selector --set mvapich-1.2-intel" command, then log out and log back in.
  • Intel MPI: /opt/intel/impi/3.1
    - To select this MPI implementation, use the "mpi-selector --set intel-4.0.3" command, then log out and log back in.
  • fftw 3.3 (gnu): /opt/fftw/gnu/3.3
    - Compiled by gcc 4.6.2.
    - With the Intel compilers, use MKL (Intel Math Kernel Library) in /opt/intel/mkl instead.
  • lapack 3.4.0 (gnu): /opt/lapack/gnu/3.4.0
    - Compiled by gcc 4.6.2.
    - With the Intel compilers, use MKL (Intel Math Kernel Library) in /opt/intel/mkl instead.
  • hypre 2.0.0 (gnu): /opt/hypre/gnu/2.0.0
    - Compiled by gcc 4.1.2.
  • hypre 2.6.0b (gnu): /opt/hypre/gnu/2.6.0b
    - Compiled by gcc 4.6.2.
  • hypre 2.6.0b (Intel): /opt/hypre/intel/2.6.0b
    - Compiled by Intel Compilers 12.1.
  • ensight 9.2: /usr/local/CEI
    - Installed only on the head node and the "viz" queue nodes (compute-3-13 to compute-3-16).
  • ensight 10.0: /usr/local/CEI
    - Installed only on the head node and the "viz" queue nodes (compute-3-13 to compute-3-16).
  • VisIt 2.9.2: /opt/visit
    - Must default to the following OpenMPI version for parallel visualization:

      -bash-3.2$ mpi-selector --set openmpi-1.6.3-gcc-4.6.2
      Defaults already exist; overwrite them? (y/N) y
      -bash-3.2$ mpi-selector --query
      default:openmpi-1.6.3-gcc-4.6.2
      level:user

    - Log out and log back in for the change to take effect.
  • Anaconda Python: /opt/anaconda-python
    - Add the following line to your ~/.bashrc to use Anaconda Python:

      export PATH="/opt/anaconda-python/bin:$PATH"
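
As a quick sanity check (a sketch, assuming a bash login shell), you can prepend the gcc 4.6.2 and Anaconda Python directories to $PATH as described above and confirm which binaries are found first:

# e.g. in ~/.bashrc
export PATH="/opt/gcc/4.6.2/bin:$PATH"
export PATH="/opt/anaconda-python/bin:$PATH"

# Confirm the versions now picked up from /opt.
which gcc && gcc --version
which python && python --version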

Quick Tutorial

The batch system treats each core of a node as a "virtual processor." That means the "nodes" keyword in batch scripts refers to the number of cores that are scheduled, not the number of physical machines.
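
For example (a minimal sketch of the resource line only), the following two requests both allocate 24 cores, but only the second pins the layout to 12 cores on each of 2 physical nodes:

# 24 cores (virtual processors); the scheduler decides how they are spread over nodes.
#PBS -l nodes=24

# 2 physical nodes with 12 cores each (24 cores total).
#PBS -l nodes=2:ppn=12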

Select your default MPI

There are several versions of MPI installed on the Marvin cluster. Use the following commands to view or change your default MPI; an example session follows the list.

  • mpi-selector --query -> shows your default MPI
  • mpi-selector --list -> shows all available MPI installations
  • mpi-selector --set <mpi installation> -> sets your default MPI; note that you must log out and log back in for this to take effect
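
A session might look roughly like the following (the exact --list output is illustrative; the installation names are the ones listed in the Software table above):

$ mpi-selector --list
mvapich-1.2-gcc-4.6.2
openmpi-1.4.4-gcc-4.6.2
openmpi-1.4.4-intel
openmpi-1.6.3-gcc-4.6.2
openmpi-1.6.3-intel
intel-4.0.3
$ mpi-selector --set openmpi-1.6.3-gcc-4.6.2
Defaults already exist; overwrite them? (y/N) y
$ mpi-selector --query
default:openmpi-1.6.3-gcc-4.6.2
level:user
# Log out and log back in for the new default to take effect.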

Running an MPI Job on the Whole Cluster

  • This example assumes OpenMPI under /opt/openmpi/ is your default; the mpiexec options may change depending on your selected MPI.
  • First use showq to see how many cores are available. It may be fewer than 1152 if a node is down.
#!/bin/sh
# Note: the resource option below is -l (lowercase L).
#PBS -l nodes=96:ppn=12
#PBS -N test
#PBS -j oe
#PBS -S /bin/bash

set -x
cd "$PBS_O_WORKDIR"

# Replace <executable> with the program you wish to run.
mpiexec --hostfile "$PBS_NODEFILE" <executable>
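
To submit and monitor the job (the script file name whole_cluster.sh is arbitrary):

qsub whole_cluster.sh    # prints the job id on success
showq                    # Maui view of running/idle jobs and free cores
qstat -u $USER           # Torque view of your own jobs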

Running an MPI Job using 12 Tasks Per Node

Because the nodes have 12 physical cores, you may want to limit jobs to 12 tasks per node. The node file lists each node once, so make a copy with each node listed 12 times and hand that version to MPI.

#!/bin/sh
#PBS -l nodes=48
#PBS -N test
#PBS -j oe
#PBS -S /bin/bash

set -x
cd "$PBS_O_WORKDIR"

# Construct a copy of the hostfile with only 12 entries per node.
# MPI can use this to run 12 tasks on each node.
uniq "$PBS_NODEFILE"|awk '{for(i=0;i<12;i+=1) print}'>nodefile.12way

# to Run 12-way on 4 nodes, we request 48 core to obtain 4 nodes
mpiexec --hostfile nodefile.12way ring -v
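
As an optional sanity check (run it inside the script before the mpiexec line, or afterwards on the generated file), each node name should appear exactly 12 times in the constructed host file:

# Each node should be listed exactly 12 times.
sort nodefile.12way | uniq -c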


Running Many Copies of a Serial Job

In order to run 30 separate instances of the same program, use the scheduler's task array feature, through the "-t" option. The "nodes" parameter here refers to a core.

#!/bin/sh
# Note: the resource option below is -l (lowercase L).
#PBS -l nodes=1
# Task array: run task IDs 1 through 30 (30 instances).
#PBS -t 1-30
#PBS -N test
#PBS -j oe
#PBS -S /bin/bash

set -x
cd "$PBS_O_WORKDIR"
echo Run my job.

When you start jobs this way, separate jobs will pile one-per-core onto nodes like a box of hamsters.
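
Each task in the array gets its own index in the PBS_ARRAYID environment variable, which is the usual way to point every instance at different input. A minimal sketch (the program and input file names are hypothetical):

#!/bin/sh
#PBS -l nodes=1
#PBS -t 1-30
#PBS -N array_example
#PBS -j oe
#PBS -S /bin/bash

set -x
cd "$PBS_O_WORKDIR"
# Each of the 30 tasks reads its own input file and writes its own output file.
./my_program input.$PBS_ARRAYID > output.$PBS_ARRAYID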

Running on a specific node

To run on a specific node, use the host= option:

#!/bin/sh
# Note: the resource option below is -l (lowercase L).
#PBS -l host=compute-3-16
#PBS -N test
#PBS -j oe
#PBS -S /bin/bash


set -x
cd "$PBS_O_WORKDIR"
echo Run my job.

Running in the viz queue

To run in the viz queue, use the -q option:

#!/bin/sh
# Note: the resource option below is -l (lowercase L).
#PBS -l nodes=1
#PBS -N test
#PBS -j oe
#PBS -S /bin/bash
#PBS -q viz


set -x
cd "$PBS_O_WORKDIR"
echo Run my job.
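
After submitting, you can confirm that the job landed on one of the viz nodes (compute-3-13 to compute-3-16); the script name below is arbitrary:

qsub viz_job.sh
qstat -n -u $USER    # shows the node(s) assigned to each of your jobs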

Running an interactive job

From the command line:

qsub -l nodes=1 -I
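
Once the scheduler allocates a node, your shell prompt moves to that compute node; type exit to end the session and release the node. A sketch of a session (the commands run inside it are illustrative):

$ qsub -l nodes=1 -I
# ... qsub waits until a node is free, then opens a shell on it ...
$ cd /scratch/$USER        # work in scratch, just as in batch jobs
$ ./my_program             # hypothetical executable, run interactively
$ exit                     # ends the interactive job and frees the node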