DSS Cluster
From Cornell CAC Documentation
This is a private cluster.
- Head node dss.cac.cornell.edu.
- Rocks with CentOS Linux
- 9 compute nodes, each with 4 processors, 8 GB memory, and 40 GB /tmp.
- Cluster Status: Ganglia.
- Submit help requests via the help link or by email to help@cac.cornell.edu
Scheduler/Queues:
- Maui/Torque scheduler; no project ID required.
Queues:
- default: contains compute-1-[1-9]
The scheduler on this system is similar to V4's scheduler, with a few differences:
- There is no need to specify an account number in batch scripts.
- The command to submit jobs is qsub, not nsub.
Caution: due to the age of the current hardware, the OS and scheduler cannot be updated.
Therefore, other V4 documentation may not apply to this cluster.
[Ordinarily, V4's scheduler information pages would provide useful command syntax, examples, etc.]
Quick Tutorial
The batch system on dss maps each requested node to a distinct physical node, provided ppn (i.e., processes per node) is greater than 1.
If ppn=1 or ppn is absent, scheduling is done by core, so each requested "node" is really a single core. Thus, nodes=8:ppn=1 actually puts 4 tasks on each of 2 nodes!
This can make it a bit tricky to get jobs to run on the intended cores. In general, you must use a hostfile.
If you don't specify a hostfile, all processes will run on "node 0" (that is, the first node assigned to your job).
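To see how the scheduler actually laid out a job, it can help to count the entries per host in the node file. The following is a minimal sketch rather than part of the dss setup: the function name slots_per_node is made up for illustration, and it assumes the node file has one hostname per line, as $PBS_NODEFILE does.

```shell
#!/bin/sh
# slots_per_node FILE: print "<hostname> <entry count>" for each distinct
# host listed in a Torque-style node file (one hostname per line).
# NOTE: slots_per_node is a hypothetical helper, not a Torque or Maui command.
slots_per_node() {
    sort "$1" | uniq -c | awk '{print $2, $1}'
}

# Inside a batch job, Torque sets PBS_NODEFILE; outside a job this is skipped.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    slots_per_node "$PBS_NODEFILE"
fi
```

For the nodes=8:ppn=1 request described above, this would be expected to report 2 hosts with 4 entries each.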
Running a Job on the Whole Cluster
First use showq to see how many cores are available. It may be less than 36 (9x4) if a node is down.
Because the nodes have 4 physical cores, it is often best to have your job run exactly 4 tasks per node.
The node file constructed by Maui/Torque lists each node 4 times, so you can just hand that file to MPI.
Your job doesn't actually have to use MPI at all; the script just uses mpiexec as a process launcher.
#!/bin/sh
#PBS -l nodes=9:ppn=4,walltime=10:00
#PBS -N test
#PBS -j oe
#PBS -q default
#PBS -S /bin/bash
set -x
cd "$PBS_O_WORKDIR"
# Run 4-way on 9 nodes
mpiexec --hostfile $PBS_NODEFILE hostname -v
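When mpiexec is used purely as a launcher for a non-MPI program, each copy usually needs some way to pick its own share of the work. The sketch below shows one possible approach under stated assumptions: that the MPI on dss is Open MPI (which exports OMPI_COMM_WORLD_RANK to each launched process; other launchers use different variable names), and that the wrapper is saved as task.sh. The script name, the helper input_for_rank, and the input file naming are all hypothetical.

```shell
#!/bin/sh
# task.sh (hypothetical wrapper): each copy started by mpiexec picks one input.
# ASSUMPTION: the launcher is Open MPI, which sets OMPI_COMM_WORLD_RANK for
# every process it starts. Other MPI launchers use different variable names.

# input_for_rank RANK: print the (hypothetical) input file name for a task.
input_for_rank() {
    printf 'input.%03d.dat\n' "$1"
}

rank=${OMPI_COMM_WORLD_RANK:-0}   # falls back to 0 when run outside mpiexec
infile=$(input_for_rank "$rank")
echo "task $rank on $(uname -n) would process $infile"
```

In the job script, such a wrapper would take the place of hostname -v, for example: mpiexec --hostfile $PBS_NODEFILE ./task.sh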
Running a Job Using 1 Task Per Node
Again, first use showq to see how many cores are available. It may be less than 36 (9x4) if a node is down.
#!/bin/sh
#PBS -l nodes=9:ppn=4,walltime=10:00
#PBS -N test
#PBS -j oe
#PBS -q default
#PBS -S /bin/bash
set -x
cd "$PBS_O_WORKDIR"
# Construct a copy of the hostfile with only 1 entry per node.
# MPI can use this to run 1 task on each node.
uniq "$PBS_NODEFILE"|awk '{for(i=0;i<1;i+=1) print}'>nodefile.1way
ntasks=$(wc -l < nodefile.1way)
mpiexec -np $ntasks --hostfile nodefile.1way hostname -v
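In this script, the awk for-loop can be modified to run 2 tasks per node, etc. For example, changing the loop bound from i<1 to i<2 writes each node to the hostfile twice, so mpiexec starts 2 tasks on each node.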
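The same idea can be written with the number of tasks per node as a parameter. This is a sketch only: make_hostfile and the file name nodefile.2way are illustrative names, and the awk filter is a swapped-in variant of the uniq pipeline that also tolerates a node file whose duplicate entries are not adjacent.

```shell
#!/bin/sh
# make_hostfile FILE N: print each distinct host in FILE exactly N times,
# keeping hosts in order of first appearance.
# NOTE: make_hostfile is a hypothetical helper, not part of Torque or MPI.
make_hostfile() {
    awk -v n="$2" '!seen[$0]++ { for (i = 0; i < n; i++) print }' "$1"
}

# Inside a batch job: build a 2-way hostfile and launch on it.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    make_hostfile "$PBS_NODEFILE" 2 > nodefile.2way
    ntasks=$(wc -l < nodefile.2way)
    mpiexec -np $ntasks --hostfile nodefile.2way hostname -v
fi
```

Requesting nodes=9:ppn=4 in the job header still reserves all 4 cores on each node, so the 2 tasks per node do not share their node's cores with another job's tasks.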