LIPID Cluster General Information

LIPID is a private cluster with restricted access to the gwf3_0001 group.

  • Head node: lipid.cac.cornell.edu
  • Rocks 6.1.1 with CentOS 6.5
  • 17 compute nodes, each with dual quad-core Intel Xeon E5620 CPUs @ 2.40GHz (hyperthreaded), 12GB memory, and 100GB of /tmp.
  • Cluster Status: Ganglia (http://lipid.cac.cornell.edu/ganglia/).
  • Submit HELP requests: http://www.cac.cornell.edu/help or send email to cac-help@cornell.edu.

Maui Scheduler and Job submission/monitoring commands

  • Scheduler: maui 3.2.5; Resource manager: torque 2.4.7

Jobs are scheduled by the Maui scheduler (http://www.adaptivecomputing.com/resources/docs/maui/mauiusers.php) with the Torque resource manager (http://www.adaptivecomputing.com/products/open-source/torque/). We suggest you use a job submission batch file utilizing PBS directives; see the 'Options' section of the qsub documentation (http://docs.adaptivecomputing.com/torque/4-1-3/help.htm#topics/commands/qsub.htm).

Common Maui Commands

If you have any experience with PBS/Torque or SGE, the Maui Commands (http://www.adaptivecomputing.com/resources/docs/maui/a.gcommandoverview.php) may be recognizable. The most used are:

qsub - Job submission (jobid will be displayed for the job submitted)

  • $ qsub jobscript.sh

showq - Display queue information.

  • $ showq (dump everything)
  • $ showq -r (show running jobs)
  • $ showq -u foo42 (shows foo42's jobs)

checkjob - Display job information. (You can only checkjob your own jobs.)

  • $ checkjob -A jobid (get dense key-value pair information on the job)
  • $ checkjob -v jobid (get verbose information on the job)

canceljob - Cancel Job. (You can only cancel your own jobs.)

  • $ canceljob jobid

Setting Up your Job Submission Batch File

Commands can be run on the command line with qsub; however, we suggest running your jobs from a batch script. PBS directives are qsub command-line options placed at the top of the batch script, each on its own line prefixed with '#PBS' (no spaces); see the 'Options' section of the qsub documentation linked above. A minimal example is shown below.
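
Here is a minimal sketch of such a batch file; the job name 'myjob' and the executable './my_program' are placeholders, and the directives mirror the examples in the Quick Tutorial below:

#!/bin/sh
#PBS -l nodes=1,walltime=01:00:00
#PBS -N myjob
#PBS -j oe
#PBS -q all
#PBS -S /bin/bash

# Torque starts batch jobs in your home directory;
# move to the directory qsub was invoked from.
cd "$PBS_O_WORKDIR"

# Replace with the program you actually want to run.
./my_program

Submit it with '$ qsub jobscript.sh'; the scheduler prints the jobid of the new job.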

Queues:

  • all
    • contains compute-1-[1-17]
    • wallclock limit: 72 hours (3 days)
  • long
    • contains compute-1-[1-7]
    • wallclock limit: 336 hours (14 days)
  • priority
    • contains compute-1-[1-17]
    • wallclock limit: 72 hours (3 days)
    • This queue is on an "honor" system; please use it only when your job is important enough that it should not wait behind jobs submitted to 'long' or 'all'. This queue will not stop running jobs; it only raises the priority of a job that is waiting to run. (See the submission example below.)
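
The queue is chosen with the -q option (or the '#PBS -q' directive in the batch script). For example, to submit to the 'long' queue with its maximum 14-day wallclock limit, where jobscript.sh is a placeholder for your batch file:

$ qsub -q long -l walltime=336:00:00 jobscript.sh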

Software Installed:

  • charmm v39b1 (/opt/charmm; /usr/local/bin/charmm)
  • gromacs v4.5.7
  • gromacs-custom (/opt/gromacs)
  • Mathematica 10.4
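
As a hypothetical example of running one of these packages, CHARMM conventionally reads its input script on stdin, so a run might look like the following (the input and output file names are placeholders):

$ /usr/local/bin/charmm < myrun.inp > myrun.out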

Quick Tutorial

The batch system on lipid treats each core of a node as a "virtual processor." That means the nodes keyword in batch scripts refers to the number of cores that are scheduled.
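Since each node has 8 physical cores hyperthreaded to 16 virtual processors, requesting one full node means nodes=16, and the whole cluster of 17 nodes is nodes=272 (17 × 16).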

Running an MPI Job on the Whole Cluster

First use showq to see how many cores are available. It may be fewer than 272 if a node is down.

#!/bin/sh
#PBS -l nodes=272,walltime=10:00
#PBS -N test
#PBS -j oe
#PBS -q all
#PBS -S /bin/bash
# nodes=272 requests the whole cluster: 17 nodes x 16 virtual cores.

set -x
cd "$PBS_O_WORKDIR"
# Count the cores the scheduler actually handed us.
nhosts=$(wc -l < "$PBS_NODEFILE")

# 'ring' is the test MPI program used here; substitute your own executable.
mpiexec -np $nhosts ring -v

Running an MPI Job using 8 Tasks Per Node

Because the nodes have 8 physical cores, you may want to limit jobs to 8 tasks per node. The node file lists each node 16 times, so make a copy with each node listed 8 times, and hand that version to MPI.

#!/bin/sh
#PBS -l nodes=272,walltime=10:00
#PBS -N test
#PBS -j oe
#PBS -q all
#PBS -S /bin/bash

set -x
cd "$PBS_O_WORKDIR"

# Construct a copy of the hostfile with only 8 entries per node.
# MPI can use this to run 8 tasks on each node.
uniq "$PBS_NODEFILE" | awk '{for (i = 0; i < 8; i++) print}' > nodefile.8way

# Run 8 tasks on each of the 17 nodes.
mpiexec --hostfile nodefile.8way ring -v

Running Many Copies of a Serial Job

To run 30 separate instances of the same program, use the scheduler's task array feature through the "-t" option. The "nodes" parameter here refers to a core, so nodes=1 gives each task a single core.

#!/bin/sh
#PBS -l nodes=1,walltime=10:00
#PBS -t 1-30
#PBS -N test
#PBS -j oe
#PBS -q all
#PBS -S /bin/bash

set -x
cd "$PBS_O_WORKDIR"
# Torque sets PBS_ARRAYID to this task's index (1-30).
echo Run my job.

When you start jobs this way, separate jobs will pile one-per-core onto nodes like a box of hamsters.
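
Each task can use the PBS_ARRAYID variable to select its own work. As a sketch, assuming input files named input.1 through input.30 and a placeholder executable ./my_program, the echo line above could be replaced with:

./my_program < "input.${PBS_ARRAYID}" > "output.${PBS_ARRAYID}"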