ATLAS2 Cluster

Revision as of 15:57, 30 January 2017

ATLAS2 General Information

  • ATLAS2 is a private cluster with restricted access to the bs54_0001 group. (It is currently being built and is not yet available to the group; ETA: 1/20/17.)
  • ATLAS2 currently has one head node (atlas2.cac.cornell.edu) and 6 compute nodes (c0012, c0015, c0016, c0034, c0044 and c0046); it will have the full 56 compute nodes upon completion.
  • Head node: atlas2.cac.cornell.edu (access via ssh; see the example below)
    • OpenHPC deployment running CentOS 7.3.1611
    • Cluster scheduler: Slurm 16.05
    • /home (15TB) directory server (NFS exported to all cluster nodes)
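
For example, to log in to the head node (replace your_username with your own cluster user name; this is a generic ssh sketch, not a site-specific procedure):

ssh your_username@atlas2.cac.cornell.edu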

Getting Started with the ATLAS2 cluster

Queues/Partitions ("Partition" is the term used by Slurm)

ATLAS2 has 5 separate queues (a sample job script for selecting one follows this list):

  • short (default)
Number of nodes: 28 servers (total: 56 cpu, 336 cores)
Node Names: c00[17-44] **CURRENTLY ONLY c0034 & c0044 are up for testing of the short queue! **
Memory per node: 48GB
/tmp per node: 409GB
Limits: Maximum of 28 nodes, walltime limit: 4 hours
  • long
Number of nodes: 18 servers (total: 36 cpu, 216 cores)
Node Names: c00[17-34] **CURRENTLY ONLY c0034 is up for testing of the long queue! **
Memory per node: 48GB
/tmp per node: 409GB
Limits: Maximum of 18 nodes, walltime limit: 504 hours
  • inter (interactive)
Number of nodes: 12 servers (total: 24 cpu, 144 cores)
Node Names: c00[45-56] **CURRENTLY ONLY c0046 is up for testing of the inter queue! **
Memory per node: 48GB
/tmp per node: 68GB
Limits: Maximum of 12 nodes, walltime limit: 168 hours
  • bigmem
Number of nodes: 12 servers (total: 24 cpu, 144 cores)
Node Names: c00[01-12] **CURRENTLY ONLY c0012 is up for testing of the bigmem queue! **
HW: Intel x5690 3.46GHz, 128GB SSD drive
Memory per node: 96GB
/tmp per node: 68GB
Limits: Maximum of 12 nodes, walltime limit: 168 hours
  • gpu
Number of nodes: 4 servers (total: 8 cpu, 48 cores)
Node Names: c00[13-16] **CURRENTLY ONLY c0015 & c0016 are up for testing of the gpu queue! **
HW: 2x Intel x5670 2.93GHz, 500GB SATA drive, 1 Tesla M2090
Memory per node: 48GB
/tmp per node: 409GB
Limits: Maximum of 4 nodes, walltime limit: 168 hours
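
As referenced above, here is a minimal sample job script for selecting a partition, assuming it is saved as test_job.sh (the job name, resources, and commands are placeholders; stay within the node and walltime limits of the partition you choose):

#!/bin/bash
#SBATCH -J test_job            # job name
#SBATCH -p short               # partition: short, long, inter, bigmem, or gpu
#SBATCH -N 1                   # number of nodes
#SBATCH -n 1                   # number of tasks
#SBATCH -t 01:00:00            # walltime for this job (within the partition maximum)
#SBATCH -o test_job.%j.out     # stdout file; %j expands to the job ID

echo "Running on $(hostname)"

Submit it with 'sbatch test_job.sh'. For an interactive shell on the inter partition, something like 'srun -p inter --pty /bin/bash' can be used.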


Common Slurm Commands

Slurm Quickstart Guide: https://slurm.schedmd.com/quickstart.html

Command/Option Summary (two page PDF): https://slurm.schedmd.com/pdfs/summary.pdf
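
For quick reference, a few frequently used commands (the script name and job ID below are placeholders; see the links above for the full option lists):

sbatch myjob.sh           # submit a batch script
squeue -u $USER           # list your pending and running jobs
scancel 12345             # cancel a job by job ID
sinfo                     # show partition and node status
scontrol show job 12345   # show detailed information for a job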

Software

Installed Software

The Lmod module system is implemented. Use 'module avail' to list the environments you can put yourself in for each software version.
    (To get a more complete listing, type: module spider)

EXAMPLE:
To be sure you are using the environment set up for gdal2, you would type:
* module avail
* module load gdal2
When done, either log out and log back in, or type:
* module unload gdal2

You can create your own modules and place them in your $HOME directory (see the sketch below).
Once created, type:
module use $HOME/path/to/personal/modulefiles
This will prepend the path to $MODULEPATH.
[Type 'echo $MODULEPATH' to confirm.]
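
As a minimal sketch of creating a personal modulefile, assuming a hypothetical tool installed under $HOME/software/mytool/1.0 (the tool name and all paths are made up; Lmod modulefiles are written in Lua):

mkdir -p $HOME/modulefiles/mytool
cat > $HOME/modulefiles/mytool/1.0.lua <<'EOF'
-- prepend the tool's bin directory to PATH when this module is loaded
prepend_path("PATH", pathJoin(os.getenv("HOME"), "software/mytool/1.0/bin"))
EOF
module use $HOME/modulefiles      # prepends this directory to $MODULEPATH
module avail                      # mytool/1.0 should now appear in the listing
module load mytool/1.0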

Package and Version      Location                                   Module available   Notes
cplex studio 12.8        /opt/ohpc/pub/ibm/ILOG/CPLEX_Studio128/    cplex/12.8
cuda toolkit 8.0         /usr/local/cuda-8.0                                           c0015 & c0016 (gpus)
gcc 7.2.0                /opt/ohpc/pub/compiler/gcc/7.2.0/bin/gcc   gnu7/7.2.0
gcc 4.8.5 (default)      /usr/bin/gcc
gdal 2.2.3               /opt/ohpc/pub/gdal2.2.3                    gdal/2.2.3
java openjdk 1.8.0       /usr/bin/java
Python 2.7.5 (default)   /usr/bin/python                                               The system-wide installation of packages is no longer supported. See below for Anaconda/miniconda install information.
R 3.4.3                  /usr/bin/R                                                    The system-wide installation of packages is no longer supported.
Subversion (svn) 1.7     /usr/bin/svn
  • It is usually possible to install software in your home directory (see the sketch after this list).
  • List installed software via rpms: 'rpm -qa'. Use grep to search for specific software: rpm -qa | grep sw_name [e.g. rpm -qa | grep perl]
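
As an illustration of a home-directory install, here is a generic sketch for a typical source package built with configure/make (the package name, URL, and version are hypothetical placeholders):

wget https://example.org/mytool-1.0.tar.gz        # download the source (placeholder URL)
tar xzf mytool-1.0.tar.gz && cd mytool-1.0
./configure --prefix=$HOME/local                  # install under your home directory
make && make install
export PATH=$HOME/local/bin:$PATH                 # or expose it through a personal modulefile (see above)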

How to run jobs

Running Jobs on the ATLAS2 cluster

MATLAB: Running MDCS Jobs on the ATLAS2 cluster

Using R and snowfall in batch