Difference between revisions of "ATLAS2 Cluster"

From CAC Documentation wiki

Revision as of 14:21, 13 March 2018

ATLAS2 General Information

  • ATLAS2 is a private cluster with restricted access to the bs54_0001 group and is currently available for *TESTING ONLY*
  • ATLAS2 currently has one head node (atlas2.cac.cornell.edu) and 9 compute nodes, c00[06,10-12,15-16,34,44,46]; it will have the full 56 compute nodes upon completion.
  • Head node: atlas2.cac.cornell.edu (access via ssh)
    • OpenHPC deployment running CentOS 7.3.1611
    • Cluster scheduler: Slurm 16.05
    • /home (15TB) directory server (NFS exported to all cluster nodes)

Getting Started with the ATLAS cluster

Queues/Partitions ("Partition" is the term used by Slurm)

ATLAS2 has 5 separate queues. [NOTE: AS STATED ABOVE, THIS CLUSTER IS CURRENTLY RUNNING WITH ONLY 9 COMPUTE NODES.]

WE NEED USERS TESTING!
  • short (default)
Number of nodes: 28 servers (total: 56 cpu, 336 cores)
Node Names: c00[17-44] **CURRENTLY ONLY c0034 & c0044 are up for testing of the short queue! **
Memory per node: 48GB
/tmp per node: 409GB
Limits: Maximum of 28 nodes, walltime limit: 4 hours
  • long
Number of nodes: 18 servers (total: 36 cpu, 216 cores)
Node Names: c00[17-34] **CURRENTLY ONLY c0034 is up for testing of the long queue! **
Memory per node: 48GB
/tmp per node: 409GB
Limits: Maximum of 18 nodes, walltime limit: 504 hours
  • inter (interactive)
Number of nodes: 12 servers (total: 24 cpu, 144 cores)
Node Names: c00[45-56] **CURRENTLY ONLY c0046 is up for testing of the inter queue! **
Memory per node: 48GB
/tmp per node: 68GB
Limits: Maximum of 12 nodes, walltime limit: 168 hours
  • bigmem
Number of nodes: 12 servers (total: 24 cpu, 144 cores)
Node Names: c00[01-12] **CURRENTLY ONLY c00[06,10,11,12] are up for testing of the bigmem queue! **
HW: Intel x5690 3.46GHz, 128GB SSD drive
Memory per node: 96GB
/tmp per node: 68GB
Limits: Maximum of 12 nodes, walltime limit: 168 hours
  • gpu
Number of nodes: 4 servers (total: 8 cpu, 48 cores)
Node Names: c00[13-16] **CURRENTLY ONLY c0015 & c0016 are up for testing of the gpu queue! **
HW: 2xIntel x5670 2.93GHz, 500GB SATA drive, 1 Tesla M2090
Memory per node: 48GB
/tmp per node: 409GB
Limits: Maximum of 4 nodes, walltime limit: 168 hours
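
As a quick way to apply the walltime limits listed above, the following shell sketch looks up the limit for a partition and checks a requested walltime against it. The function names max_walltime_hours and fits_partition are hypothetical helpers, not part of Slurm; the limits are copied from the list above and would need updating if the partitions change.

```shell
#!/bin/bash
# Hypothetical helpers (not part of Slurm); limits copied from the partition list above.

# Print the walltime limit, in hours, for a given partition name.
max_walltime_hours() {
    case "$1" in
        short)            echo 4 ;;
        long)             echo 504 ;;
        inter|bigmem|gpu) echo 168 ;;
        *)                echo "unknown partition: $1" >&2; return 1 ;;
    esac
}

# Succeed if a job needing $2 hours fits within the limit of partition $1.
fits_partition() {
    local limit
    limit=$(max_walltime_hours "$1") || return 1
    [ "$2" -le "$limit" ]
}

# Example: a 6-hour job does not fit in "short" but fits in "long".
if fits_partition short 6; then echo "short: ok"; else echo "short: too long"; fi
if fits_partition long 6; then echo "long: ok"; else echo "long: too long"; fi
```

A job that exceeds the limit of the default short partition would need -p set to one of the longer partitions in its sbatch file.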


Common Slurm Commands

Slurm Quickstart Guide

Command/Option Summary (two page PDF)

Software

  • OpenHPC uses yum to install available software from the installed repositories.
    • To view all options of yum, type: man yum
    • To view installed repositories, type: yum repolist
    • To check whether your requested software is in one of the installed repositories, use: yum search
    • For example, to search whether variations of tau are available, you would type:
 yum search tau
  • If the package(s) are listed, you can send a request to cac-help@cornell.edu to have the software installed system-wide; you cannot install this yourself.

Installed Software

The Lmod module system is implemented. Use: module avail
to list the software environments (and versions) that you can load.
    (To get a more complete listing, type: module spider)

EXAMPLE:
To be sure you are using the environment set up for gdal2, you would type:
* module avail
* module load gdal2
- When done, either log out and log back in, or type:
* module unload gdal2

You can create your own modules and place them in your $HOME.  
Once created, type:
module use $HOME/path/to/personal/modulefiles
This will prepend the path to $MODULEPATH
[type echo $MODULEPATH to confirm]
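
A personal modulefile could look roughly like the sketch below. It assumes software installed under $HOME/appdir; the module name myapp/1.0, the directory $HOME/modulefiles, and the MODROOT variable are hypothetical choices for illustration. Lmod modulefiles are written in Lua, so the script writes a small .lua file and then registers the directory with module use.

```shell
#!/bin/bash
# Sketch: create a personal Lmod modulefile for software installed under
# $HOME/appdir. The module name "myapp/1.0" and both paths are hypothetical.
MODROOT="${MODROOT:-$HOME/modulefiles}"
mkdir -p "$MODROOT/myapp"

# This modulefile only prepends the application's bin directory to PATH.
cat > "$MODROOT/myapp/1.0.lua" <<'EOF'
help([[Personal build of myapp 1.0 (hypothetical example)]])
whatis("Name: myapp")
whatis("Version: 1.0")
prepend_path("PATH", pathJoin(os.getenv("HOME"), "appdir/bin"))
EOF

# Register the personal tree and load the module, if Lmod is available
# in this shell (it will not be outside the cluster).
if type module >/dev/null 2>&1; then
    module use "$MODROOT"
    module load myapp/1.0
    echo "$MODULEPATH"
else
    echo "module command not found; wrote $MODROOT/myapp/1.0.lua"
fi
```

To make the personal tree available in every session, the module use line could be added to $HOME/.bashrc.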

Package and Version     Location                                  Module available  Notes
cplex studio 128        /opt/ohpc/pub/ibm/ILOG/CPLEX_Studio128/   cplex/12.8
cuda toolkit 8-0        /usr/local/cuda-8.0                                         c0015 & c0016 (gpus)
gcc 7.2.0               /opt/ohpc/pub/compiler/gcc/7.2.0/bin/gcc  gnu7/7.2.0
gcc 4.8.5 (default)     /usr/bin/gcc
gdal 2.2.3              /opt/ohpc/pub/gdal2.2.3                   gdal/2.2.3
java openjdk 1.8.0      /usr/bin/java
Python 2.7.5 (default)  /usr/bin/python                                             The system-wide installation of packages is no longer supported. See below for Anaconda/miniconda install information.
R 3.4.3                 /usr/bin/R                                                  The system-wide installation of packages is no longer supported.
Subversion (svn) 1.7    /usr/bin/svn
  • It is usually possible to install software in your home directory.
  • List installed software via rpms: 'rpm -qa'. Use grep to search for specific software: rpm -qa | grep sw_name (e.g. rpm -qa | grep perl)

Build software from source into your home directory ($HOME)

* download and extract your source
* cd to your extracted source directory
* ./configure --prefix=$HOME/appdir
[Refer to your source documentation for the full list of options you can pass to 'configure'.]
* make
* make install

The binary would then be located in ~/appdir/bin. 
* Add the following to your $HOME/.bashrc: 
      export PATH="$HOME/appdir/bin:$PATH"
* Reload the .bashrc file with: source ~/.bashrc (or log out and log back in)
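
If the build steps are repeated, the export line can end up in .bashrc several times. One possible way to avoid that is sketched below: a small helper, hypothetically named add_line_once, appends a line only when the file does not already contain it. The sketch works on a scratch file by default; RC_FILE is an assumed variable that you would point at "$HOME/.bashrc" to apply the change for real.

```shell
#!/bin/bash
# Sketch: append a line to a startup file only if it is not already there.
# add_line_once is a hypothetical helper name.
add_line_once() {
    local line="$1" file="$2"
    touch "$file"
    # -F: fixed string, -x: match the whole line, -q: quiet
    grep -Fxq -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Single quotes keep $HOME and $PATH unexpanded, so they are evaluated at login.
PATH_LINE='export PATH="$HOME/appdir/bin:$PATH"'

# Demonstrate on a scratch file; set RC_FILE="$HOME/.bashrc" to apply it for real.
RC_FILE="${RC_FILE:-$(mktemp)}"
add_line_once "$PATH_LINE" "$RC_FILE"
add_line_once "$PATH_LINE" "$RC_FILE"   # second call is a no-op
grep -c 'appdir/bin' "$RC_FILE"         # counts matching lines in the file
```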


Slurm HELP

Slurm Workload Manager Quick Start User Guide - this page lists all of the available Slurm commands

Slurm Workload Manager Frequently Asked Questions includes FAQs for Management, Users and Administrators

Convenient SLURM Commands has examples for getting information on jobs and controlling jobs

Slurm Workload Manager - sbatch - used to submit a job script for later execution. The script will typically contain one or more srun commands to launch parallel tasks.

A few slurm commands to initially get familiar with:
scontrol show nodes
scontrol show partition

Submit a job: sbatch testjob.sh
Interactive Job: srun -p short --pty /bin/bash

scontrol show job [job id]
scancel [job id]
sinfo -l

Running jobs on the ATLAS2 cluster

Running Jobs on the ATLAS2 cluster


Example sbatch file to run a job in the short partition/queue; save as example.sh:

#!/bin/bash
## -J sets the name of the job
#SBATCH -J TestJob
## -p sets the partition (queue)
#SBATCH -p short
## walltime limit: 10 min
#SBATCH --time=00:10:00
## request a single task (core)
#SBATCH -n1
## request 300MB per core (disabled; change "## #SBATCH" to "#SBATCH" to enable)
## #SBATCH --mem-per-cpu=300
## define the job's stdout file (%j expands to the job ID)
#SBATCH -o testshort-%j.out
## define the job's stderr file
#SBATCH -e testshort-%j.err

echo "starting at $(date) on $(hostname)"

# Print the Slurm job ID.
echo "SLURM_JOBID=$SLURM_JOBID"

echo "hello world $(hostname)"

echo "ended at $(date) on $(hostname)"
exit 0

Then submit the job:

sbatch example.sh

Then check the job, replacing 9 with the job ID that sbatch reported:

scontrol show job 9

You should see the node it ran on and that it was run in the short partition/queue.
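
Rather than typing the job ID by hand, it can be taken from the "Submitted batch job <id>" message that sbatch prints. The sketch below shows one way to do this; job_id_from_sbatch is a hypothetical helper name, and the fallback message is only there so the sketch also runs on a machine without Slurm.

```shell
#!/bin/bash
# Sketch: extract the job ID from sbatch's "Submitted batch job <id>" message.
# job_id_from_sbatch is a hypothetical helper name.
job_id_from_sbatch() {
    # Keep only the last whitespace-separated field of the message.
    printf '%s\n' "$1" | awk '{print $NF}'
}

if command -v sbatch >/dev/null 2>&1; then
    msg=$(sbatch example.sh)
else
    # Off-cluster fallback so the sketch still runs: a sample message.
    msg="Submitted batch job 9"
fi

jobid=$(job_id_from_sbatch "$msg")
echo "job id: $jobid"

# On the cluster, inspect the job with the ID that was actually assigned.
if command -v scontrol >/dev/null 2>&1; then
    scontrol show job "$jobid"
fi
```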

MATLAB: running MDCS jobs on the ATLAS2 cluster

Running MDCS Jobs on the ATLAS2 cluster

How to Install R packages in your home directory

Reference: http://cran.r-project.org/doc/manuals/R-admin.html#Managing-libraries

************************************************************************************
NOTE: Steps 1) through 4) need to be done only once. After your Rlibs directory
has been created and your R_LIBS environment variable is set, you can install
additional packages using step 5).
************************************************************************************

Know your R library search path:
    Start R and run .libPaths()  Sample output is shown below:
    > .libPaths()
     [1] "/usr/lib64/R/library"

Now we will create a local Rlibs directory and add this to the library search path.
NOTE: Make sure R is NOT running before you proceed.

1) Create a directory in your home directory where you would like to install the R packages, e.g. Rlibs 
mkdir  ~/Rlibs

2) Create a .profile file in your home directory (or modify the existing one) using your favorite editor (emacs, vim, nano, etc.)
   
     Add the following to your .profile:
     #!/bin/sh
     if [ -n "$R_LIBS" ]; then
        export R_LIBS="$HOME/Rlibs:$R_LIBS"
     else
        export R_LIBS="$HOME/Rlibs"
     fi

3) To apply the new R_LIBS path, run: "source ~/.profile" (or log out and log back in) 

4) Confirm the change is in your library path:
     start R
> .libPaths()
[1] "$HOME/Rlibs"     
[2] "/usr/lib64/R/library"   

  
5) Install the package in your local directory 
>install.packages("packagename","~/Rlibs","https://cran.r-project.org/")
e.g. to install the package snow:
>install.packages("snow","~/Rlibs","https://cran.r-project.org/")

6) For more help with install.packages() use
>?install.packages

7) To see which libraries are available in your R library path, run library() 
The output will show your local packages and the system wide packages
>library()
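
Steps 1) to 3) above can be sketched as one small script. The helper name r_libs_with is hypothetical; it quotes its arguments so that an unset or empty R_LIBS is handled correctly and no trailing ":" is left in the value. The Rscript line at the end is left commented out and is only an assumption about how step 5) might be run without starting an interactive R session.

```shell
#!/bin/sh
# Sketch of steps 1) to 3): create the library directory and put it first in R_LIBS.
# r_libs_with is a hypothetical helper name.

# Print a new R_LIBS value with directory $1 in front of the current value $2.
r_libs_with() {
    if [ -n "$2" ]; then
        printf '%s:%s\n' "$1" "$2"
    else
        # Nothing set yet: avoid leaving a trailing ":" in the value.
        printf '%s\n' "$1"
    fi
}

mkdir -p "$HOME/Rlibs"
R_LIBS=$(r_libs_with "$HOME/Rlibs" "$R_LIBS")
export R_LIBS
echo "R_LIBS=$R_LIBS"

# Step 5) without an interactive session (uncomment to run; needs network access):
# Rscript -e 'install.packages("snow", lib = "~/Rlibs", repos = "https://cran.r-project.org/")'
```

Running this script only changes R_LIBS for the script itself; the .profile entry from step 2) is still what makes the setting persistent across logins.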

How to Install Python Anaconda (miniconda) in your home directory

https://conda.io/docs/user-guide/tutorials/index.html