ATLAS2 Cluster


Getting Started

General Information

  • ATLAS2 is a private cluster with restricted access to the bs54_0001 group.
  • Head node: atlas2.cac.cornell.edu (access via ssh)
    • OpenHPC deployment running CentOS 7.6
    • Cluster scheduler: Slurm 17.11.10
    • /home 15TB directory server (nfs exported to all cluster nodes)
    • Intel(R) Xeon(R) E5-2637 v4 @ 3.5GHz; supports vector extensions up to AVX2
  • 55 compute nodes c00[01-16, 31-48,50-70]
  • Current Cluster Status: Ganglia.
  • Please send any questions and report problems to: cac-help@cornell.edu

How To Login

  • To get started, log in to the head node atlas2.cac.cornell.edu via ssh (see the example below).
  • If you are unfamiliar with Linux and ssh, we suggest reading the Linux Tutorial and looking into how to Connect to Linux before proceeding.
  • You will be prompted for your CAC account password.
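
For example (myCACid below is a placeholder; substitute your own CAC user name):

ssh myCACid@atlas2.cac.cornell.edu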

Hardware

All nodes have hyperthreading turned on and are Xeon generations that support the SSE4.2 vector extensions.

Node Names       | Memory per node | Model name                           | Processor count per node | Core(s) per socket | Sockets | Thread(s) per core
c00[01-12]       | 94GB            | Intel(R) Xeon(R) CPU X5690 @ 3.47GHz | 24                       | 6                  | 2       | 2
c00[13-16]       | 47GB            | Intel(R) Xeon(R) CPU X5670 @ 2.93GHz | 24                       | 6                  | 2       | 2
c00[31-48,50-58] | 47GB            | Intel(R) Xeon(R) CPU X5670 @ 2.93GHz | 24                       | 6                  | 2       | 2
c00[59-70]       | 47GB            | Intel(R) Xeon(R) CPU X5690 @ 3.47GHz | 24                       | 6                  | 2       | 2
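
If you want to verify the processor model, core/thread layout, and vector-instruction flags on a node you are logged into, the standard Linux tools are sufficient (read-only commands, safe to run anywhere):

# Show the CPU model and core/thread layout of the current node
lscpu | grep -E 'Model name|Socket|Core|Thread'

# List which of the vector-instruction flags sse4_2/avx/avx2 the CPU reports
grep -o -w -E 'sse4_2|avx|avx2' /proc/cpuinfo | sort -u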

Networking

  • All nodes have a 1Gb Ethernet connection on eth0, on a private network served out from the atlas2 head node.
  • All nodes have an Infiniband connection:
    • InfiniPath_QLE7340n (QDR speed, 8Gbits/sec)
    • PLEASE NOTE: One of the 5 Infiniband switches has failed. While it is being determined whether it will be replaced, the following nodes do not have an "Active" state for Infiniband:

      c00[31-32,43-44,47,50-53,55-58]
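
If you need to check the Infiniband state of a particular node yourself, the ibstat utility (part of the standard InfiniBand diagnostic tools; this assumes it is installed on the compute nodes) reports the port state and link rate:

# "State: Active" indicates a healthy Infiniband link on this node
ibstat | grep -E 'State|Rate'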

Running Jobs with Slurm

For detailed information and a quick-start guide, see the Slurm page.

ATLAS2 Queues/Partitions

("Partition" is the term used by Slurm)

  • hyperthreading is turned on for ALL nodes
  • all partitions have a default time of 1 hour
  • ATLAS2 has 5 separate queues (a quick way to inspect them from the command line is shown after the table):
Queue/Partition     | Number of nodes | Node Names             | Limits
short (default)     | 31              | c00[13-16,31-48,50-58] | walltime limit: 4 hours
long                | 22              | c00[13-16,31-48]       | walltime limit: 504 hours
inter (interactive) | 12              | c00[59-70]             | walltime limit: 168 hours
bigmem              | 12              | c00[01-12]             | maximum of 12 nodes, walltime limit: 168 hours
normal              | 55              | c00[01-16,31-48,50-70] | walltime limit: 4 hours
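
For example, sinfo lists every partition with its time limit and the current state of its nodes, and srun can start an interactive shell on the inter partition (the 1-hour limit below is just an example):

# List all partitions, their time limits, and the state of their nodes
sinfo

# Request an interactive shell on the "inter" partition for 1 hour
srun -p inter --time=1:00:00 --pty /bin/bash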

Example in Short Partition/Queue

Example sbatch file to run a job in the short partition/queue; save as example.sh:

#!/bin/bash
## J sets the name of job
#SBATCH -J TestJob

## -p sets the partition (queue)
#SBATCH -p short

## 10 min
#SBATCH --time=00:10:00

## set tasks per core: with hyperthreading on, the default allows 2 tasks per core;
## --ntasks-per-core=1 gives each task a whole physical core
#SBATCH --ntasks-per-core=1

## request 4GB per core
#SBATCH --mem-per-cpu=4GB

## define job's stdout file
#SBATCH -o testshort-%j.out

## define job's stderr file
#SBATCH -e testshort-%j.err

echo "starting at `date` on `hostname`"

# Print the SLURM job ID.
echo "SLURM_JOBID=$SLURM_JOBID"

echo "hello world `hostname`"

echo "ended at `date` on `hostname`"
exit 0

Submit/Run your job:

sbatch example.sh

View your job (replace 9 with the job ID that sbatch reports):

scontrol show job 9
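
You can also list all of your own pending and running jobs, or cancel one, with the standard Slurm commands:

# Show all of your queued and running jobs
squeue -u $USER

# Cancel a job by its job ID (9 is just the example ID from above)
scancel 9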

Software

The cluster is managed with OpenHPC, which uses yum to install available software from the installed repositories.

  • To view all options of yum, type: man yum
  • To view installed repositories, type: yum repolist
  • To view if your requested software package is in one of the installed repositories, use: yum search <package>
  • e.g., to search whether variations of tau are available, you would type:
 yum search tau

Installed Software

Package and Version    | Location                                 | module available | Notes
cplex studio 128       | /opt/ohpc/pub/ibm/ILOG/CPLEX_Studio128/  | cplex/12.8       |
cuda toolkit 9.0       | /opt/ohpc/pub/cuda-9.0                   |                  | cudnn 9.0 in targets/x86_64-linux/lib/
cuda toolkit 9.1       | /opt/ohpc/pub/cuda-9.1                   |                  | cudnn 9.1 in targets/x86_64-linux/lib/
cuda toolkit 9.2       | /opt/ohpc/pub/cuda-9.2                   |                  | cudnn 9.2 in targets/x86_64-linux/lib/
cuda toolkit 10.0      | /opt/ohpc/pub/cuda-10.0                  |                  | cudnn 7.4.1 for cuda10 in targets/x86_64-linux/lib/
gcc 7.2.0              | /opt/ohpc/pub/compiler/gcc/7.2.0/bin/gcc | gnu7/7.2.0       |
gcc 4.8.5 (default)    | /usr/bin/gcc                             |                  |
gdal 2.2.3             | /opt/ohpc/pub/gdal2.2.3                  | gdal/2.2.3       |
java openjdk 1.8.0     | /usr/bin/java                            |                  |
Python 2.7.5 (default) | /usr/bin/python                          |                  | System-wide installation of packages is no longer supported; see the Anaconda/miniconda section below.
R 3.5.1                | /usr/bin/R                               |                  | System-wide installation of packages is no longer supported.
Subversion (svn) 1.7   | /usr/bin/svn                             |                  |
  • It is usually possible to install software in your home directory.
  • List installed software via rpms: rpm -qa. Use grep to search for specific software: rpm -qa | grep sw_name (e.g. rpm -qa | grep perl).

Modules

Since this cluster is managed with OpenHPC, the Lmod Module System is implemented. You can see detailed information and instructions at the linked page.

Example: To be sure you are using the environment setup for cplex, you would type:

$ module avail
$ module load cplex

When done, either log out and log back in, or type module unload cplex

You can also create your own modules and place them in your $HOME. For instructions, see the Modules (Lmod) page.

Once created, type module use $HOME/path/to/personal/modulefiles. This will prepend the path to $MODULEPATH. Type echo $MODULEPATH to confirm.
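
For illustration only, a minimal personal Lua modulefile for a locally built application might look like the following; the name myapp, version 1.0, and the $HOME/appdir install path are placeholders to adapt to your own layout:

# Create a directory for personal modulefiles and a minimal Lua modulefile in it
mkdir -p $HOME/modulefiles/myapp
cat > $HOME/modulefiles/myapp/1.0.lua <<'EOF'
-- Lmod modulefile for a hypothetical application installed in $HOME/appdir
help([[My locally built application]])
whatis("Name: myapp 1.0")
prepend_path("PATH", pathJoin(os.getenv("HOME"), "appdir/bin"))
EOF

# Make the personal modulefiles visible to Lmod, then load the new module
module use $HOME/modulefiles
module avail myapp
module load myapp/1.0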

Build software from source into your home directory ($HOME)

* download and extract your source
* cd to your extracted source directory
* ./configure --prefix=$HOME/appdir
[Refer to your source's documentation (or ./configure --help) for the full list of options you can pass to configure.]
* make
* make install

The binaries would then be located in ~/appdir/bin; a consolidated example is shown after this list.
* Add the following to your $HOME/.bashrc: 
      export PATH="$HOME/appdir/bin:$PATH"
* Reload the .bashrc file with source ~/.bashrc. (or logout and log back in)
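
Putting the steps together, a complete build might look like the sketch below; myapp-1.0.tar.gz and the appdir prefix are placeholders for your own source archive and install location:

# Extract the source archive and move into it
tar -xzf myapp-1.0.tar.gz
cd myapp-1.0

# Configure the build to install under your home directory, then build and install
./configure --prefix=$HOME/appdir
make
make install

# Put the newly installed binaries on your PATH for future shells
echo 'export PATH="$HOME/appdir/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc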

How to Install R packages in your home directory

Reference: http://cran.r-project.org/doc/manuals/R-admin.html#Managing-libraries

************************************************************************************
NOTE: Steps 1) through 4) only need to be done once. After your Rlibs directory
has been created and your R_LIBS environment variable is set, you can install
additional packages using step 5).
************************************************************************************

Know your R library search path:
    Start R and run .libPaths()  Sample output is shown below:
    > .libPaths()
     [1] "/usr/lib64/R/library"

Now we will create a local Rlibs directory and add this to the library search path.
NOTE: Make sure R is NOT running before you proceed.

1) Create a directory in your home directory you would like to install the R packages, e.g. Rlibs 
mkdir  ~/Rlibs

2) Create a .profile file in your home directory (or modify existing) using your favorite editor (emacs, vim, nano, etc)  
   
     Add the following to your .profile
     #!/bin/sh
     if [ -n "$R_LIBS" ]; then
        export R_LIBS=~/Rlibs:$R_LIBS
     else
        export R_LIBS=~/Rlibs
     fi

3) To reset the R_LIBS path we need to run the following: "source ~/.profile" (or logout and log back in) 

4) Confirm the change is in your library path:
     start R
> .libPaths()
[1] "$HOME/Rlibs"     
[2] "/usr/lib64/R/library"   

  
5) Install the package in your local directory 
>install.packages("packagename","~/Rlibs","https://cran.r-project.org/")
i.e. to install the package:snow
>install.packages("snow","~/Rlibs","https://cran.r-project.org/")

6) For more help with install.packages() use
>?install.packages( )  

7) To see which libraries are available in your R library path, run library() 
The output will show your local packages and the system wide packages
>library()

How to Install Python Anaconda (miniconda) in your home directory

Reference: https://conda.io/docs/user-guide/tutorials/index.html
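
A typical installation into your home directory looks like the sketch below; the installer URL and file name were current at the time of writing, so confirm them against the conda documentation linked above before running anything:

# Download the Linux x86_64 Miniconda3 installer into your home directory
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

# Run the installer; accept the license and the default install location under $HOME
bash Miniconda3-latest-Linux-x86_64.sh

# Open a new shell (or source ~/.bashrc), then create and activate an environment
conda create -n myenv python=3.7
conda activate myenv    # on older conda versions: source activate myenv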