== Getting Started ==
 
 
=== General Information ===
 
 
 
:* ATLAS2 is a private cluster with restricted access to the bs54_0001 group.
 
:* Head node:  '''atlas2.cac.cornell.edu'''  ([[#How To Login|access via ssh]])
 
:** [https://github.com/openhpc/ohpc/wiki OpenHPC] deployment running CentOS 7.6
 
:** Cluster scheduler: [[Slurm]] 17.11.10
 
:** <code>/home</code> 15TB directory server (NFS-exported to all cluster nodes)
 
:** Intel(R) Xeon(R) E5-2637 v4 @ 3.5GHz; supports vector extensions up to AVX2
 
:* 55 compute nodes [[#Hardware|c00[01-16,31-48,50-70]]]
 
:* Current Cluster Status: [http://atlas2.cac.cornell.edu/ganglia/ Ganglia].
 
:* Please send any questions and report problems to:  [mailto:cac-help@cornell.edu cac-help@cornell.edu]
 
 
 
=== How To Login ===
 
 
 
:* To get started, login to the head node <tt>atlas2.cac.cornell.edu</tt> via ssh.
 
:* If you are unfamiliar with Linux and ssh, we suggest reading the [[Linux Tutorial]] and looking into how to [[Connect to Linux]] before proceeding.
 
:* You will be prompted for your [https://www.cac.cornell.edu/services/myacct.aspx CAC account] password.
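For example, from a Linux or macOS terminal you might connect like this (replace <code>your_cac_id</code>, a placeholder, with your own CAC account name):
<pre>
ssh your_cac_id@atlas2.cac.cornell.edu
</pre>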
 
 
 
=== Hardware ===
 
 
 
'''All nodes''' have hyperthreading turned on and are Xeon generations that support vector extensions up to SSE4.2.
 
 
 
:{| class="wikitable" border="1" cellpadding="5" style="width: auto"
 
! style="background:#e9e9e9;" | Node Names
 
! style="background:#e9e9e9;" | Memory per node
 
! style="background:#e9e9e9;" | Model name
 
! style="background:#e9e9e9;" | Processor count per node
 
! style="background:#e9e9e9;" | Core(s) per socket
 
! style="background:#e9e9e9;" | Sockets
 
! style="background:#e9e9e9;" | Thread(s) per core
 
|-
 
| c00[01-12]
 
| align="center" | 94GB
 
| align="center" | Intel(R) Xeon(R) CPU X5690  @ 3.47GHz
 
| align="center" | 24
 
| align="center" | 6
 
| align="center" | 2
 
| align="center" | 2
 
|-
 
| c00[13-16]
 
| align="center" | 47GB
 
| align="center" | Intel(R) Xeon(R) CPU X5670  @ 2.93GHz
 
| align="center" | 24
 
| align="center" | 6
 
| align="center" | 2
 
| align="center" | 2
 
|-
 
| c00[31-48,50-58]
 
| align="center" | 47GB
 
| align="center" | Intel(R) Xeon(R) CPU X5670  @ 2.93GHz
 
| align="center" | 24
 
| align="center" | 6
 
| align="center" | 2
 
| align="center" | 2
 
|-
 
| c00[59-70]
 
| align="center" | 47GB
 
| align="center" | Intel(R) Xeon(R) CPU X5690 @ 3.47GHz
 
| align="center" | 24
 
| align="center" | 6
 
| align="center" | 2
 
| align="center" | 2
 
|-
 
|}
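If you want to check which vector extensions the node you are logged into actually reports, one way (a sketch using standard Linux tools) is to inspect <code>/proc/cpuinfo</code>:
<pre>
# print the supported SSE4.2/AVX/AVX2 flags, if any, from the first CPU entry
grep -m 1 flags /proc/cpuinfo | tr ' ' '\n' | grep -w -e sse4_2 -e avx -e avx2
</pre>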
 
 
 
=== Networking ===
 
:* All nodes have a 1 Gb Ethernet connection on eth0, on a private network served from the atlas2 head node.
 
:* All nodes have an Infiniband connection:
 
:** InfiniPath_QLE7340n  (QDR speed, 8Gbits/sec)
 
:**''' PLEASE NOTE:''' One of the 5 Infiniband switches has failed. While we determine whether it will be replaced, the following nodes do not have an "Active" Infiniband state: <p><code>c00[31-32,43-44,47,50-53,55-58]</code></p>
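If you need to check the Infiniband port state on a node yourself, a quick sketch (assuming the standard <code>infiniband-diags</code> tools are available on that node) is:
<pre>
# an "Active" state and "LinkUp" physical state indicate a working Infiniband link
ibstat | grep -i state
</pre>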
 
 
 
== Running Jobs with Slurm ==
 
 
 
'''For detailed information and a quick-start guide, see the [[Slurm]] page.'''
 
 
 
=== ATLAS2 Queues/Partitions  ===
 
 
 
("Partition" is the term used by Slurm)
 
 
 
:* '''hyperthreading is turned on for ALL nodes'''
 
:* '''all partitions have a default time of 1 hour'''
 
:* ATLAS2 has 5 separate queues: 
 
 
 
:{| class="wikitable" border="1" cellpadding="4" style="width: auto"
 
! style="background:#e9e9e9;" | Queue/Partition
 
! style="background:#e9e9e9;" | Number of nodes
 
! style="background:#e9e9e9;" | Node Names
 
! style="background:#e9e9e9;" | Limits
 
|-
 
| '''short''' (default)
 
| align="center" | 31
 
| align="center" | c00[13-16,31-48,50-58]
 
| align="center" | walltime limit: 4 hours
 
|-
 
| '''long'''
 
| align="center" | 22
 
| align="center" | c00[13-16,31-48]
 
| align="center" | walltime limit: 504 hours
 
|-
 
| '''inter''' (interactive)
 
| align="center" | 12
 
| align="center" | c00[59-70]
 
| align="center" | walltime limit: 168 hours
 
|-
 
| '''bigmem'''
 
| align="center" | 12 servers
 
| align="center" | c00[01-12]   
 
| align="center" | Maximum of 12 nodes, walltime limit: 168 hours 
 
|-
 
| '''normal'''
 
| align="center" | 55 servers
 
| align="center" | c00[01-16, 31-48,50-70] 
 
| align="center" | walltime limit: 4 hours
 
|-
 
|}
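To see the partitions, their time limits, and current node states as Slurm itself reports them, you can run <code>sinfo</code> on the head node:
<pre>
sinfo
# or a condensed one-line-per-partition summary
sinfo -s
</pre>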
 
 
 
=== Example in Short Partition/Queue ===
 
 
 
Example sbatch file to run a job in the short partition/queue; save as example.sh:
 
 
 
<pre>
 
#!/bin/bash
 
## -J sets the name of the job
 
#SBATCH -J TestJob
 
 
 
## -p sets the partition (queue)
 
#SBATCH -p short
 
 
 
## 10 min
 
#SBATCH --time=00:10:00
 
 
 
## --ntasks-per-core=1 gives each task a whole physical core
 
## omit this (or set it to 2) to allow two tasks per core and take advantage of hyperthreading
 
#SBATCH --ntasks-per-core=1
 
 
 
## request 4GB of memory per core
 
#SBATCH --mem-per-cpu=4GB
 
 
 
## define the job's stdout file
 
#SBATCH -o testshort-%j.out
 
 
 
## define the job's stderr file
 
#SBATCH -e testshort-%j.err
 
 
 
echo "starting at `date` on `hostname`"
 
 
 
# Print the SLURM job ID.
 
echo "SLURM_JOBID=$SLURM_JOBID"
 
 
 
echo "hello world `hostname`"
 
 
 
echo "ended at `date` on `hostname`"
 
exit 0
 
 
 
</pre>
 
 
 
Submit/Run your job:
 
<pre>
 
sbatch example.sh
 
</pre>
 
 
 
View your job:
 
<pre>
 
scontrol show job 9
 
</pre>
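To list all of your own pending and running jobs, you can use <code>squeue</code>:
<pre>
squeue -u $USER
</pre>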
 
 
 
==Software==
 
 
 
The cluster is managed with [https://github.com/openhpc/ohpc/wiki OpenHPC], which uses [https://en.wikipedia.org/wiki/Yum_(software) yum] to install available software from the installed repositories.
 
:* To view all options of yum, type: <code>man yum</code>
 
:* To view installed repositories, type: <code>yum repolist</code>
 
:* To check whether your requested software package is in one of the installed repositories, use: <code>yum search &lt;package&gt;</code>
 
:* e.g., to search for available variants of tau, you would type:
 
<pre>
 
yum search tau
 
</pre>
 
 
 
=== Installed Software ===
 
 
 
:{| class="sortable wikitable" border="1" cellpadding="4" style="width: auto"
 
|+ (sortable table)<br/>
 
! style="background:#e9e9e9;" | Package and Version
 
! style="background:#e9e9e9;" | Location
 
! style="background:#e9e9e9;" | module available
 
! style="background:#e9e9e9;" | Notes
 
|-
 
| cplex studio 128
 
| /opt/ohpc/pub/ibm/ILOG/CPLEX_Studio128/
 
| align="center" | cplex/12.8
 
| align="center" |
 
|-
 
| cuda toolkit 9.0
 
| /opt/ohpc/pub/cuda-9.0
 
| align="center" |
 
| cudnn 9.0 in targets/x86_64-linux/lib/
 
|-
 
| cuda toolkit 9.1
 
| /opt/ohpc/pub/cuda-9.1
 
| align="center" |
 
| cudnn 9.1 in targets/x86_64-linux/lib/
 
|-
 
| cuda toolkit 9.2
 
| /opt/ohpc/pub/cuda-9.2
 
| align="center" |
 
| cudnn 9.2 in targets/x86_64-linux/lib/
 
|-
 
| cuda toolkit 10.0
 
| /opt/ohpc/pub/cuda-10.0
 
| align="center" |
 
| cudnn 7.4.1 for cuda10  in targets/x86_64-linux/lib/
 
|-
 
| gcc 7.2.0
 
| /opt/ohpc/pub/compiler/gcc/7.2.0/bin/gcc
 
| align="center" | gnu7/7.2.0
 
| align="center" |
 
|-
 
| gcc 4.8.5 (default)
 
| /usr/bin/gcc
 
| align="center" |
 
| align="center" |
 
|-
 
| gdal 2.2.3
 
| /opt/ohpc/pub/gdal2.2.3
 
| align="center" | gdal/2.2.3
 
| align="center" |
 
|-
 
| java openjdk 1.8.0
 
| /usr/bin/java
 
| align="center" |
 
| align="center" |
 
|-
 
| Python 2.7.5 (default)
 
| /usr/bin/python
 
| align="center" |
 
| align="center" |  The system-wide installation of packages is no longer supported. See below for Anaconda/miniconda install information.
 
|-
 
| R 3.5.1
 
| /usr/bin/R
 
| align="center" |
 
| align="center" |  The system-wide installation of packages is no longer supported.
 
|-
 
| Subversion (svn) 1.7
 
| /usr/bin/svn
 
| align="center" |
 
| align="center" |
 
|-
 
|}
 
 
 
:* It is usually possible to install software in your home directory.
 
:* List installed software via rpm: <code>rpm -qa</code>. Use grep to search for specific software: <code>rpm -qa | grep sw_name</code> (e.g. <code>rpm -qa | grep perl</code>)
 
 
 
=== Modules ===
 
 
 
Since this cluster is managed with OpenHPC, the [[Modules (Lmod)| Lmod Module System]] is implemented. You can see detailed information and instructions at the linked page.
 
 
 
'''Example:'''
 
To make sure your environment is set up for <code>cplex</code>, you would type:
 
<pre>
 
$ module avail
 
$ module load cplex
 
</pre>
 
When done, either log out and log back in, or type <code>module unload cplex</code>.
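To confirm which modules are currently loaded in your session, you can run:
<pre>
$ module list
</pre>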
 
 
 
You can also ''create your own modules'' and place them in your $HOME.  For instructions, see the [https://www.cac.cornell.edu/wiki/index.php?title=Modules_(Lmod)#Personal_Modulefiles Lmod Module System] page.
 
 
 
Once created, type <code>module use $HOME/path/to/personal/modulefiles</code>.  This will prepend the path to <code>$MODULEPATH</code>.  Type <code>echo $MODULEPATH</code> to confirm.
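For example, if your personal modulefiles were placed under <code>$HOME/modulefiles</code> (a hypothetical path; substitute your own), the sequence might look like:
<pre>
$ module use $HOME/modulefiles
$ echo $MODULEPATH
$ module avail
</pre>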
 
 
 
=== Build software from source into your home directory ($HOME) ===
 
<pre>
 
* download and extract your source
 
* cd to your extracted source directory
 
* ./configure --prefix=$HOME/appdir
 
[Refer to your source's documentation for the full list of options you can pass to 'configure'.]
 
* make
 
* make install
 
 
 
The binary would then be located in ~/appdir/bin.
 
* Add the following to your $HOME/.bashrc:
 
      export PATH="$HOME/appdir/bin:$PATH"
 
* Reload the .bashrc file with source ~/.bashrc (or log out and log back in)
 
</pre>
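As a concrete sketch using a hypothetical source archive <code>myapp-1.0.tar.gz</code> (substitute your own package and configure options), the full sequence might look like:
<pre>
tar xzf myapp-1.0.tar.gz
cd myapp-1.0
./configure --prefix=$HOME/appdir
make
make install
# add this line to ~/.bashrc to make the new binaries available in future sessions
export PATH="$HOME/appdir/bin:$PATH"
</pre>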
 
 
 
===How to Install R packages in your home directory ===
 
 
 
Reference: [http://cran.r-project.org/doc/manuals/R-admin.html#Managing-libraries http://cran.r-project.org/doc/manuals/R-admin.html#Managing-libraries]
 
<source lang="bash">
 
************************************************************************************
 
NOTE: Steps 1) through 4) need to be done only once. After your Rlibs directory
 
has been created and your R_LIBS environment variable is set, you can install
 
additional packages using step 5).
 
************************************************************************************
 
 
 
Know your R library search path:
 
    Start R and run .libPaths(). Sample output is shown below:
 
    > .libPaths()
 
    [1] "/usr/lib64/R/library"
 
 
 
Now we will create a local Rlibs directory and add this to the library search path.
 
NOTE: Make sure R is NOT running before you proceed.
 
 
 
1) Create a directory in your home directory you would like to install the R packages, e.g. Rlibs
 
mkdir  ~/Rlibs
 
 
 
2) Create a .profile file in your home directory (or modify the existing one) using your favorite editor (emacs, vim, nano, etc.)
 
 
 
    Add the following to your .profile
 
    #!/bin/sh
 
    if [ -n "$R_LIBS" ]; then
 
        export R_LIBS=~/Rlibs:$R_LIBS
 
    else
 
        export R_LIBS=~/Rlibs
 
    fi
 
 
 
3) To pick up the new R_LIBS setting, run "source ~/.profile" (or log out and log back in)
 
 
 
4) Confirm the change is in your library path:
 
    start R
 
> .libPaths()
 
[1] "$HOME/Rlibs"   
 
[2] "/usr/lib64/R/library" 
 
 
 
 
 
5) Install the package in your local directory
 
>install.packages("packagename","~/Rlibs","https://cran.r-project.org/")
 
i.e. to install the package:snow
 
>install.packages("snow","~/Rlibs","https://cran.r-project.org/")
 
 
 
6) For more help with install.packages() use
 
>?install.packages( ) 
 
 
 
7) To see which libraries are available in your R library path, run library()
 
The output will show your local packages and the system-wide packages.
 
>library()
 
 
 
</source>
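To verify from the shell that a locally installed package (for example <code>snow</code> from step 5) is being picked up from <code>~/Rlibs</code>, you can use <code>Rscript</code>:
<pre>
# loads the package and prints the directory it was loaded from
Rscript -e 'library(snow); find.package("snow")'
</pre>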
 
 
 
===How to Install Python Anaconda (miniconda) in your home directory ===
 
* Anaconda can be used to maintain custom environments for R (as well as other software).
 
* Reference to help decide if miniconda is enough: https://conda.io/docs/user-guide/install/download.html
 
** NOTE: Consider starting with miniconda if you do not need a multitude of packages; it will be smaller and faster to install and update.
 
* Reference for Anaconda R Essentials: https://conda.io/docs/user-guide/tasks/use-r-with-conda.html
 
* Reference for linux install: https://conda.io/docs/user-guide/install/linux.html
 
* Tutorials to help you manage conda packages:
 
https://conda.io/docs/user-guide/tutorials/index.html
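A minimal sketch of installing miniconda into your home directory and creating an R environment follows; the installer filename and URL may change over time, so check the download page linked above:
<pre>
# download and install miniconda into $HOME/miniconda3 (batch mode, no prompts)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3

# make the conda command available in this shell
source $HOME/miniconda3/etc/profile.d/conda.sh

# create and activate an environment with R and the R Essentials bundle
conda create -y -n r_env r-base r-essentials
conda activate r_env
</pre>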
 
