Pool Cluster

Getting Started

Copying your data over from icse-data

  • icse-data.cac.cornell.edu:/home/fs01 is nfs mounted on pool.cac.cornell.edu to: /mnt/migration

(Therefore you do not need to ssh to icse-data.cac to retrieve your data.)

  • To copy data, one suggestion would be to use rsync:

Upon logging into pool.cac.cornell.edu, you will land in your home directory (type: pwd to confirm).

  • Make a directory to copy your data into and use rsync (Reminder: Linux is case-sensitive):
   mkdir FromIcseData
   rsync -av /mnt/migration/your_user_id/  FromIcseData/

(Note: the trailing "/" characters are important in the above command; they tell rsync to copy all of the contents of your old home directory into the newly created directory 'FromIcseData'.)

  • Alternatively, if you do not want your data copied into a new directory:
   rsync -av /mnt/migration/your_user_id/  .
(The "." means copy the data here, into your current location.)

NOTE: Once all data is copied over, we will remove the /mnt/migration mount after making an announcement.

General Information

  • pool is a private cluster with restricted access to the following groups: fe13_0001, dlk15_0001, ylj2_0001
  • Head node: pool.cac.cornell.edu (access via ssh)
    • OpenHPC deployment running CentOS 7.6
    • Cluster scheduler: Slurm 17.11.10

How To Login

  • To get started, login to the head node pool.cac.cornell.edu via ssh (see the example below).
  • If you are unfamiliar with Linux and ssh, we suggest reading the Linux Tutorial and looking into how to Connect to Linux before proceeding.
  • You will be prompted for your CAC account password
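
For example, from a terminal on your own machine (replace your_cac_id with your CAC user name; the ID shown is a placeholder):

ssh your_cac_id@pool.cac.cornell.edu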

Hardware

  • There is a 1.8TB local /scratch disk on the head node only.
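
To check free space on the head-node scratch disk (a standard Linux command, shown here for convenience):

df -h /scratch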

All compute nodes c00[01-28] have hyperthreading turned ON.

  • c000[1-5]: 64GB memory per node; Silicon Mechanics Rackform_R308.v6/X10DRL-i; Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz; 40 CPUs per node (10 cores/socket, 2 sockets, 2 threads/core); 1.5TB /tmp
  • c00[06-08,25]: 124GB memory per node; Silicon Mechanics Rackform_R308.v6/X10DRL-i (c000[6-8]), Rackform_R308.v5/X10DRL-i (c0025); Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz; 40 CPUs per node (10 cores/socket, 2 sockets, 2 threads/core); 1.5TB /tmp
  • c0009: 64GB memory per node; Silicon Mechanics Rackform_R308.v6/X10DRL-i; Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz; 32 CPUs per node (8 cores/socket, 2 sockets, 2 threads/core); 850GB /tmp
  • c00[10-16,18,26-28]: 64GB memory per node; Silicon Mechanics Rackform_R308.v5/X10DRL-i; Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz; 40 CPUs per node (10 cores/socket, 2 sockets, 2 threads/core); 1.5TB /tmp
  • c00[17,20]: 48GB memory per node; Supermicro X8DTL; Intel(R) Xeon(R) CPU E5630 @ 2.53GHz; 16 CPUs per node (4 cores/socket, 2 sockets, 2 threads/core); 850GB /tmp
  • c00[19,22]: 48GB memory per node; Supermicro X8DTL; Intel(R) Xeon(R) CPU E5640 @ 2.67GHz; 16 CPUs per node (4 cores/socket, 2 sockets, 2 threads/core); 850GB /tmp
  • c0021: 48GB memory per node; Supermicro X8DTL; Intel(R) Xeon(R) CPU X5650 @ 2.67GHz; 24 CPUs per node (6 cores/socket, 2 sockets, 2 threads/core); 1.5TB /tmp
  • c0023: 48GB memory per node; Supermicro X8DTL; Intel(R) Xeon(R) CPU X5650 @ 2.67GHz; 24 CPUs per node (6 cores/socket, 2 sockets, 2 threads/core); 1.5TB /tmp
  • c0024: 124GB memory per node; Silicon Mechanics Rackform_R308.v6/X10DRL-i; Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz; 56 CPUs per node (14 cores/socket, 2 sockets, 2 threads/core); 850GB /tmp

Networking

  • All nodes have a 1 Gb Ethernet connection (eth0) on a private network served out from the pool head node.

Running Jobs

Slurm

Queues/Partitions

("Partition" is the term used by slurm for "Queues")

  • hyperthreading is turned on for ALL nodes - Slurm considers each core to consist of 2 logical CPUs
  • all partitions have a default time of 1 hour
  • pool currently has the following queues; more queues will be created once we have all nodes added:
Queue/Partition                   Number of nodes   Node Names         Limits
normal (default)                  27                c00[01-08,10-28]   walltime limit: 168 hours (i.e. 7 days)
plato (limited access per fe13)   1                 c0009              walltime limit: 168 hours (i.e. 7 days)
test                              2                 c00[15-16]         walltime limit: 168 hours (i.e. 7 days)

Common Slurm Commands

Command/Option Summary (two page PDF)

Slurm HELP

Slurm Workload Manager Quick Start User Guide - this page lists all of the available Slurm commands

Slurm Workload Manager Frequently Asked Questions includes FAQs for Management, Users and Administrators

Convenient SLURM Commands has examples for getting information on jobs and controlling jobs

Slurm Workload Manager - sbatch - used to submit a job script for later execution. The script will typically contain one or more srun commands to launch parallel tasks. You can also use the srun command to launch an interactive job on the compute nodes.

# A few slurm commands to initially get familiar with:
scontrol show nodes
scontrol show partition

# To submit a batch job and an interactive job:
sbatch testjob.sh
srun -p normal --pty /bin/bash

# Job management commands:
scontrol show job <job id>
scancel <job id>
sinfo -l

Example batch job to run in the partition: normal

Example sbatch script to run a job with one task (default) in the normal partition (i.e. queue):

NOTE: All lines beginning with "#SBATCH" are directives for the scheduler to read.
If you want a line to be ignored (i.e. treated as a comment), you must place two "#" characters ("##") at the beginning of the line.
#!/bin/bash
## -J sets the name of job
#SBATCH -J TestJob

## -p sets the partition (queue)
#SBATCH -p normal

## 10 min
#SBATCH --time=00:10:00

## sets the tasks per core (default=2 for hyperthreading: cores are oversubscribed)
## set to 1 if one task by itself is enough to keep a core busy
#SBATCH --ntasks-per-core=1 

## request 4GB per CPU or task
#SBATCH --mem-per-cpu=4GB

## define job stdout file
#SBATCH -o testnormal-%j.out

## define job stderr file
#SBATCH -e testnormal-%j.err

echo "starting at `date` on `hostname`"

# Print the Slurm job ID
echo "SLURM_JOB_ID=$SLURM_JOB_ID"

echo "hello world `hostname`"

echo "ended at `date` on `hostname`"
exit 0

Submit/Run your job:

sbatch example.sh

View your job:

scontrol show job <job_id>
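
You can also list all of your own queued and running jobs with squeue:

squeue -u $USER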

Example MPI batch job to run in the partition: normal

Example sbatch script to run a job with 60 tasks in the normal partition (i.e. queue); a sketch of how to build the hello_mpi program it launches follows the script:

#!/bin/bash
## -J sets the name of job
#SBATCH -J TestJob

## -p sets the partition (queue)
#SBATCH -p normal

## 10 min
#SBATCH --time=00:10:00

## the number of slots (CPUs) to reserve
#SBATCH -n 60

## the number of nodes to use (min and max can be set separately)
#SBATCH -N 3

## typically an MPI job needs exclusive access to nodes for good load balancing
#SBATCH --exclusive

## don't worry about hyperthreading, Slurm should distribute tasks evenly
##SBATCH --ntasks-per-core=1 

## define job stdout file
#SBATCH -o testnormal-%j.out

## define job stderr file
#SBATCH -e testnormal-%j.err

echo "starting at `date` on `hostname`"

# Print Slurm job properties
echo "SLURM_JOB_ID = $SLURM_JOB_ID"
echo "SLURM_NTASKS = $SLURM_NTASKS"
echo "SLURM_JOB_NUM_NODES = $SLURM_JOB_NUM_NODES"
echo "SLURM_JOB_NODELIST = $SLURM_JOB_NODELIST"
echo "SLURM_JOB_CPUS_PER_NODE = $SLURM_JOB_CPUS_PER_NODE"

mpiexec -n $SLURM_NTASKS ./hello_mpi

echo "ended at `date` on `hostname`"
exit 0
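
For reference, a minimal sketch of how you might build the hello_mpi executable used above before submitting the job. This assumes your MPI source file is named hello_mpi.c and that you use the gnu8/openmpi3 toolchain shown under Software below; the script file name mpi_job.sh is a placeholder for whatever you named the script above.

# load a compiler and MPI stack (module names from the listing under Software)
module load gnu8 openmpi3
# build the example MPI program (hello_mpi.c is a hypothetical source file)
mpicc -o hello_mpi hello_mpi.c
# submit the batch script above (saved here as mpi_job.sh)
sbatch mpi_job.sh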

To include or exclude specific nodes in your batch script

To include one or more nodes that you specifically want, add the following line to your batch script:

#SBATCH --nodelist=<node_names_you_want_to_include>

## e.g., to include c0006:
#SBATCH --nodelist=c0006

## to include c0006 and c0007 (also illustrates shorter syntax):
#SBATCH -w c000[6,7]

To exclude one or more nodes, add the following line to your batch script:

#SBATCH --exclude=<node_names_you_want_to_exclude>

## e.g., to avoid c0006 through c0008, and c0013:
#SBATCH --exclude=c00[06-08,13]

## to exclude c0006 (also illustrates shorter syntax):
#SBATCH -x c0006

Environment variables defined for tasks that are started with srun

If you submit a batch job in which you run the following script with "srun -n $SLURM_NTASKS", you will see how the various environment variables are defined (a minimal wrapper sketch follows below).

#!/bin/bash
echo "Hello from `hostname`," \
"$SLURM_CPUS_ON_NODE CPUs are allocated here," \
"I am rank $SLURM_PROCID on node $SLURM_NODEID," \
"my task ID on this node is $SLURM_LOCALID"

These variables are not defined in the same useful way in the environments of tasks that are started with mpiexec or mpirun.
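
For illustration, a minimal wrapper batch script that launches the script above with srun. It assumes you saved the script as hello_task.sh and made it executable (chmod +x hello_task.sh); the job name, task count, and file name are placeholders.

#!/bin/bash
#SBATCH -J SrunEnvTest
#SBATCH -p normal
#SBATCH --time=00:05:00
## request 4 tasks (an arbitrary example value)
#SBATCH -n 4

## launch one copy of the script per task; each prints its own Slurm variables
srun -n $SLURM_NTASKS ./hello_task.sh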

Use $HOME within your script rather than the full path to your home directory

In order to access files in your home directory, you should use $HOME rather than the full path. To test this, you could add the following to your batch script:

echo "my home dir is $HOME"

Then view the output file you set in your batch script to get the result.


Copy your data to /tmp to avoid heavy I/O on your NFS-mounted $HOME!

  • We cannot stress enough how important this is for avoiding delays on the file systems. The example script below copies input data to node-local /tmp, runs there, and copies the results back:
#!/bin/bash
## -J sets the name of job
#SBATCH -J TestJob

## -p sets the partition (queue)
#SBATCH -p normal
## time is HH:MM:SS
#SBATCH --time=00:01:30
#SBATCH --cpus-per-task=15

## define job stdout file
#SBATCH -o testnormal-%j.out

## define job stderr file
#SBATCH -e testnormal-%j.err

echo "starting $SLURM_JOBID at `date` on `hostname`"
echo "my home dir is $HOME" 

## copying my data to a local tmp space on the compute node to reduce I/O
MYTMP=/tmp/$USER/$SLURM_JOB_ID
srun /usr/bin/mkdir -p $MYTMP || exit $?
echo "Copying my data over..."
srun cp -rp $SLURM_SUBMIT_DIR/mydatadir $MYTMP || exit $?

## run your job executables here...

echo "ended at `date` on `hostname`"
echo "copy your data back to your $HOME" 
srun /usr/bin/mkdir -p $SLURM_SUBMIT_DIR/newdatadir || exit $?
srun cp -rp $MYTMP $SLURM_SUBMIT_DIR/newdatadir || exit $?
## remove your data from the compute node /tmp space
srun rm -rf $MYTMP 

exit 0

Software

Installed Software

The Lmod module system is used to manage software environments. To list the module environments you can put yourself in:

module avail 

(to get a more complete listing, type: module spider)
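
To see which modules are currently loaded in your session:

module list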

EXAMPLE: To be sure you are using the environment set up for gromacs, you would type:

 module load gromacs/2019.1

When you are done, either log out and log back in, or type:

 module unload gromacs/2019.1

You can create your own modules and place them in your $HOME. Once created, type: module use $HOME/path/to/personal/modulefiles
This will prepend the path to $MODULEPATH (type: echo $MODULEPATH to confirm).
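
For example (the modulefiles directory name below is just a suggested convention, not a path that already exists):

mkdir -p $HOME/modulefiles
module use $HOME/modulefiles
echo $MODULEPATH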

Reference: User Created Modules


Sample module listing on pool:

---------------------------- /opt/ohpc/pub/moduledeps/gnu8-openmpi3 ---------------------------
   fftw/3.3.8    hypre/2.14.0    netcdf-fortran/4.4.4    netcdf/4.6.1    phdf5/1.10.3    
   py3-mpi4py/3.0.0    py3-scipy/1.2.1    scalapack/2.0.2

---------------------------------- /opt/ohpc/pub/moduledeps/gnu8 --------------------------------
   metis/5.1.0    mpich/3.2.1    mvapich2/2.3    openblas/0.3.0    openmpi3/3.1.2 (L)   
   py2-numpy/1.15.3    py3-numpy/1.15.3    superlu/5.2.1

--------------------------------- /opt/ohpc/admin/modulefiles -------------------------------------
   spack/0.11.2 (D)

-------------------------------------- /opt/ohpc/pub/modulefiles ----------------------------------
   autotools          (L)    gnu/5.4.0         gromacs/2019.1      matlab/R2018a        prun/1.2    (L)    
   charliecloud/0.9.7        gnu7/7.3.0        intel/19.0.2.187    ohpc          (L)    singularity/3.1.0        
   cmake/3.12.2              gnu8/8.2.0 (L)    intel19             pmix/2.1.4           spack/0.11.2
   valgrind/3.13.0   vmd/1.9.3

   L:  Module is loaded

  • Software installed outside the module system:

Package and Version        Location                               Notes
gcc 4.8.5 (default)        /bin/gcc
lammps 20181212 (default)  /usr/bin/lmp
python 2.7.5 (default)     /usr/bin/python                        includes: numpy
python 3.4                 /usr/bin/python3; /usr/bin/python3.4   includes: numpy
python 3.6                 /usr/bin/python3.6                     includes: numpy


  • It is usually possible to install software in your home directory.
  • List installed software via rpms: rpm -qa. Use grep to search for specific software: rpm -qa | grep sw_name [e.g. rpm -qa | grep perl]

Build software from source into your home directory ($HOME)

* Download and extract your source.
* cd to your extracted source directory.
* ./configure --prefix=$HOME/appdir
  [Refer to your source's documentation for the full list of options you can pass to 'configure'.]
* make
* make install

The binaries would then be located in ~/appdir/bin.
* Add the following to your $HOME/.bashrc:
      export PATH="$HOME/appdir/bin:$PATH"
* Reload the .bashrc file with: source ~/.bashrc (or log out and log back in).
A complete worked example follows below.
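
Putting the steps together, a sketch of the whole sequence (the package name myapp-1.0 and the install prefix appdir are placeholders, not software actually present on the cluster):

tar xzf myapp-1.0.tar.gz              # hypothetical source tarball
cd myapp-1.0
./configure --prefix=$HOME/appdir     # appdir is an example install prefix
make
make install
export PATH="$HOME/appdir/bin:$PATH"  # add this line to ~/.bashrc to make it permanent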