Pool Cluster
Revision as of 11:35, 4 April 2019
Getting Started
Copying your data over from icse-data
- icse-data.cac.cornell.edu:/home/fs01 is nfs mounted on pool.cac.cornell.edu to: /mnt/migration
(Therefore you do not need to ssh to icse-data.cac to retrieve your data.)
- To copy data, one suggestion would be to use rsync:
Upon logging into pool.cac.cornell.edu, you will see you are in your /home directory (type: pwd)
- Make a directory to copy your data into & use rsync (Reminder: linux is case-sensitive):
mkdir FromIcseData
rsync -av /mnt/migration/your_user_id/ FromIcseData/
(Note: the trailing "/" characters are important in the above command - they say to copy all the contents of your old home directory into the newly created directory 'FromIcseData')
- Another example would be if you do not want your data moving into a new directory:
rsync -av /mnt/migration/your_user_id/ . (use a "dot" to state copy data here in my current location)
NOTE: Once all data is copied over, we will remove the /mnt/migration mount after making an announcement.
General Information
- pool is a private cluster with restricted access to the following groups: fe13_0001, dlk15_0001, ylj2_0001
- Head node: pool.cac.cornell.edu (access via ssh)
- OpenHPC deployment running CentOS 7.6
- Cluster scheduler: slurm 17.11.10
- 28 compute nodes c00[01-28]
- Current Cluster Status: Ganglia.
- data on the pool cluster is NOT backed up
- Please send any questions and report problems to: cac-help@cornell.edu
How To Login
- To get started, login to the head node pool.cac.cornell.edu via ssh.
- If you are unfamiliar with Linux and ssh, we suggest reading the Linux Tutorial and looking into how to Connect to Linux before proceeding.
- You will be prompted for your CAC account password
Hardware
- There is a 1.8TB local /scratch disk on the head node only.
c00[01-28] hyperthreading ON
Node Names           | Memory per node | Model name                                                                                              | Processor count per node | Core(s) per socket | Sockets | Thread(s) per core | /tmp size
c000[1-5]            | 64GB            | Silicon Mechanics Rackform_R308.v6/X10DRL-i; Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz                  | 40                       | 10                 | 2       | 2                  | 1.5TB
c00[06-08,25]        | 124GB           | Silicon Mechanics Rackform R308.v6/X10DRL-i (c000[6-8]); R308.v5/X10DRL-i (c0025); Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz | 40 | 10                 | 2       | 2                  | 1.5TB
c0009                | 64GB            | Silicon Mechanics Rackform_R308.v6/X10DRL-i; Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz                  | 32                       | 8                  | 2       | 2                  | 850GB
c00[10-16,18,26-28]  | 64GB            | Silicon Mechanics Rackform_R308.v5/X10DRL-i; Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz                  | 40                       | 10                 | 2       | 2                  | 1.5TB
c00[17,20]           | 48GB            | Supermicro X8DTL; Intel(R) Xeon(R) CPU E5630 @ 2.53GHz                                                  | 16                       | 4                  | 2       | 2                  | 850GB
c00[19,22]           | 48GB            | Supermicro X8DTL; Intel(R) Xeon(R) CPU E5640 @ 2.67GHz                                                  | 16                       | 4                  | 2       | 2                  | 850GB
c0021                | 48GB            | Supermicro X8DTL; Intel(R) Xeon(R) CPU X5650 @ 2.67GHz                                                  | 24                       | 6                  | 2       | 2                  | 1.5TB
c0023                | 48GB            | Supermicro X8DTL; Intel(R) Xeon(R) CPU X5650 @ 2.67GHz                                                  | 24                       | 6                  | 2       | 2                  | 1.5TB
c0024                | 124GB           | Silicon Mechanics Rackform_R308.v6/X10DRL-i; Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz                  | 56                       | 14                 | 2       | 2                  | 850GB
Networking
- All nodes have a 1 Gb Ethernet connection for eth0 on a private network served from the pool head node.
Running Jobs
Slurm
Queues/Partitions
("Partition" is the term used by slurm for "Queues")
- hyperthreading is turned on for ALL nodes
- all partitions have a default time of 1 hour
- pool currently has only 1 shared queue; this will be divided by group once we have all nodes on:
Queue/Partition   | Number of nodes | Node Names | Limits
normal (default)  | 28              | c00[01-28] | walltime limit: 168 hours (i.e. 7 days)
Common Slurm Commands
Command/Option Summary (two page PDF)
Slurm HELP
Slurm Workload Manager Quick Start User Guide - this page lists all of the available Slurm commands
Slurm Workload Manager Frequently Asked Questions includes FAQs for Management, Users and Administrators
Convenient SLURM Commands has examples for getting information on jobs and controlling jobs
Slurm Workload Manager - sbatch - used to submit a job script for later execution. The script will typically contain one or more srun commands to launch parallel tasks.
A few slurm commands to get familiar with initially:
  scontrol show nodes
  scontrol show partition
  Submit a job: sbatch testjob.sh
  Interactive job: srun -p normal --pty /bin/bash
  scontrol show job [job id]
  scancel [job id]
  sinfo -l
Example Batch job to run in the partition: normal
Example sbatch script to run a job in the normal partition (i.e. queue):
NOTE: All lines beginning with "#SBATCH" are directives for the scheduler to read. If you want a line ignored (i.e. treated as a comment), you must place two hash marks ("##") at the beginning of the line.
#!/bin/bash
## -J sets the name of the job
#SBATCH -J TestJob
## -p sets the partition (queue)
#SBATCH -p normal
## 10 min
#SBATCH --time=00:10:00
## sets the tasks per core (default=2; keep the default if you want to
## take advantage of hyperthreading; 1 places one task per whole physical core)
#SBATCH --ntasks-per-core=1
## request 4GB per core
#SBATCH --mem-per-cpu=4GB
## define the job's stdout file
#SBATCH -o testnormal-%j.out
## define the job's stderr file
#SBATCH -e testnormal-%j.err

echo "starting at `date` on `hostname`"
# Print the SLURM job ID.
echo "SLURM_JOBID=$SLURM_JOBID"
echo "hello world `hostname`"
echo "ended at `date` on `hostname`"
exit 0
Submit/Run your job:
sbatch example.sh
View your job:
scontrol show job [job_id]
To request a specific node in your batch script
Add the following line to your batch script:
#SBATCH --nodelist=node_name_you_want_to_run_on
i.e. To run on c0006:
#SBATCH --nodelist=c0006
(The short form of the option is -w, e.g. #SBATCH -w c0006)
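Putting the pieces together, a node-pinned batch script looks like the sketch below. It assumes c0006 as in the example above and a hypothetical job name; on the cluster you would submit the resulting file with sbatch:

```shell
# Sketch: generate a minimal batch script pinned to one node (c0006, as above).
cat > pin_node_job.sh <<'EOF'
#!/bin/bash
#SBATCH -J PinnedJob
#SBATCH -p normal
#SBATCH --time=00:05:00
#SBATCH --nodelist=c0006
echo "running on `hostname`"
EOF

# On the cluster, submit it with: sbatch pin_node_job.sh
```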
Software
Installed Software
The Lmod module system is implemented. To list the environments you can put yourself in:
module avail
(to get a more complete listing, type: module spider)
EXAMPLE: To be sure you are using the environment set up for gromacs, you would type:
module load gromacs/2019.1
When done, either log out and log back in, or type:
module unload gromacs/2019.1
You can create your own modules and place them in your $HOME. Once created, type:
module use $HOME/path/to/personal/modulefiles
This will prepend the path to $MODULEPATH (type: echo $MODULEPATH to confirm).
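As a concrete sketch of the steps above, the commands below create a personal Lua-format modulefile for a hypothetical tool ("mytool", installed in $HOME/appdir — both names are assumptions, not anything installed on pool):

```shell
# Sketch: create a personal Lmod modulefile for a hypothetical "mytool" in $HOME/appdir.
mkdir -p "$HOME/privatemodules/mytool"
cat > "$HOME/privatemodules/mytool/1.0.lua" <<'EOF'
-- hypothetical Lua modulefile for Lmod
help([[mytool 1.0, a user install in $HOME/appdir]])
prepend_path("PATH", pathJoin(os.getenv("HOME"), "appdir/bin"))
EOF

# On the cluster, register and load it with:
#   module use $HOME/privatemodules
#   module load mytool/1.0
```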
Reference: User Created Modules
---------------------------- /opt/ohpc/pub/moduledeps/gnu8-openmpi3 ---------------------------
fftw/3.3.8    hypre/2.14.0    netcdf-fortran/4.4.4    netcdf/4.6.1    phdf5/1.10.3
py3-mpi4py/3.0.0    py3-scipy/1.2.1    scalapack/2.0.2

---------------------------------- /opt/ohpc/pub/moduledeps/gnu8 ------------------------------
metis/5.1.0    mpich/3.2.1    mvapich2/2.3    openblas/0.3.0    openmpi3/3.1.2 (L)
py2-numpy/1.15.3    py3-numpy/1.15.3    superlu/5.2.1

--------------------------------- /opt/ohpc/admin/modulefiles ---------------------------------
spack/0.11.2 (D)

-------------------------------------- /opt/ohpc/pub/modulefiles ------------------------------
autotools (L)    charliecloud/0.9.7    cmake/3.12.2    gnu/5.4.0    gnu7/7.3.0
gnu8/8.2.0 (L)    gromacs/2019.1    intel/19.0.2.187    intel19    matlab/R2018a
ohpc (L)    pmix/2.1.4    prun/1.2 (L)    singularity/3.1.0    spack/0.11.2
valgrind/3.13.0    vmd/1.9.3

  L: Module is loaded
- Software installed outside the module system:
(sortable table)
Package and Version       | Location                              | Notes
gcc 4.8.5 (default)       | /bin/gcc                              |
lammps 20181212 (default) | /usr/bin/lmp                          |
python 2.7.5 (default)    | /usr/bin/python                       | includes: numpy
python 3.4                | /usr/bin/python3; /usr/bin/python3.4  | includes: numpy
python 3.6                | /usr/bin/python3.6                    | includes: numpy
- It is usually possible to install software in your home directory.
- List installed software via rpms: rpm -qa. Pipe through grep to search for specific software: rpm -qa | grep sw_name (e.g. rpm -qa | grep perl)
Build software from source into your home directory ($HOME)
* Download and extract your source.
* cd to your extracted source directory.
* ./configure --prefix=$HOME/appdir
  (Refer to your source's documentation for the full list of options you can provide 'configure' with.)
* make
* make install
  The binary would then be located in ~/appdir/bin.
* Add the following to your $HOME/.bashrc:
  export PATH="$HOME/appdir/bin:$PATH"
* Reload the .bashrc file with: source ~/.bashrc (or log out and log back in)
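You can verify the PATH setup from the last two steps without building anything, using a hypothetical stand-in script in place of a compiled binary (the name "hello" is an assumption for illustration only):

```shell
# Sketch: stand-in "installed binary" to verify the ~/appdir/bin PATH setup.
mkdir -p "$HOME/appdir/bin"
cat > "$HOME/appdir/bin/hello" <<'EOF'
#!/bin/bash
echo "hello from appdir"
EOF
chmod +x "$HOME/appdir/bin/hello"

# Same line as added to .bashrc:
export PATH="$HOME/appdir/bin:$PATH"

# The shell now finds the command by name, just as it would your real binary.
hello
```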