AIDA Cluster
Revision as of 07:14, 22 September 2022
AIDA General Information
- AIDA is a private cluster with access restricted to members of the bs54_0001 and rab38_0001 groups.
- Head node: aida.cac.cornell.edu (access via ssh)
- 12 GPU nodes:
  - 6 with V100 GPUs (c0017-c0022)
  - 6 with A100 GPUs (c0071-c0076)
- Many nodes from the former Atlas2 cluster
- All GPU nodes support vector extensions up to AVX-512.
- All nodes have hyperthreading turned on.
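To check from inside a job that the node you landed on has AVX-512 and hyperthreading, you can read /proc/cpuinfo. The sketch below assumes a Linux x86 node, where the first processor entry carries a flags line plus the siblings (logical CPUs per socket) and cpu cores (physical cores per socket) fields; avx512f is the base AVX-512 flag. The helper name cpu_summary is hypothetical. On an AIDA GPU node it should report AVX-512F as present and 2 threads per core.

```python
from pathlib import Path


def cpu_summary(cpuinfo_text):
    """Summarize the first processor entry of a /proc/cpuinfo dump.

    Returns (has_avx512f, threads_per_core). Assumes the x86 Linux
    cpuinfo layout with "flags", "siblings" and "cpu cores" fields.
    """
    fields = {}
    for line in cpuinfo_text.splitlines():
        if not line.strip():
            # A blank line ends the first processor entry; later entries repeat it.
            if fields:
                break
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()

    has_avx512f = "avx512f" in fields.get("flags", "").split()
    # siblings = logical CPUs per socket, cpu cores = physical cores per socket.
    siblings = int(fields.get("siblings", "1"))
    cores = int(fields.get("cpu cores", "1"))
    threads_per_core = siblings // cores if cores else 1
    return has_avx512f, threads_per_core


if __name__ == "__main__":
    info = Path("/proc/cpuinfo")
    if info.exists():
        avx512, tpc = cpu_summary(info.read_text())
        print(f"AVX-512F: {avx512}, threads per core: {tpc}")
```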
V100 nodes (c0017-c0022): 2x18-core Intel Xeon Skylake 6154 CPUs, base clock 3GHz (turbo up to 3.7GHz)

| Node | GPUs | Memory | Swap | /tmp | /scratch |
|---|---|---|---|---|---|
| c0017 | 5x Nvidia Tesla V100 16GB | 754GB | 187GB | 700GB | |
| c00[18-21] | 5x Nvidia Tesla V100 16GB | 376GB | 187GB | 700GB | |
| c0022 | 2x Nvidia Tesla V100 16GB | 1.5TB | 187GB | 100GB | 1TB |

A100 nodes (c0071-c0076): 2x28-core Intel Xeon Ice Lake Gold 6348 CPUs, base clock 2.6GHz

| Nodes | GPUs | Memory | Swap | /tmp |
|---|---|---|---|---|
| c0071-c0076 | 4x Nvidia Tesla A100 80GB | 1TB | 187GB | 3TB |
User home directories
- Path: ~
- Users' home directories are located on an NFS export from the AIDA head node. Use your home directory (~) for archiving the data you wish to keep. Data in users' home directories is NOT backed up.
- The cluster scheduler is Slurm.
- See the Slurm documentation page for details.
- See the Requesting GPUs section for information on how to request GPUs on compute nodes for your jobs.
- Use --gres=gpu:2g.20gb:<number of MIG devices> or --gres=gpu:1g.10gb:1 to request MIG devices. The job will land on one of the A100 nodes with MIG configured.
- Use --gres=gpu:a100:<number of GPUs> to request entire A100 GPUs. The job will land on an A100 node with no MIG.
- Remember, hyperthreading is enabled on the cluster, so Slurm considers each physical core to consist of two logical CPUs.
- To ensure that each MPI task uses a full physical core, specify -c 2 (short for --cpus-per-task=2).
- Partitions (queues):

| Name | Description | Time Limit |
|---|---|---|
| normal | xxxxxxxxxx | no limit |
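As a worked example of the hyperthreading rule above: a V100 node has 2x18 = 36 physical cores, which Slurm sees as 72 logical CPUs, so with -c 2 at most 36 MPI tasks fit on one node; an A100 node (2x28 = 56 cores, 112 logical CPUs) fits 56. The sketch below does this arithmetic and builds matching srun options. It assumes every logical CPU on a node is available to jobs; the helper names mpi_layout and srun_options are hypothetical, while --nodes, --ntasks-per-node and --cpus-per-task are standard Slurm options.

```python
# Hyperthreading is on, so Slurm counts each physical core as 2 logical CPUs.
THREADS_PER_CORE = 2

# Physical cores per node, taken from the hardware tables on this page.
CORES_PER_NODE = {
    "v100": 2 * 18,  # 2x18-core Xeon Skylake 6154
    "a100": 2 * 28,  # 2x28-core Xeon Ice Lake Gold 6348
}


def mpi_layout(node_type, cpus_per_task=THREADS_PER_CORE):
    """Return (logical_cpus, max_tasks_per_node) for one node.

    Assumes all logical CPUs on the node can be allocated to the job.
    With cpus_per_task=2 (-c 2) each MPI task gets a full physical core.
    """
    logical = CORES_PER_NODE[node_type] * THREADS_PER_CORE
    return logical, logical // cpus_per_task


def srun_options(node_type, nodes=1):
    """Build srun options that place one MPI task per physical core."""
    _, tasks = mpi_layout(node_type)
    return [
        f"--nodes={nodes}",
        f"--ntasks-per-node={tasks}",
        f"--cpus-per-task={THREADS_PER_CORE}",
    ]


if __name__ == "__main__":
    for kind in CORES_PER_NODE:
        logical, tasks = mpi_layout(kind)
        print(f"{kind}: {logical} logical CPUs, up to {tasks} tasks with -c 2")
```

Passing cpus_per_task=1 instead shows the alternative: twice as many tasks, but pairs of tasks then share one physical core.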
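Combining the GPU request forms, the -c 2 advice and the normal partition listed above, a batch script might be generated as in the sketch below. It is illustrative only: the helpers gres_option and render_job_script, the job name and the nvidia-smi -L command are assumptions, and the helper accepts only the GPU types named on this page (a100, 2g.20gb, 1g.10gb), so it rejects V100 requests simply because their type name is not given here.

```python
# Hypothetical template generator for an AIDA batch script.
MIG_PROFILES = ("2g.20gb", "1g.10gb")  # MIG device types listed on this page


def gres_option(kind, count=1):
    """Return a --gres option: whole A100 GPUs ("a100") or a MIG profile."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if kind != "a100" and kind not in MIG_PROFILES:
        raise ValueError(f"unknown GPU type: {kind!r}")
    return f"--gres=gpu:{kind}:{count}"


def render_job_script(job_name, command, partition="normal", gpu=None,
                      ntasks=1, cpus_per_task=2):
    """Return the text of a Slurm batch script.

    gpu is an optional (kind, count) pair passed to gres_option().
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --ntasks={ntasks}",
        # -c 2: each task gets both hyperthreads of one physical core.
        f"#SBATCH --cpus-per-task={cpus_per_task}",
    ]
    if gpu is not None:
        lines.append(f"#SBATCH {gres_option(*gpu)}")
    lines += ["", command]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: one 1g.10gb MIG device, listing the GPUs the job can see.
    print(render_job_script("mig-test", "nvidia-smi -L",
                            gpu=("1g.10gb", 1)), end="")
```

Save the printed text to a file such as job.sh and submit it with sbatch job.sh. Since the normal partition has no time limit, the template sets no --time option.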