AIDA Cluster

From CAC Documentation wiki
 
== AIDA General Information ==
 
:* Aida is a private cluster with restricted access to members of the bs54_0001 and rab38_0001 groups.
:* 6 compute nodes with V100 GPUs (c0017-c0022)
:* 6 compute nodes with A100 GPUs (c0071-c0076)
:* many nodes from the former Atlas2 cluster
:** [[ATLAS2 Cluster | Full atlas2 cluster information]]
:* Head node:  '''aida.cac.cornell.edu'''  (access via ssh; a login example follows this list)
:** Open HPC deployment running Centos 7.4.1708
:** Cluster scheduler: slurm 20.11.8
:** /home (15TB) directory server (nfs exported to all cluster nodes)
:** Intel(R) Xeon(R) E5-2637 v4 @ 3.5GHz; supports vector extensions up to AVX2
:* Cluster Status: [http://atlas2.cac.cornell.edu/ganglia/ Ganglia]
:* Please send any questions and report problems to:  [mailto:cac-help@cornell.edu cac-help@cornell.edu]
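A minimal login sketch from a local terminal; the user name ''netid'' is a placeholder for your CAC account name, not an actual account:

 # Connect to the AIDA head node over ssh; replace netid with your CAC user name.
 ssh netid@aida.cac.cornell.edu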
  
 
==== Networking ====
 
:* All nodes have a 10Gb Ethernet connection on eth0, on a private network served out from the head node.
:* All nodes have an Infiniband connection (a quick way to inspect both interfaces is shown below):
::* Type: MT4119 (EDR speed, 25Gbits/sec)
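A hedged way to confirm the interfaces from a shell on a node; both are standard Linux tools, though <code>ibstat</code> (from the infiniband-diags package) may not be present on every image:

 ip addr show eth0    # the 10Gb private Ethernet interface
 ibstat               # reports the InfiniBand adapter and its link state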
 
  
==== Hardware ====

:* All nodes below are Xeon generations that support vector extensions up to AVX-512
:* All nodes below have hyperthreading turned on (quick checks for both follow the node list).

 c00[17-22]:
     2x18 core Xeon Skylake 6154 CPUs with base clock 3GHz (turbo up to 3.7GHz)

 c0017: x5 GPU/Nvidia Tesla V100 16GB
     Memory: 754GB
     swap: 187GB
     /tmp: 700GB

 c00[18-21]: x5 GPU/Nvidia Tesla V100 16GB
     Memory: 376GB
     swap: 187GB
     /tmp: 700GB

 c0022: x2 GPU/Nvidia Tesla V100 16GB
     Memory: 1.5TB
     swap: 187GB
     /tmp: 100GB
     /scratch: 1TB

 c00[71-76]:
     2x28 core Xeon Gold 6348 CPUs with base clock 2.6GHz
     x4 GPU/Nvidia Tesla A100 80GB
     Memory: 1TB
     swap: 187GB
     /tmp: 3TB
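A few quick checks, from a shell on any of the nodes above, that correspond to the specs listed here (standard Linux and Nvidia tools; exact output varies per node):

 lscpu | grep -E 'Model name|Socket|Core|Thread'     # hyperthreading on means "Thread(s) per core: 2"
 grep -o -m1 'avx512[a-z]*' /proc/cpuinfo | sort -u  # AVX-512 feature flags on these Xeon generations
 nvidia-smi -L                                       # lists the V100 or A100 GPUs on the node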
  
== Partitions ==

'''For detailed information and a quick-start guide, see the [[Slurm]] page.'''
 
To access the aida nodes, one needs to first log in to aida.cac.cornell.edu.
 
 
:*'''aida''' (example batch script below)
::: Number of nodes: 6 GPU servers (Tesla V100)
::: Node Names: c00[17-22]
::: HW: See above; c0017, c0018-c0021, c0022
::: Memory per node: see above
::: /tmp per node: see above
::: Default walltime: 1 hour
::: Limits: walltime limit: 7 days
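A minimal, hedged batch-script sketch for the aida partition. The partition name and walltime limits come from this page; the GRES string and the job name are assumptions to verify on the cluster (for example with <code>scontrol show partition aida</code>):

 #!/bin/bash
 # Sketch of a single-GPU job on the aida partition (V100 nodes c0017-c0022).
 #SBATCH --job-name=v100-test
 #SBATCH --partition=aida           # partition name as documented above
 #SBATCH --gres=gpu:1               # assumed GRES string; verify with scontrol show node c0017
 #SBATCH --time=02:00:00            # default walltime is 1 hour, limit is 7 days
 
 nvidia-smi -L                      # show the GPU(s) allocated to this job
 # ./your_gpu_program               # placeholder for the actual workload

Submit with <code>sbatch</code> and monitor with <code>squeue -u $USER</code>.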
 
 
:*'''aida2''' (interactive job example below)
::: Number of nodes: 6 GPU servers (Tesla A100)
::: Node Names: c00[71-76]
::: HW: See above; c0071-c0076
::: Memory per node: 1TB
::: /tmp per node: 3TB
::: Default walltime: 1 hour
::: Limits: walltime limit: 7 days
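For interactive work on the A100 nodes, a hedged one-line sketch (same caveat as above about the assumed GRES string):

 # Request one GPU and 8 CPUs on aida2 for a 2-hour interactive shell.
 srun --partition=aida2 --gres=gpu:1 --cpus-per-task=8 --time=02:00:00 --pty /bin/bash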
 
