AIDA Cluster

AIDA General Information

  • Aida is a private queue within the atlas2 cluster; access is restricted to the rab38_0001 group.
  • The aida queue contains 6 compute nodes (c0017-c0022)
  • Head node: atlas2.cac.cornell.edu (access via ssh)
    • Full atlas2 cluster information
    • OpenHPC deployment running CentOS 7.4.1708
    • Cluster scheduler: Slurm 17.02
    • /home (15TB) directory server (NFS-exported to all cluster nodes)
    • Intel(R) Xeon(R) E5-2637 v4 @ 3.5GHz; supports vector extensions up to AVX2
  • Cluster Status: Ganglia.
  • Please send any questions and report problems to: cac-help@cornell.edu

Networking

  • All nodes have a 10Gb Ethernet connection on eth0, on a private network served from the atlas2 head node.
  • All nodes have an InfiniBand connection:
    • Type: MT4119 (EDR speed, 25Gbit/s)
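
A quick way to confirm these interfaces from a compute node is sketched below; it assumes the standard Linux and InfiniBand diagnostic tools (iproute2 and infiniband-diags) are installed on the node.

    # Show the 10Gb private Ethernet interface
    ip addr show eth0

    # Show the InfiniBand adapter state, rate, and link layer
    ibstat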

Hardware in the Slurm aida partition

  • All nodes below use a Xeon generation that supports vector extensions up to AVX-512.
  • All nodes below have hyperthreading turned on. (A quick way to verify both is shown after the hardware listing below.)

c00[17-22]: 2x 18-core Xeon Skylake 6154 CPUs with a base clock of 3.0GHz (turbo up to 3.7GHz)

c0017: 5x NVIDIA Tesla V100 16GB GPUs

    Memory: 754GB
    swap: 187GB
    /tmp: 700GB

c00[18-21]: 5x NVIDIA Tesla V100 16GB GPUs

    Memory: 376GB
    swap: 187GB
    /tmp: 700GB

c0022: 2x NVIDIA Tesla V100 16GB GPUs

    Memory: 1.5TB
    swap: 187GB
    /tmp: 100GB
    /scratch: 1TB
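
To verify the CPU features, hyperthreading, and GPUs from inside the partition, an interactive job is the simplest route. The sketch below assumes GPUs are configured as a Slurm gres resource named "gpu"; adjust the request to suit your job.

    # Start an interactive shell on an aida node with one GPU allocated
    srun -p aida --gres=gpu:1 --pty /bin/bash

    # Confirm AVX-512 support and hyperthreading (2 threads per core)
    lscpu | grep -E 'avx512|Thread\(s\) per core'

    # List the allocated V100 GPU(s)
    nvidia-smi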

Queues/Partitions ("Partition" is the term used by Slurm)

To access the aida nodes, first log in to atlas2.cac.cornell.edu.
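
For example (the username shown is a placeholder; use your own CAC account):

    # Log in to the atlas2 head node
    ssh netid@atlas2.cac.cornell.edu

    # List the aida partition and the state of its nodes
    sinfo -p aida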

  • aida
Number of nodes: 6 GPU servers
Node Names: c00[17-22]
HW: See above; c0017, c0018-c0021, c0022
Memory per node: see above
/tmp per node: see above
Limits: walltime is currently unlimited
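
A minimal batch script for this partition might look like the following sketch; the job name and resource amounts are placeholders, and the --gres line assumes GPUs are defined as a Slurm gres resource named "gpu".

    #!/bin/bash
    #SBATCH --job-name=v100-test        # placeholder job name
    #SBATCH --partition=aida            # submit to the aida partition
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=4           # placeholder CPU count
    #SBATCH --gres=gpu:1                # request one of the node's V100 GPUs
    #SBATCH --mem=32G                   # placeholder memory request
    #SBATCH --time=24:00:00             # optional; walltime is currently unlimited

    # Report the GPU(s) allocated to this job
    nvidia-smi

Submit it from the head node with sbatch (the script filename is up to you): sbatch jobscript.sh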

Common Slurm Commands

Slurm Quickstart Guide

Command/Option Summary (two page PDF)
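
For quick reference, a few of the most commonly used Slurm commands (all standard; see the links above for full documentation):

    squeue -u $USER             # show your queued and running jobs
    sbatch jobscript.sh         # submit a batch script
    scancel <jobid>             # cancel a job
    sinfo -p aida               # show the state of the aida nodes
    scontrol show job <jobid>   # detailed information about one job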