AIDA Cluster

AIDA General Information

  • AIDA is a private cluster; access is restricted to members of the bs54_0001 and rab38_0001 groups.
  • Head node: aida.cac.cornell.edu (access via ssh; see the login example after this list)
  • 12 GPU nodes
    • 6 with V100 GPUs (c0017-c0022)
    • 6 with A100 GPUs (c0071-c0076)
  • Many compute nodes from the former Atlas2 cluster
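
For example, to log in to the head node from a terminal (here <username> is a placeholder for your CAC account name, which is not specified on this page):

    ssh <username>@aida.cac.cornell.edu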

Hardware

  • All GPU nodes support vector extensions up to AVX-512
  • All nodes have hyperthreading turned on (a quick way to verify both settings is shown below).
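
Both settings can be confirmed on any node with standard Linux tools; this is a generic check, not a command taken from this page:

    lscpu | grep -o 'avx512[a-z0-9_]*' | sort -u    # AVX-512 extensions reported by the CPU
    lscpu | grep 'Thread(s) per core'               # prints 2 when hyperthreading is enabled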

c00[17-22]:

    2x 18-core Intel Xeon Skylake 6154 CPUs, 3.0 GHz base clock (turbo up to 3.7 GHz)

c0017: 5x NVIDIA Tesla V100 GPUs (16 GB each)

    Memory: 754 GB
    swap: 187 GB
    /tmp: 700 GB

c00[18-21]: 5x NVIDIA Tesla V100 GPUs (16 GB each)

    Memory: 376 GB
    swap: 187 GB
    /tmp: 700 GB

c0022: 2x NVIDIA Tesla V100 GPUs (16 GB each)

    Memory: 1.5 TB
    swap: 187 GB
    /tmp: 100 GB
    /scratch: 1 TB

c00[71-76]:

    2x 28-core Intel Xeon Ice Lake Gold 6348 CPUs, 2.6 GHz base clock
    4x NVIDIA Tesla A100 GPUs (80 GB each)
    Memory: 1 TB
    swap: 187 GB
    /tmp: 3 TB

Networking

File Systems

Scheduler/Queues

  • The cluster scheduler is Slurm. See the Slurm documentation page for details.
  • Requesting GPUs: see the Requesting GPUs section for information on how to request GPUs on compute nodes for your jobs; a sample batch script is sketched after this list.
    1. --gres=gpu:2g.20gb:<number of MIG devices> or --gres=gpu:1g.10gb:1 to request MIG devices. The job will land on one of c0002, c0003, or c0004.
    2. --gres=gpu:a100:<number of GPUs> to request entire A100 GPUs. The job will land on node c0001.
  • Remember, hyperthreading is enabled on the cluster, so Slurm considers each physical core to consist of two logical CPUs.
  • Partitions (queues):

    Name     Description                                      Time Limit
    normal   all nodes, each node with 4 Nvidia A100 GPUs     no limit
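
A minimal Slurm batch script using the flags above might look like the following sketch. The job name, walltime, and CPU/GPU counts are illustrative placeholders, not values taken from this page:

    #!/bin/bash
    #SBATCH --job-name=gpu-test           # placeholder job name
    #SBATCH --partition=normal            # partition from the table above
    #SBATCH --gres=gpu:a100:1             # request one entire A100 GPU
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=4             # 4 logical CPUs = 2 physical cores with hyperthreading
    #SBATCH --time=01:00:00               # example walltime; the normal partition has no limit

    nvidia-smi                            # report which GPU(s) the job was assigned

Submit the script with sbatch (e.g., sbatch job.sh), and replace --gres=gpu:a100:1 with one of the MIG forms above if a MIG slice is sufficient.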

Help

  • Submit questions or requests via the CAC help page or by sending email to help@cac.cornell.edu. Please include AIDA in the subject line.