AIDA Cluster

AIDA General Information

  • AIDA is a private cluster; access is restricted to members of the bs54_0001 and rab38_0001 groups.
  • Head node: aida.cac.cornell.edu (access via ssh; see the login example below)
  • 12 GPU nodes
    • 6 with V100 GPUs (c0017-c0022)
    • 6 with A100 GPUs (c0071-c0076)
  • many nodes from the former Atlas2 cluster
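
To log in, ssh to the head node from a terminal (a minimal example; <CAC_username> is a placeholder for your own CAC account name):

    ssh <CAC_username>@aida.cac.cornell.edu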

Hardware

  • All GPU nodes support vector extensions up to AVX-512
  • All nodes have hyperthreading turned on.
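
A quick way to confirm both points from a shell on a compute node (a sketch; assumes an interactive session, e.g. srun --pty /bin/bash):

    # List the AVX-512 CPU flags reported by the kernel
    lscpu | grep -o 'avx512[a-z]*' | sort -u
    # With hyperthreading on, lscpu reports 2 threads per core
    lscpu | grep 'Thread(s) per core'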

c00[17-22]:

    2x 18-core Intel Xeon Skylake 6154 CPUs, base clock 3.0 GHz (turbo up to 3.7 GHz)

c0017: 5x NVIDIA Tesla V100 16GB GPUs

    Memory: 754GB
    swap: 187GB
    /tmp: 700GB

c00[18-21]: 5x NVIDIA Tesla V100 16GB GPUs

    Memory: 376GB
    swap: 187GB
    /tmp: 700GB

c0022: 2x NVIDIA Tesla V100 16GB GPUs

    Memory: 1.5TB
    swap: 187GB
    /tmp: 100GB
    /scratch: 1TB

c00[71-76]:

    2x 28-core Intel Xeon Ice Lake Gold 6348 CPUs, base clock 2.6 GHz
    4x NVIDIA Tesla A100 80GB GPUs
    Memory: 1TB
    swap: 187GB
    /tmp: 3TB

Networking

  • The 12 GPU nodes have an InfiniBand interconnect
  • The older Atlas2 nodes have gigabit Ethernet

File Systems

Home Directories
  • Path: ~
  • Users' home directories are located on an NFS export from the AIDA head node. Use your home directory (~) for the data you wish to keep. Data in users' home directories is NOT backed up.
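
A common pattern for jobs (a sketch, not site policy; the /tmp/$USER/$SLURM_JOB_ID layout and the results directory name are illustrative assumptions) is to do temporary I/O on the node-local /tmp listed above and copy only what you want to keep back to your home directory:

    # Inside a batch job: stage scratch files on node-local /tmp
    export TMPDIR=/tmp/$USER/$SLURM_JOB_ID
    mkdir -p "$TMPDIR"
    # ... run your application, writing temporary output to $TMPDIR ...
    cp -r "$TMPDIR"/results ~/        # copy results back to NFS home
    rm -rf "$TMPDIR"                  # clean up the node-local space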

Scheduler/Queues

  • The cluster scheduler is Slurm.
  • See the Slurm documentation page for details.
  • See the Requesting GPUs section for information on how to request GPUs on compute nodes for your jobs (see also the example batch script below):
    • --gres=gpu:2g.20gb:<number of MIG devices> or --gres=gpu:1g.10gb:1 to request MIG devices. The job will land on one of the A100 nodes with MIG configured.
    • --gres=gpu:a100:<number of GPUs> to request entire A100 GPUs. The job will land on an A100 node with no MIG.
  • Remember, hyperthreading is enabled on the cluster, so Slurm considers each physical core to consist of two logical CPUs.
  • You can ensure that each MPI task uses a full physical core by specifying -c 2 (two logical CPUs per task).
  • Partitions (queues):
    Name     Description    Time Limit
    normal   xxxxxxxxxx     no limit
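
A minimal example batch script combining the options above (a sketch; the job name and ./my_gpu_app executable are placeholders, and the resource amounts should be adjusted to your workload):

    #!/bin/bash
    #SBATCH --job-name=a100-test     # placeholder job name
    #SBATCH --partition=normal       # the partition listed above
    #SBATCH --gres=gpu:a100:1        # one full A100; use gpu:2g.20gb:<n> or gpu:1g.10gb:1 for MIG devices
    #SBATCH --ntasks=4               # number of MPI tasks
    #SBATCH --cpus-per-task=2        # -c 2: two logical CPUs = one full physical core per task
    #SBATCH --time=01:00:00          # wall time; the normal partition itself has no limit

    srun ./my_gpu_app                # placeholder executable

Submit the script with sbatch and monitor it with squeue -u $USER.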

Help

  • Submit questions or requests via the CAC help page or by sending email to help@cac.cornell.edu. Please include AIDA in the subject line.