GPUs in Red Cloud

From CAC Documentation wiki
[[Red Cloud]] supports GPU computing featuring '''[https://www.nvidia.com/en-us/data-center/tesla-t4/ Nvidia Tesla T4]''' and '''[https://www.nvidia.com/en-us/data-center/tesla-v100/ Nvidia Tesla V100]''' GPUs. To use a GPU, launch an instance with one of the following two flavors (instance types):
 
 
{| border="1" cellspacing="0" cellpadding="10" align="center" style="text-align:center;"
! Flavor !! CPUs !! GPUs !! RAM
|-
| c4.t1.m20 || 4 || 1 Nvidia Tesla T4 || 20 GB
|-
| c14.g1.m60 || 14 || 1 Nvidia Tesla V100 || 60 GB
|}
 
== Availability ==
 
Red Cloud has 20 T4 GPUs and 4 V100 GPUs. You can see how many are currently available [https://gpus.redcloud.cac.cornell.edu/usage here]. If no GPU is available, you will receive an error when launching a GPU instance.

Red Cloud resources (CPU cores, RAM, GPUs) are not oversubscribed. When you create a GPU instance, you reserve the physical hardware for the life of your instance (and your subscription is charged accordingly) until the instance is deleted or '''''[[OpenStack#Instance_States|shelved]]''''' to free the resources.

'''If you are new to Red Cloud''', please review [[Red_Cloud#How_To_Read_This_Documentation|how to read this documentation]] before launching an instance, especially the section on [[Red_Cloud#Accounting:_Don.27t_Use_Up_Your_Subscription_by_Accident.21|accounting]].
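If you use the OpenStack command-line client, the shelve/unshelve cycle described above can be sketched as follows. This is only a sketch: it assumes the client is already configured with your Red Cloud credentials, and <code>my-gpu-instance</code> is a placeholder for your instance's name.

```shell
# Shelve a GPU instance to release its reserved hardware
# (so your subscription stops being charged for the GPU, cores, and RAM).
openstack server shelve my-gpu-instance

# Check the instance status; it should eventually show SHELVED_OFFLOADED.
openstack server show my-gpu-instance -f value -c status

# Unshelve when you need the GPU again.
openstack server unshelve my-gpu-instance
```

Unshelving succeeds only if a GPU is free at that moment, so check the availability page first.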
  
 
== Launching A GPU Instance ==
 
When '''[[OpenStack#Launch_an_Instance|launching a GPU instance]]''', you can use the base [[Red_Cloud_Linux_Instances|Linux]] or [[Red_Cloud_Windows_Instances|Windows]] image and install your own software or libraries that utilize the GPU. To speed up time to science, CAC also provides two Linux GPU images with GPU software preinstalled.
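As an illustration, a T4 instance can also be launched from the OpenStack command-line client. This is only a sketch; the key pair and network names below are placeholders for your own values.

```shell
# Launch a T4 GPU instance using the c4.t1.m20 flavor.
# --key-name and --network values are placeholders for your own resources.
openstack server create \
    --flavor c4.t1.m20 \
    --image gpu-accelerated-ubuntu-2020-08 \
    --key-name my-keypair \
    --network my-network \
    my-gpu-instance

# Poll the build status; it should go from BUILD to ACTIVE.
openstack server show my-gpu-instance -f value -c status
```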
 
For more information on GPU and CUDA computing, see the Cornell Virtual Workshop "'''[https://cvw.cac.cornell.edu/gpu/ Introduction to GPGPU and CUDA Programming: Overview]'''".
=== GPU images ===
* [https://redcloud.cac.cornell.edu/dashboard/ngdetails/OS::Glance::Image/e096c762-473c-440b-9516-19211c255ad2 gpu-accelerated-ubuntu-2020-08] (based on Ubuntu 18.04 LTS)
* [https://redcloud.cac.cornell.edu/dashboard/ngdetails/OS::Glance::Image/516f21bc-07a8-4546-b052-982028a3d04e gpu-accelerated-centos-2020-08] (based on CentOS 7.8)
  
These images include the following software:
# CUDA 10.1
# Anaconda Python 3 with these packages
## TensorFlow
## PyTorch
## Keras
# Docker-containerized Jupyter Notebook servers
# Matlab R2019a
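Once the instance is running, you can verify that the preinstalled frameworks actually see the GPU. The snippet below is a minimal sketch that assumes only the packages listed above (and a TensorFlow recent enough for the <code>tf.config</code> API); it degrades gracefully if a framework is missing.

```python
def gpu_status():
    """Report whether each installed framework can see a GPU.

    Returns a dict mapping framework name to True/False,
    or None when that framework is not installed.
    """
    status = {}
    try:
        import torch  # PyTorch ships with the CAC GPU images
        status["pytorch"] = torch.cuda.is_available()
    except ImportError:
        status["pytorch"] = None
    try:
        import tensorflow as tf  # the API below assumes TF >= 2.1
        status["tensorflow"] = bool(tf.config.list_physical_devices("GPU"))
    except (ImportError, AttributeError):
        status["tensorflow"] = None
    return status

if __name__ == "__main__":
    print(gpu_status())
```

From a shell on the instance, <code>nvidia-smi</code> should likewise list the GPU model and driver version.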
  
See the '''[[Red Cloud GPU Image Usage]]''' page for more details and sample code.

Latest revision as of 13:27, 5 November 2020
