Red Cloud GPU Image Usage

From CAC Documentation wiki
Revision as of 20:24, 28 August 2020 by Shl1 (talk | contribs)

Introduction

This page documents how to create and use a GPU instance in Red Cloud from an Ubuntu-based (gpu-accelerated-ubuntu-2020-08) or CentOS-based (gpu-accelerated-centos-2020-08) GPU image. Each image features:

  • GPU acceleration via CUDA 10.1,
  • the Anaconda distribution, to facilitate use of platforms such as TensorFlow,
  • Docker-containerized Jupyter Notebook servers, and
  • MATLAB R2019a.

A test application using the Python neural network library Keras, which runs on top of the TensorFlow framework, is provided to check and test GPU utilization.

The intent is to get users started with GPU-accelerated software quickly and with minimal effort.

Create a Server Instance

  1. Get started with the OpenStack web interface
  2. Create an SSH keypair or upload your public key
  3. Create a custom Security Group if you haven't already
  4. Launch a new Server Instance
    • In the OpenStack web interface Dashboard, navigate to Compute > Instances.
    • Click “Launch Instance”.
    • On the Launch Instance screen, fill in the following tabs:
      • Details tab: enter instance name
      • Source tab:
        • Boot Source = “Image”
        • Among the list of images, choose “gpu-accelerated-centos-2020-08” for CentOS 7.8 or “gpu-accelerated-ubuntu-2020-08” for Ubuntu 18.04 LTS by clicking on the ⬆️ button.
        • Volume Size: increase as necessary. For reference, the Anaconda distribution alone is 19 GB.
      • Flavor tab: select the instance configuration. Choose between "c4.t1.m20" (Tesla T4 GPU) and "c14.g1.m60" (Tesla V100 GPU, for large jobs). If you need multiple GPUs on one instance, please reach out for special accommodation.
      • Networks tab: “public” is a good default.
      • Security Groups tab: Click ⬇️ on the “default” group to deselect the default security group. Click ⬆️ to select the custom security group you created above.
      • Key Pair tab: select the SSH key pair you created above.
      • Click the "Launch Instance" button to create your server instance.
      • Note the IP Address of your newly-created Instance.
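
The dashboard steps above can also be approximated from the command line. The sketch below only assembles an `openstack server create` command as an argument list and prints it. It assumes the OpenStack command-line client is installed and authenticated on your machine. The instance, key pair, and security group names are hypothetical placeholders, and the volume size is not set here.

```python
import shlex


def build_server_create_command(name, image, flavor, key_name,
                                security_group, network="public"):
    """Return the argv list for an `openstack server create` call."""
    return [
        "openstack", "server", "create",
        "--image", image,
        "--flavor", flavor,
        "--key-name", key_name,
        "--security-group", security_group,
        "--network", network,
        name,
    ]


# All names below except the image and flavor are hypothetical placeholders.
cmd = build_server_create_command(
    name="my-gpu-instance",
    image="gpu-accelerated-ubuntu-2020-08",
    flavor="c4.t1.m20",            # Tesla T4 flavor from the Flavor tab step
    key_name="my-keypair",
    security_group="my-security-group",
)

# Print a copy-pasteable shell command rather than running it.
print(shlex.join(cmd))
```

Printing the command instead of executing it lets you review the image, flavor, and security group before anything is created.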

SSH Login

In a new terminal window, the Chrome SSH extension, or PuTTY, SSH into the instance:

  • For Ubuntu instance:
 ssh -i ~/.ssh/id_rsa/id_rsa ubuntu@<IP_Address_from_previous_step>
  • For CentOS instance:
 ssh -i ~/.ssh/id_rsa/id_rsa centos@<IP_Address_from_previous_step>

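These commands assume that your private key is saved in a file named “id_rsa” at the path passed to -i (~/.ssh/id_rsa/id_rsa above). Adjust the path if your key is stored elsewhere.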
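
The login user depends on which image the instance was built from. As a minimal sketch, the helper below (a hypothetical name, not part of any CAC tooling) maps the two image names to their login users and assembles the matching ssh command. The IP address and key path in the example are placeholders.

```python
import os

# Login user for each GPU image, as given in the SSH commands above.
LOGIN_USERS = {
    "gpu-accelerated-ubuntu-2020-08": "ubuntu",
    "gpu-accelerated-centos-2020-08": "centos",
}


def build_ssh_command(image, ip_address, key_path):
    """Return the argv list for logging in to an instance built from `image`."""
    user = LOGIN_USERS[image]
    return ["ssh", "-i", os.path.expanduser(key_path), f"{user}@{ip_address}"]


# 203.0.113.10 is a documentation-only placeholder address.
ssh_cmd = build_ssh_command(
    "gpu-accelerated-centos-2020-08", "203.0.113.10", "~/.ssh/id_rsa"
)
print(" ".join(ssh_cmd))
```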

CAC Recommends

Monthly security update patching:

  • For Ubuntu instance:
 sudo apt update && sudo apt dist-upgrade
  • For CentOS instance:
 sudo yum update 

What’s Installed on the VM Instance

CUDA

  • Check the installed version (not applicable for Ubuntu):
 yum info cuda
  • Check the CUDA version currently in use:
 nvcc --version
  • /usr/local/cuda is a symlink. If errors occur, repoint it as necessary:
 sudo ln -sfn /usr/local/cuda-10.1 /usr/local/cuda

CUDA Driver

  • Check the version:
 cat /proc/driver/nvidia/version

NVIDIA Drivers

  • Check the version (if “command not found”, run sudo yum install dkms first):
 dkms status
  • Check detection of GPU devices by CUDA via NVIDIA’s drivers:
 nvidia-smi

Note that the CUDA version displayed at the top right of this output is not necessarily the CUDA version currently in use; it is the highest CUDA version the installed driver supports.

Anaconda

  • Check the overarching Anaconda version:
 conda list anaconda$
  • Check the “conda” package manager version:
 conda -V
  • Check the list of installed packages:
 conda list
  • If desired, update to the latest versions:
 conda update --all
  • For environment information, such as base paths or Python platform versioning:
 conda info

An Anaconda environment should already be activated upon startup, indicated by (base) preceding each command line prompt. This “base” environment applies specific settings to your shell for the choice of a Python interpreter and its associated modules and libraries.

  • Identify the environment’s Python location:
 which python
  • Check the Python version:
 python -V

Docker

  • To start using Docker (log out and back in afterwards so that the new group membership takes effect):
 sudo usermod -aG docker $(whoami)
 sudo service docker start
  • To view existing images pulled onto this instance:
 docker images
  • To view existing containers, both running and exited:
 docker ps -a
  • To enter the terminal of a running container:
 docker exec -it <Container Name or Container ID> bash
  • To restart an exited container (this starts the most recently created one):
 docker start `docker ps -q -l`
  • To exit the container terminal without exiting or killing the container, press CTRL+p then CTRL+q.
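
Several of the checks above can be gathered into one script. The following is a minimal sketch, assuming Python 3.7 or later on the instance. The function names are hypothetical, and the regular expression assumes nvcc prints a line of the form "release 10.1," in its version banner.

```python
import re
import shutil
import subprocess

# Commands taken from the checks listed above.
CHECKS = {
    "nvcc": ["nvcc", "--version"],
    "nvidia-smi": ["nvidia-smi"],
    "conda": ["conda", "-V"],
    "dkms": ["dkms", "status"],
}


def run_check(argv):
    """Run one command; return its output, or None if it is not on PATH."""
    if shutil.which(argv[0]) is None:
        return None
    done = subprocess.run(argv, capture_output=True, text=True)
    return (done.stdout or done.stderr).strip()


def parse_nvcc_release(text):
    """Extract e.g. '10.1' from nvcc's 'release 10.1,' line, else None."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


if __name__ == "__main__":
    for label, argv in CHECKS.items():
        out = run_check(argv)
        if out is None:
            print(f"{label}: not found")
        else:
            first_line = out.splitlines()[0] if out else "(no output)"
            print(f"{label}: {first_line}")
```

A tool that is missing is reported as "not found" rather than raising an error, so the same script can run on both the Ubuntu and CentOS images.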

Testing with Sample Application: with Virtualization only

Each of these frameworks can be set up in its own Conda environment.

  • To see the list of existing usable environments:
 conda env list
  • To exit the current environment and return to the “base” environment:
 conda deactivate

Keras-GPU on Tensorflow-GPU

Activate the Virtual Environment for Tensorflow

With the “base” Anaconda environment still activated:
 conda create -n tf_gpu tensorflow-gpu
 conda activate tf_gpu

Install necessary packages:
 conda install tensorflow-gpu keras-gpu

Ensure the iPython command-line terminal used is from within the environment, not an external version:
 conda install -c anaconda ipython
 which ipython
The output should be ~/anaconda3/envs/tf_gpu/bin/ipython.

Run the sample app within an iPython terminal:
 ipython

Paste in this sample code:
 import tensorflow as tf
 from tensorflow import keras
 from tensorflow.keras import layers
 import numpy as np
 tf.config.list_physical_devices('GPU')

Output:
 [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
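
Listing the device confirms that TensorFlow can see the GPU, but not that work actually runs on it. The sketch below is one possible follow-up check, not part of the original test application. It assumes it is run inside the tf_gpu environment. The function name and the matrix size are illustrative choices. It places a small matrix multiplication on the first GPU and reports which device produced the result.

```python
def check_tensorflow_gpu(size=1000):
    """Report visible GPUs and where a small matmul actually ran."""
    try:
        # Imported here so the function still returns a report
        # when TensorFlow is not installed in the active environment.
        import tensorflow as tf
    except ImportError:
        return {"installed": False, "gpus": [], "matmul_device": None}

    gpus = tf.config.list_physical_devices("GPU")
    report = {
        "installed": True,
        "gpus": [gpu.name for gpu in gpus],
        "matmul_device": None,
    }
    if gpus:
        # Pin the computation to the first GPU and record its placement.
        with tf.device("/GPU:0"):
            a = tf.random.uniform((size, size))
            product = tf.matmul(a, a)
        report["matmul_device"] = product.device
    return report


if __name__ == "__main__":
    print(check_tensorflow_gpu())
```

On a working GPU instance, the "matmul_device" entry should name a GPU device. If it is None, no GPU was visible to TensorFlow.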

PyTorch

Activate the Virtual Environment for PyTorch

With the “base” Anaconda environment still activated:
 conda create -n pytorch
 conda activate pytorch

Install necessary packages:
 conda install pytorch torchvision -c pytorch
 pip install pycuda

Ensure the iPython command-line terminal used is from within the environment, not an external version:
 conda install -c anaconda ipython

Run the sample app within an iPython terminal:
 ipython

Paste in this sample code:
 import pycuda
 import pycuda.driver as drv
 drv.init()
 print('CUDA device query (PyCUDA version) \n')
 print('Detected {} CUDA Capable device(s) \n'.format(drv.Device.count()))
 for i in range(drv.Device.count()):
     gpu_device = drv.Device(i)
     print('Device {}: {}'.format(i, gpu_device.name()))
     compute_capability = float('%d.%d' % gpu_device.compute_capability())
     print('\t Compute Capability:{}'.format(compute_capability))
     print('\t Total Memory: {} megabytes'.format(gpu_device.total_memory()//(1024**2)))

Expected Output:

  • Tesla T4 (Flavor *t1*):
 CUDA device query (PyCUDA version)
 Detected 1 CUDA Capable device(s)
 Device 0: Tesla T4
 	 Compute Capability:7.5
 	 Total Memory: 15109 megabytes
  • Tesla V100 (Flavor *g1*):
 CUDA device query (PyCUDA version)
 Detected 1 CUDA Capable device(s)
 Device 0: Tesla V100-PCIE-16GB
 	 Compute Capability:7.0
 	 Total Memory: 16160 megabytes
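
The PyCUDA query above talks to the CUDA driver directly. As a complementary sketch, the check below asks PyTorch itself whether it can use the GPU. It assumes it is run inside the pytorch environment created above, and the function name is hypothetical.

```python
def check_torch_gpu():
    """Report whether PyTorch sees CUDA and which devices it finds."""
    try:
        # Imported here so the function still returns a report
        # when PyTorch is not installed in the active environment.
        import torch
    except ImportError:
        return {"installed": False, "cuda_available": False, "devices": []}

    available = torch.cuda.is_available()
    count = torch.cuda.device_count() if available else 0
    return {
        "installed": True,
        "cuda_available": available,
        "devices": [torch.cuda.get_device_name(i) for i in range(count)],
    }


if __name__ == "__main__":
    print(check_torch_gpu())
```

If PyCUDA detects the device but "cuda_available" is False here, the installed PyTorch build may not match the CUDA version on the instance.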

Testing with Sample Application: with Virtualization + Docker Containerization

Keras GPU on Tensorflow-GPU

Create a new container:
 docker run --gpus all -v $(realpath ~/notebooks):/tf/notebooks -p 8000:8000 tensorflow/tensorflow:latest-gpu-jupyter sleep 100000 &

  • The --gpus all flag makes the GPU devices detected by nvidia-smi available to the container.
  • tensorflow/tensorflow:latest-gpu-jupyter is the specific image used. Read more about this image and other possible tags at DockerHub: https://hub.docker.com/r/tensorflow/tensorflow/
  • The Jupyter Notebook server started in a later step is exposed on host port 8000 (matching -p 8000:8000) and works from your own directory (here stated as ~/notebooks).

Test run the new container, then exit and disconnect:

  • docker ps to check the Container ID (first column) or Name (last column)
  • docker exec -it <Container ID or Name> bash
  • CTRL+p then CTRL+q to exit the container without killing it.
  • CTRL+d to end the SSH connection to the VM.

Re-connect to the VM via SSH tunneling, then re-enter the Docker Tensorflow container:
 ssh -L 8000:localhost:8000 centos@<_IP_>
OR
 ssh -L 8000:localhost:8000 ubuntu@<_IP_>
then
 docker exec -it <Container ID or Name from the docker ps step above> bash

Spin up a Jupyter Notebook within this Docker container:
 jupyter notebook --ip 0.0.0.0 --port 8000 --allow-root

Copy the token printed by this command for use in the next step. Note: a “No web browser found” error may occur; ignore it if the next step is successful. If not, press Ctrl+c to stop the current server and try again.

Navigate to http://localhost:8000/ in a local browser. Enter the token from the previous step in the requested field.

Navigate to notebooks, then create a new Python3 Notebook or use an existing one.

Paste in and run this Keras sample application:
 import tensorflow as tf
 from tensorflow import keras
 from tensorflow.keras import layers
 import numpy as np
 tf.config.list_physical_devices('GPU')

Output:
 [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
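
If the browser cannot reach http://localhost:8000/, it helps to know whether the SSH tunnel and the Jupyter server are actually listening. The sketch below is a hypothetical helper to run on your local machine. It only tests whether a TCP connection to the forwarded port succeeds, and does not verify that Jupyter is the service answering.

```python
import socket


def port_is_open(host="localhost", port=8000, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Port 8000 is reachable; try http://localhost:8000/ in a browser.")
    else:
        print("Port 8000 is not reachable; check the ssh -L tunnel and the Jupyter server.")
```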



Outline Notes begin here ----

How to make a GPU server instance
  • pick the flavor - *t1* for testing, V100 for large runs
  • if desiring multiple GPUs on one instance, please reach out for special accommodation
  • reference CAC Wiki existing docs https://www.cac.cornell.edu/wiki/index.php?title=OpenStack#Launch_an_Instance
  • reference _ to be filled in server image ids (assume software libraries installed)

How to login
  • reference CAC Wiki existing docs https://www.cac.cornell.edu/wiki/index.php?title=Red_Cloud#How_to_Create_and_Manage_Red_Cloud_Resources

Run tests that detect and use GPUs, explain what they do, give expected output and basic intro links
  • What is installed and how to list installed software and library versions
    • CUDA: yum info
    • NVIDIA Drivers
    • Anaconda: conda -V, conda # list packages
    • docker images
    • tensorflow in a chosen docker image
    • (Matlab) ...
  • how to detect GPUs
    • nvidia-smi ?
    • python on host (?conda jupyter notebook ssh -L tunnel connection)
    • Docker
  • Keras: both in Docker and at host level
    • App lists devices AND runs on GPU

(Matlab) (Floydhub?)