Red Cloud Windows GPU instances
Revision as of 20:01, 18 November 2021
This page describes how to set up and install CUDA-enabled pytorch on a Red Cloud GPU instance running Windows. These instructions were tested on a Windows Server 2019 base image with the `c4.t1.m20` Red Cloud VM flavor and with the Windows Server 2016 base image with the `c14.g1.m60` Red Cloud VM flavor. The boot volume should be increased above the 50GB default size --- 100GB was plenty and 80GB might have been sufficient.
Before you begin, please note:
1. This procedure requires an account with administrator privileges.
2. At the reboot steps, use the Horizon web interface to POWER OFF the instance and power it on again to guarantee that the memory is completely cleared. A simple reboot is not always sufficient.
1. Install Visual Studio 2019 (Optional)
This is optional but avoids installation warnings during the CUDA step. The CUDA installer wants Visual Studio 2019 (version 2022 does not satisfy the check). Locate the installer at docs.microsoft.com - Visual Studio 2019.
2. Install the NVIDIA Drivers for the GPU
Install the NVIDIA drivers from [1]. Select the following options, depending on the instance flavor:
                  | c4.t1.m20                                       | c14.g1.m60
Product Type:     | Data Center / Tesla                             | Data Center / Tesla
Product Series:   | T-Series                                        | V-Series
Product:          | Tesla T4                                        | Tesla V100
Operating System: | Windows Server 2016 or 2019 (match instance OS) | Windows Server 2016 or 2019 (match instance OS)
CUDA Toolkit:     | 11.4                                            | 11.4
Language:         | English (US)                                    | English (US)
Once the installer has downloaded, run it. The "express" option, which installs all available items, seems to work.
3. Restart and power cycle the instance
Begin by rebooting the virtual machine using the operating system on the VM. Allow any boot-time installations to complete. Once the machine is fully rebooted, use the Red Cloud web console (Horizon) to shut down the instance. When this step has completed, use the web console to power on the instance.
4. Install the CUDA Drivers and tools
Install the CUDA driver and tools from [2]. These instructions were tested with CUDA Toolkit 11.5. Select the following options:
Operating System: | Windows
Architecture:     | x86_64
Version:          | Server 2016 or Server 2019 (match instance OS)
Installer Type:   | exe (network)
Once the installer has downloaded, run it. The "express" option, which installs all available items, seems to work. If the installer complains about not finding Visual Studio, select the option to proceed anyway.
6. Allow access to the GPU performance counters (probably optional)
In the NVIDIA Control Panel (accessible through Control Panel > Hardware: NVIDIA Control Panel), select "Manage GPU Performance Counters" in the sidebar and allow access to the GPU performance counters for all users. This step might make the GPU usage counters more accessible to non-admin processes.
7. Install Miniconda
Download the latest Miniconda3 installer for Windows 64-bit and run the installer.
At this point, open the newly installed Anaconda Prompt and run `nvidia-smi`. If it shows information about the GPU, the driver and CUDA installation was successful.
8. Create a conda environment and install packages
The strict channel priority and the use of a dedicated environment (called cforge in these directions) help protect the base install from corruption. The mamba package installer is a faster drop-in replacement for the conda installer.
You may need to look up the current pytorch version numbers below (the version numbers are correct as of 2021-11-18). Specifying the exact version numbers for pytorch is a work-around for mamba, which will otherwise install the CPU-only pytorch. Note that we install Python 3.9 in the steps below and CUDA 11.5 in the steps above, so we need a pytorch binary that matches our Python version (3.9) and uses a CUDA version 11.x with 11.x <= 11.5. The following command lists all available versions:
$ mamba search pytorch -c pytorch
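The version-matching rule above can be sketched in Python. This is a hypothetical helper for illustration (`build_matches` is not part of conda or mamba): it parses a conda build string such as `py3.9_cuda11.3_cudnn8_0` and checks that the Python version matches and the CUDA version does not exceed the installed toolkit.

```python
import re

def build_matches(build_string, python_version=(3, 9), max_cuda=(11, 5)):
    """Return True if a pytorch conda build string matches the desired
    Python version and uses CUDA <= max_cuda.
    Illustrative helper only; not a conda/mamba API."""
    py = re.search(r"py(\d+)\.(\d+)", build_string)
    cu = re.search(r"cuda(\d+)\.(\d+)", build_string)
    if py is None or cu is None:  # CPU-only builds carry no 'cuda' tag
        return False
    if (int(py.group(1)), int(py.group(2))) != python_version:
        return False
    return (int(cu.group(1)), int(cu.group(2))) <= max_cuda

print(build_matches("py3.9_cuda11.3_cudnn8_0"))  # True: Python 3.9, CUDA 11.3 <= 11.5
print(build_matches("py3.9_cpu_0"))              # False: CPU-only build
```

Applied to the `mamba search` output, this is why the build `py3.9_cuda11.3_cudnn8_0` used below is acceptable with a CUDA 11.5 toolkit.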
To set up an environment and install the pytorch packages, enter the following in the Anaconda Prompt, one input line at a time. The $ represents the command prompt (the start of an input line) and should not be typed.
$ conda config --set channel_priority strict
$ conda create --name cforge
$ conda activate cforge
$ conda config --add channels conda-forge
$ conda config --env --set channel_priority strict
Next, install python and mamba into the newly created (and active) environment.
$ conda install python=3.9 mamba pip
The next set of packages install a fairly complete data science setup. These are optional but you will probably want them!
$ mamba install pandas scikit-learn matplotlib jupyterlab plotnine nodejs tqdm regex dask scikit-learn statsmodels bokeh networkx ipywidgets jupytext
Finally, install pytorch, the Huggingface transformers (optional), and fast.ai (optional). If needed, use `mamba search pytorch -c pytorch` to identify available packages and version numbers as described above.
$ mamba install -c pytorch pytorch==1.10.0=py3.9_cuda11.3_cudnn8_0
$ mamba install -c pytorch torchvision torchaudio
$ mamba install transformers
$ mamba install -c fastchan fastai
9. Verify that Pytorch can see the GPU
In the Anaconda Prompt, with the cforge environment active, execute the following command:
$ python -c "import torch; print(torch.cuda.is_available())"
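For a slightly fuller check, the one-liner above can be extended into a small script. `cuda_report` is an illustrative name, not a pytorch API; the attributes it reads (`torch.__version__`, `torch.version.cuda`, `torch.cuda.get_device_name`) are standard pytorch, and the script degrades gracefully if pytorch is missing or the GPU is not visible.

```python
def cuda_report():
    """Return a one-line diagnostic of the pytorch/CUDA setup.
    Illustrative helper; not part of pytorch itself."""
    try:
        import torch
    except ImportError:
        return "pytorch is not installed in this environment"
    if not torch.cuda.is_available():
        return "pytorch %s is installed, but CUDA is not available" % torch.__version__
    return "pytorch %s, CUDA %s, device: %s" % (
        torch.__version__, torch.version.cuda, torch.cuda.get_device_name(0))

print(cuda_report())
```

On a correctly configured instance this should report the pytorch version, the CUDA version the binary was built against (11.3 here), and the GPU name (Tesla T4 or Tesla V100, depending on the flavor).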