GPU

Kubernetes supports allocating GPU resources to containers (currently only NVIDIA GPUs), which is widely used in scenarios such as deep learning.

How to Use

Kubernetes v1.8 and Later

Starting with Kubernetes v1.8, GPUs are supported through the Device Plugin mechanism. Before using it, a few configuration steps are required:

  • Enable the following feature gates on kubelet/kube-apiserver/kube-controller-manager: --feature-gates="DevicePlugins=true"

  • Install NVIDIA drivers on all Nodes, as well as the NVIDIA CUDA Toolkit and cuDNN

  • Configure Kubelet to use Docker as the container engine (which is the default setting), as other container engines do not yet support this feature

NVIDIA Plugin

The NVIDIA device plugin requires nvidia-docker.

To install nvidia-docker:

# Install docker-ce
sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
sudo apt-get update
sudo apt-get install docker-ce

# Add the package repositories
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update

# Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd

# Test nvidia-smi with the latest official CUDA image
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

Set Docker default runtime to nvidia:
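
A minimal sketch, assuming nvidia-docker2 installed the runtime binary at its default location /usr/bin/nvidia-container-runtime; writing /etc/docker/daemon.json as below makes nvidia the default runtime:

# Make "nvidia" the default Docker runtime (path assumes nvidia-docker2's default install location)
sudo tee /etc/docker/daemon.json <<EOF
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF

# Restart Docker so the new default runtime takes effect
sudo systemctl restart docker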

Deploy the NVIDIA device plugin:
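
A sketch of one common way to deploy it; the manifest comes from the NVIDIA/k8s-device-plugin repository, and the v1.8 tag below is an assumption that should be matched to your cluster version:

# Deploy the NVIDIA device plugin DaemonSet (check the repository for the tag matching your Kubernetes version)
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.8/nvidia-device-plugin.yml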

GCE/GKE GPU Plugin

This plugin does not require nvidia-docker and also supports CRI container runtimes.
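
A sketch of how it is typically deployed; the manifest URL below points at the GoogleCloudPlatform/container-engine-accelerators project and is an assumption to be verified against that repository:

# Deploy the GCE/GKE NVIDIA driver installer and device plugin DaemonSet (URL is illustrative)
kubectl create -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/daemonset.yaml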

Example of Requesting nvidia.com/gpu Resources
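
A minimal sketch of a Pod that requests one GPU via resources.limits; the cuda-vector-add image is the sample workload used in the upstream Kubernetes GPU documentation, and any CUDA workload can be substituted:

apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      # Sample CUDA workload image; replace with your own
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          # Request one whole GPU; fractional values are not allowed
          nvidia.com/gpu: 1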

Kubernetes v1.6 and v1.7

The alpha.kubernetes.io/nvidia-gpu resource has been deprecated since v1.10; use nvidia.com/gpu on newer versions.

To use GPUs in Kubernetes v1.6 and v1.7, prerequisite configurations are required:

  • Install NVIDIA drivers on all Nodes, as well as the NVIDIA CUDA Toolkit and cuDNN

  • Enable the feature gates --feature-gates="Accelerators=true" on apiserver and kubelet

  • Configure Kubelet to use Docker as the container engine (the default setting), as other container engines are not yet supported

Use the resource name alpha.kubernetes.io/nvidia-gpu to specify the number of GPUs required, for example:
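
A sketch of such a Pod, assuming a TensorFlow GPU image as the workload; the image name and GPU count are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: tensorflow-gpu
spec:
  restartPolicy: Never
  containers:
    - name: gpu-container
      # Placeholder GPU workload image
      image: gcr.io/tensorflow/tensorflow:latest-gpu
      resources:
        limits:
          # Old alpha resource name used by v1.6/v1.7; must be a whole number
          alpha.kubernetes.io/nvidia-gpu: 2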

Note:

  • GPU resources must be requested in resources.limits; requesting them via resources.requests is not valid

  • Containers may request one or more whole GPUs, but not fractional parts of a GPU

  • GPUs cannot be shared among multiple containers

  • It is assumed by default that all Nodes are equipped with GPUs of the same model

Multiple GPU Models

If the Nodes in your cluster are installed with GPUs of different models, you can use Node Affinity to schedule Pods to Nodes with a specific GPU model.

First, label your Nodes with the GPU model during cluster initialization:
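
For example (the label key accelerator and the model values are just a convention, not required names; the node names are placeholders):

# Label each Node with the model of GPU it carries
kubectl label nodes <node-with-k80> accelerator=nvidia-tesla-k80
kubectl label nodes <node-with-p100> accelerator=nvidia-tesla-p100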

Then, set Node Affinity when creating a Pod:
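
A sketch using the accelerator label from the previous step; the label key/value and the image are assumptions:

apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              # Only schedule onto Nodes labeled with the desired GPU model
              - key: accelerator
                operator: In
                values:
                  - nvidia-tesla-p100
  containers:
    - name: cuda-vector-add
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1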

Using CUDA Libraries

The NVIDIA CUDA Toolkit and cuDNN must be pre-installed on all Nodes. To give containers access to the CUDA libraries (for example those under /usr/lib/nvidia-375), pass them into the containers as hostPath volumes:
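
A sketch of such a Pod; the host path /usr/lib/nvidia-375 depends on the installed driver version, and the mount path inside the container is an assumption:

apiVersion: v1
kind: Pod
metadata:
  name: tensorflow-gpu
spec:
  restartPolicy: Never
  containers:
    - name: gpu-container
      image: gcr.io/tensorflow/tensorflow:latest-gpu
      resources:
        limits:
          alpha.kubernetes.io/nvidia-gpu: 1
      volumeMounts:
        # Expose the host's NVIDIA/CUDA libraries inside the container
        - name: nvidia-libraries
          mountPath: /usr/local/nvidia/lib64
  volumes:
    - name: nvidia-libraries
      hostPath:
        # Driver-version-specific directory; adjust to match the installed driver
        path: /usr/lib/nvidia-375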

Appendix: Installing CUDA

To install CUDA:
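
A hedged sketch for Ubuntu 16.04 with CUDA 8.0; the repository .deb name and version are assumptions and should be taken from the CUDA downloads page for your distribution:

# Download and register the CUDA repository package (name/version are illustrative)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb

# Install CUDA (this also pulls in the NVIDIA driver)
sudo apt-get update
sudo apt-get install -y cuda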

To install cuDNN:

First, visit https://developer.nvidia.com/cudnn, register, and download cuDNN v5.1; then install it with the following commands:
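
A sketch assuming cuDNN v5.1 for CUDA 8.0 was downloaded as cudnn-8.0-linux-x64-v5.1.tgz and CUDA is installed under /usr/local/cuda:

# Unpack the archive and copy headers/libraries into the CUDA installation
tar zxvf cudnn-8.0-linux-x64-v5.1.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*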

After installation, you can run nvidia-smi to check the status of the GPU devices.

