Kubernetes now supports the allocation of GPU resources for containers (currently only NVIDIA GPUs), which is widely used in scenarios like deep learning.
How to Use
Kubernetes v1.8 and Later
Starting with Kubernetes v1.8, GPUs are supported through the DevicePlugin feature. Prior to use, several configurations are needed:
- Enable the `DevicePlugins` feature gate on kubelet, kube-apiserver, and kube-controller-manager: `--feature-gates="DevicePlugins=true"`
- Install NVIDIA drivers on all Nodes, including the NVIDIA CUDA Toolkit and cuDNN
- Configure kubelet to use Docker as the container engine (the default setting), as other container engines do not yet support this feature
NVIDIA Plugin
The NVIDIA device plugin requires nvidia-docker.
To install nvidia-docker:
```sh
# Install docker-ce
sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
sudo apt-get update
sudo apt-get install docker-ce

# Add the package repositories
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update

# Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd

# Test nvidia-smi with the latest official CUDA image
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
```
Then deploy the NVIDIA device plugin:

```sh
# For Kubernetes v1.8
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.8/nvidia-device-plugin.yml

# For Kubernetes v1.9
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.9/nvidia-device-plugin.yml
```
GCE/GKE GPU Plugin
This plugin does not require nvidia-docker and also supports CRI container runtimes.
```sh
# Install NVIDIA drivers on Container-Optimized OS:
kubectl create -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/k8s-1.9/daemonset.yaml

# Install NVIDIA drivers on Ubuntu (experimental):
kubectl create -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/k8s-1.9/nvidia-driver-installer/ubuntu/daemonset.yaml

# Install the device plugin:
kubectl create -f https://raw.githubusercontent.com/kubernetes/kubernetes/release-1.9/cluster/addons/device-plugins/nvidia-gpu/daemonset.yaml
```
A few limitations apply when requesting GPUs:

- GPU resources must be specified in `resources.limits`; specifying them in `resources.requests` is not valid
- Containers may request one or more whole GPUs, but not fractional parts of a GPU
- GPUs cannot be shared among multiple containers
- By default, all Nodes are assumed to be equipped with GPUs of the same model
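Given these constraints, a minimal Pod requesting a single GPU can be sketched as follows (the Pod and container names are illustrative; the image is the CUDA test image used elsewhere in this chapter):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod              # illustrative name
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-container
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1  # must be in limits; whole GPUs only
```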
Multiple GPU Models
If the Nodes in your cluster are installed with GPUs of different models, you can use Node Affinity to schedule Pods to Nodes with a specific GPU model.
First, label your Nodes with the GPU model during cluster initialization:
```sh
# Label your nodes with the accelerator type they have.
kubectl label nodes <node-with-k80> accelerator=nvidia-tesla-k80
kubectl label nodes <node-with-p100> accelerator=nvidia-tesla-p100
```
Then, set Node Affinity when creating a Pod:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      # https://github.com/kubernetes/kubernetes/blob/v1.7.11/test/images/nvidia-cuda/Dockerfile
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1
  nodeSelector:
    accelerator: nvidia-tesla-p100 # or nvidia-tesla-k80 etc.
```
Using CUDA Libraries
The NVIDIA CUDA Toolkit and cuDNN must be pre-installed on all Nodes. To give containers access to /usr/lib/nvidia-375, the CUDA libraries should be passed to them as hostPath volumes. To install the CUDA Toolkit:
```sh
# Check for CUDA and try to install.
if ! dpkg-query -W cuda; then
  # The 16.04 installer works with 16.10.
  curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
  dpkg -i ./cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
  apt-get update
  apt-get install cuda -y
fi
```
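The hostPath approach mentioned above can be sketched as follows; the Pod name, volume name, and container mount path are illustrative, and the host directory (/usr/lib/nvidia-375) depends on the installed driver version:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-app             # illustrative name
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-app
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1
      # Expose the host's NVIDIA libraries inside the container.
      volumeMounts:
        - name: nvidia-libraries
          mountPath: /usr/local/nvidia/lib64
  volumes:
    - name: nvidia-libraries
      hostPath:
        path: /usr/lib/nvidia-375
```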
To install cuDNN, first visit https://developer.nvidia.com/cudnn, register, and download cuDNN v5.1, then use the following commands to install it: