Kubernetes can schedule GPU resources for containers (currently limited to NVIDIA GPUs), which is invaluable for compute-intensive workloads such as deep learning.
How to Access GPU Resources
For Kubernetes v1.8 and Above
Starting with Kubernetes v1.8, GPU support is provided through the DevicePlugin feature. The following configuration is required beforehand:
Enabling the feature gate on the kubelet, kube-apiserver, and kube-controller-manager: --feature-gates="DevicePlugins=true"
Installing the NVIDIA drivers on all Nodes, along with the NVIDIA CUDA Toolkit and cuDNN
Configuring the kubelet to use the Docker container engine (the default); other container runtimes do not yet support this feature
# Install docker-ce
curl https://get.docker.com | sh \
  && sudo systemctl --now enable docker

# Add the package repositories
distribution=$(. /etc/os-release; echo $ID$VERSION_ID) \
  && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container.list | \
     sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
     sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get update   # refresh the package index so the new repository is visible
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker

# Test nvidia-smi with the latest official CUDA image
sudo docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
Deploy the NVIDIA device plugin on your cluster:
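For example, the plugin is typically deployed as a DaemonSet from the NVIDIA/k8s-device-plugin repository (the release tag in the URL below is illustrative; check the repository for the current manifest):

# Deploy the NVIDIA device plugin DaemonSet (the version tag is an example)
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml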
GCE/GKE GPU Plugin
This plugin does not require nvidia-docker and also supports CRI container runtimes.
NVIDIA GPU Operator
The NVIDIA GPU Operator simplifies the management of NVIDIA GPUs in Kubernetes clusters.
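As a sketch, the operator is usually installed with Helm from NVIDIA's chart repository (the namespace and flags below are common defaults, not requirements of this guide):

# Add NVIDIA's Helm repository and install the GPU Operator
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator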
Sample Request for nvidia.com/gpu resource
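A minimal sketch of such a request, reusing the cuda-vector-add image that appears later in this guide (any GPU-enabled image works the same way):

apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1   # GPUs are requested under limits, as whole units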
Kubernetes v1.6 and v1.7
The alpha.kubernetes.io/nvidia-gpu resource name was removed in v1.10; use nvidia.com/gpu in newer versions.
For Kubernetes v1.6 and v1.7, you need to install the NVIDIA drivers on all Nodes, enable --feature-gates="Accelerators=true" on the kube-apiserver and kubelet, and ensure the kubelet uses Docker as the container engine.
The following is how you would specify the number of GPUs using the resource name alpha.kubernetes.io/nvidia-gpu:
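A minimal sketch of such a spec, written as a Job named nvidia-smi to match the kubectl output shown near the end of this section (the image and command are illustrative; on v1.6/v1.7 clusters the host's CUDA libraries were also typically mounted as described under Utilizing CUDA Libraries):

apiVersion: batch/v1
kind: Job
metadata:
  name: nvidia-smi
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: nvidia-smi
          image: nvidia/cuda            # any CUDA base image that includes nvidia-smi
          command: ["nvidia-smi"]
          resources:
            limits:
              alpha.kubernetes.io/nvidia-gpu: 1   # pre-v1.10 resource name

Saved as job.yaml and created with kubectl, a Job like this produces output similar to the listing at the end of this section.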
Note:
GPU resources must be requested under resources.limits; setting them in resources.requests has no effect.
A container can request one or more whole GPUs; fractional GPUs cannot be requested.
GPUs cannot be shared between containers.
The assumption is that all Nodes have the same model of GPUs installed.
Handling Multiple GPU Models
If the cluster has Nodes with different GPU models, Node Affinity can be used to schedule Pods to Nodes with specific GPU models:
First, at cluster setup, label the Nodes with the appropriate GPU model, as shown in the kubectl label commands further below.
Then, constrain scheduling when creating the Pod, either with Node Affinity (a sketch follows) or with a simple nodeSelector, as in the cuda-vector-add Pod manifest further below.
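A sketch of the Node Affinity form; it assumes Nodes carry the accelerator label applied in the kubectl label commands below:

# Pod spec fragment: require a Node labeled with a specific GPU model
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: accelerator
                operator: In
                values:
                  - nvidia-tesla-p100   # or nvidia-tesla-k80, etc.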
Utilizing CUDA Libraries
The NVIDIA CUDA Toolkit and cuDNN must be pre-installed on all Nodes. To give containers access to the libraries under /usr/lib/nvidia-375, pass them into the container as hostPath volumes:
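A sketch of the relevant volume configuration (the container mount path /usr/local/nvidia and the Pod/volume names are illustrative conventions, not mandated by Kubernetes):

apiVersion: v1
kind: Pod
metadata:
  name: cuda-libs-demo              # illustrative name
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-app
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1
      volumeMounts:
        # where the host's driver/CUDA libraries appear inside the container
        - name: nvidia-driver-libs
          mountPath: /usr/local/nvidia
  volumes:
    # host directory holding the NVIDIA driver and CUDA user-space libraries
    - name: nvidia-driver-libs
      hostPath:
        path: /usr/lib/nvidia-375

Inside the container, LD_LIBRARY_PATH may need to include the mount path so that applications can find the libraries.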
# Label your nodes with the accelerator type they have.
kubectl label nodes <node-with-k80> accelerator=nvidia-tesla-k80
kubectl label nodes <node-with-p100> accelerator=nvidia-tesla-p100
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1
  nodeSelector:
    accelerator: nvidia-tesla-p100 # or nvidia-tesla-k80 etc.
$ kubectl create -f job.yaml
job "nvidia-smi" created
$ kubectl get job
NAME         DESIRED   SUCCESSFUL   AGE
nvidia-smi   1         1            14m
$ kubectl get pod -a
NAME               READY     STATUS      RESTARTS   AGE
nvidia-smi-kwd2m   0/1       Completed   0          14m
$ kubectl logs nvidia-smi-kwd2m
Fri Jun 16 19:49:53 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66 Driver Version: 375.66 |
...
# Example snippet for Ubuntu Nodes (e.g. in a node startup script):
# Check for CUDA and try to install.
if ! dpkg-query -W cuda; then
  # The 16.04 installer works with 16.10.
  curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
  dpkg -i ./cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
  apt-get update
  apt-get install cuda -y
fi