GPU
Kubernetes now supports the allocation of GPU resources for containers (currently only NVIDIA GPUs), which is widely used in scenarios like deep learning.
Starting with Kubernetes v1.8, GPUs are supported through the DevicePlugin feature. Prior to use, several configurations are needed:
Enable the DevicePlugins feature gate on kubelet, kube-apiserver, and kube-controller-manager: --feature-gates="DevicePlugins=true"
Install NVIDIA drivers on all Nodes, including the NVIDIA CUDA Toolkit and cuDNN
Configure Kubelet to use Docker as the container engine (which is the default setting), as other container engines do not yet support this feature
The NVIDIA device plugin requires nvidia-docker.
To install nvidia-docker:
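A minimal sketch for Debian/Ubuntu, following the installation steps documented in the nvidia-docker project's README; repository setup differs on other distributions:

```sh
# Add the nvidia-docker package repository
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
    sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update

# Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd
```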
Set Docker default runtime to nvidia:
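nvidia-docker2 registers an nvidia runtime with Docker; it can be made the default in /etc/docker/daemon.json, for example:

```json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
```

Restart Docker afterwards (e.g. sudo systemctl restart docker) for the change to take effect.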
Deploy the NVIDIA device plugin:
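For example, using the manifest from the NVIDIA/k8s-device-plugin repository (the v1.8 tag below is illustrative; pick the release matching your cluster version):

```sh
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.8/nvidia-device-plugin.yml
```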
Alternatively, the GCE/GKE GPU device plugin does not require nvidia-docker and also supports CRI container runtimes.
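A sketch of installing the NVIDIA drivers on Container-Optimized OS nodes using the manifests from the GoogleCloudPlatform/container-engine-accelerators repository; the exact manifest path is an assumption and may have moved:

```sh
kubectl create -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
```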
Requesting nvidia.com/gpu Resources
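For example, a Pod requesting one GPU in resources.limits (the cuda-vector-add image follows the upstream Kubernetes example):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1   # request one whole GPU
```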
Note: alpha.kubernetes.io/nvidia-gpu has been deprecated since v1.10; please use nvidia.com/gpu in newer versions.
To use GPUs in Kubernetes v1.6 and v1.7, prerequisite configurations are required:
Install NVIDIA drivers on all Nodes, including the NVIDIA CUDA Toolkit and cuDNN
Enable the Accelerators feature gate on apiserver and kubelet: --feature-gates="Accelerators=true"
Configure Kubelet to use Docker as the container engine (the default setting), as other container engines are not yet supported
Use the resource name alpha.kubernetes.io/nvidia-gpu to specify the number of GPUs required, for example:
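A minimal sketch; the image name is illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tensorflow-gpu
spec:
  restartPolicy: Never
  containers:
    - name: tensorflow
      image: tensorflow/tensorflow:latest-gpu   # illustrative GPU-enabled image
      resources:
        limits:
          alpha.kubernetes.io/nvidia-gpu: 1     # number of GPUs requested
```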
Note:
GPU resources must be requested in resources.limits; specifying them in resources.requests is not valid
Containers may request either 1 GPU or multiple GPUs, but not fractional parts of a GPU
GPUs cannot be shared among multiple containers
It is assumed by default that all Nodes are equipped with GPUs of the same model
If the Nodes in your cluster have GPUs of different models installed, you can use Node Affinity to schedule Pods to Nodes with a specific GPU model.
First, label your Nodes with the GPU model during cluster initialization:
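For example (the accelerator label key follows the upstream Kubernetes documentation; node names and GPU models are placeholders):

```sh
kubectl label nodes node1 accelerator=nvidia-tesla-k80
kubectl label nodes node2 accelerator=nvidia-tesla-p100
```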
Then, set Node Affinity when creating a Pod:
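A sketch of a Pod constrained to Nodes carrying a specific GPU model via the label applied above:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: accelerator          # the label applied during initialization
                operator: In
                values:
                  - nvidia-tesla-p100
  containers:
    - name: cuda-vector-add
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1
```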
The NVIDIA CUDA Toolkit and cuDNN must be pre-installed on all Nodes. To give containers access to /usr/lib/nvidia-375, the CUDA libraries should be passed to them as hostPath volumes:
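A sketch of a Pod mounting the host driver directory (use alpha.kubernetes.io/nvidia-gpu instead on v1.6/v1.7; the image and mount path are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nvidia-smi
spec:
  restartPolicy: Never
  containers:
    - name: nvidia-smi
      image: nvidia/cuda               # illustrative CUDA base image
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
      volumeMounts:
        - name: nvidia-driver
          mountPath: /usr/local/nvidia # where the container sees the libraries
  volumes:
    - name: nvidia-driver
      hostPath:
        path: /usr/lib/nvidia-375      # driver libraries on the host
```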
To install CUDA:
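A sketch for Ubuntu 16.04 with CUDA 8.0; the repository package and version are illustrative, so pick the one matching your distribution from NVIDIA's download pages:

```sh
curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
sudo dpkg -i ./cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install -y cuda
```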
To install cuDNN:
First, register at the NVIDIA cuDNN website (https://developer.nvidia.com/cudnn) and download cuDNN v5.1, then use the following commands to install it:
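A sketch assuming CUDA 8.0 is installed under /usr/local/cuda; the file names correspond to the cuDNN v5.1 tarball for Linux x86_64:

```sh
tar zxvf cudnn-8.0-linux-x64-v5.1.tgz
sudo cp -P cuda/include/cudnn.h /usr/local/cuda/include
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
```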
After installation, you can run nvidia-smi to check the status of the GPU devices:
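```sh
# Lists each detected GPU with its driver version, memory usage, and running processes
nvidia-smi
```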