```sh
# Install docker-ce
sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
    "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
    $(lsb_release -cs) \
    stable"
sudo apt-get update
sudo apt-get install docker-ce

# Add the package repositories
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
    sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64/nvidia-docker.list | \
    sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update

# Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd

# Test nvidia-smi with the latest official CUDA image
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
```
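Because the Kubernetes device plugin launches containers without passing `--runtime=nvidia` explicitly, Docker usually also needs `nvidia` set as its default runtime. A minimal sketch, assuming the stock nvidia-docker2 layout with the runtime binary at `/usr/bin/nvidia-container-runtime`:

```sh
# Make nvidia the default Docker runtime so GPU Pods work without
# an explicit --runtime=nvidia (path assumes the nvidia-docker2 package layout).
sudo tee /etc/docker/daemon.json <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
# Restart Docker so the new default runtime takes effect.
sudo systemctl restart docker
```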
```sh
# For Kubernetes v1.8
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.8/nvidia-device-plugin.yml

# For Kubernetes v1.9
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.9/nvidia-device-plugin.yml
```
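Once the device plugin Pods are running, each GPU node should advertise `nvidia.com/gpu` in its allocatable resources. A quick sanity check (node name is a placeholder):

```sh
# List the GPU count that each node advertises to the scheduler.
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"

# Or inspect a single node in detail; Capacity/Allocatable should list nvidia.com/gpu.
kubectl describe node <node-with-gpu>
```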
GCE/GKE GPU plugin
This plugin does not require nvidia-docker and also supports CRI container runtimes.
```sh
# Install NVIDIA drivers on Container-Optimized OS:
kubectl create -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/k8s-1.9/daemonset.yaml

# Install NVIDIA drivers on Ubuntu (experimental):
kubectl create -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/k8s-1.9/nvidia-driver-installer/ubuntu/daemonset.yaml

# Install the device plugin:
kubectl create -f https://raw.githubusercontent.com/kubernetes/kubernetes/release-1.9/cluster/addons/device-plugins/nvidia-gpu/daemonset.yaml
```
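The driver installer and the device plugin both run as DaemonSets in the kube-system namespace, so a quick way to confirm they came up is to list them there (the exact DaemonSet and Pod names vary between releases):

```sh
# Check that the driver-installer and device-plugin DaemonSets are scheduled.
kubectl get daemonsets -n kube-system

# Look for the corresponding NVIDIA Pods and the nodes they landed on.
kubectl get pods -n kube-system -o wide | grep -i nvidia
```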
```sh
# Label your nodes with the accelerator type they have.
kubectl label nodes <node-with-k80> accelerator=nvidia-tesla-k80
kubectl label nodes <node-with-p100> accelerator=nvidia-tesla-p100
```
Then, constrain scheduling on the accelerator label when creating the Pod (the example below uses a nodeSelector; a node-affinity variant follows it):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      # https://github.com/kubernetes/kubernetes/blob/v1.7.11/test/images/nvidia-cuda/Dockerfile
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1
  nodeSelector:
    accelerator: nvidia-tesla-p100 # or nvidia-tesla-k80 etc.
```
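The same constraint can be expressed with node affinity instead of a plain nodeSelector, which is useful when several accelerator types are acceptable. A sketch of the equivalent Pod spec:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              # Match any node labelled with one of the listed accelerator types.
              - key: accelerator
                operator: In
                values:
                  - nvidia-tesla-k80
                  - nvidia-tesla-p100
  containers:
    - name: cuda-vector-add
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1
```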
Using CUDA libraries
The NVIDIA CUDA Toolkit, cuDNN, and related libraries must be pre-installed on every Node (the script below checks for CUDA and installs it if missing). To access /usr/lib/nvidia-375, the CUDA libraries also need to be passed to the container as hostPath volumes, as sketched after the script:
```sh
# Check for CUDA and try to install.
if ! dpkg-query -W cuda; then
  # The 16.04 installer works with 16.10.
  curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
  dpkg -i ./cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
  apt-get update
  apt-get install cuda -y
fi
```
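A minimal sketch of the hostPath mounts mentioned above; the Pod name, image, and the /usr/lib/nvidia-375 paths follow the example in this section and may differ per driver version:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nvidia-smi
spec:
  restartPolicy: OnFailure
  containers:
    - name: nvidia-smi
      image: nvidia/cuda
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
      volumeMounts:
        # Expose the host's NVIDIA binaries and libraries inside the container.
        - name: nvidia-bin
          mountPath: /usr/local/nvidia/bin
        - name: nvidia-lib
          mountPath: /usr/lib/nvidia
  volumes:
    - name: nvidia-bin
      hostPath:
        path: /usr/lib/nvidia-375/bin
    - name: nvidia-lib
      hostPath:
        path: /usr/lib/nvidia-375
```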