GPU
Kubernetes supports containers requesting GPU resources (currently NVIDIA GPUs only), which is widely used in scenarios such as deep learning.
Usage
Kubernetes v1.8 and later
Starting with Kubernetes v1.8, GPU support is implemented as Device Plugins. Before use, some preparation is required:
Configure kubelet/kube-apiserver/kube-controller-manager with:
--feature-gates="DevicePlugins=true"
Install the NVIDIA drivers on all Nodes, including the NVIDIA CUDA Toolkit, cuDNN, etc.
Configure kubelet to use the docker container engine (the default); other container engines do not yet support this feature.
NVIDIA device plugin
The NVIDIA device plugin requires nvidia-docker.
Install nvidia-docker:
# Install docker-ce
curl https://get.docker.com | sh \
&& sudo systemctl --now enable docker
# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
# Test nvidia-smi with the latest official CUDA image
sudo docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
Deploy the NVIDIA device plugin
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.13.0/nvidia-device-plugin.yml
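Once the plugin DaemonSet is running, each node's allocatable resources should include nvidia.com/gpu. A quick check (this custom-columns query is a sketch against the resource name used above):
# List the nvidia.com/gpu count advertised by each node
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"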
GCE/GKE GPU plugin
This plugin does not require nvidia-docker and also supports CRI container runtimes.
# Install NVIDIA drivers on Container-Optimized OS:
kubectl create -f https://github.com/GoogleCloudPlatform/container-engine-accelerators/raw/master/daemonset.yaml
# Install NVIDIA drivers on Ubuntu (experimental):
kubectl create -f https://github.com/GoogleCloudPlatform/container-engine-accelerators/raw/master/nvidia-driver-installer/ubuntu/daemonset.yaml
# Install the device plugin:
kubectl create -f https://github.com/kubernetes/kubernetes/raw/master/cluster/addons/device-plugins/nvidia-gpu/daemonset.yaml
NVIDIA GPU Operator
The NVIDIA GPU Operator is a Kubernetes Operator that deploys and manages NVIDIA GPUs in a Kubernetes cluster.
# Add the NVIDIA Helm repository first
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
&& helm repo update
# Install the GPU Operator
helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator
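To verify the installation, watch the Operator components (driver, container toolkit, device plugin, DCGM exporter, etc.) come up in the namespace created above:
kubectl get pods -n gpu-operator --watch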
Example: requesting the nvidia.com/gpu resource
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: gpu-pod
spec:
restartPolicy: Never
containers:
- name: cuda-container
image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
resources:
limits:
nvidia.com/gpu: 1 # requesting 1 GPU
tolerations:
- key: nvidia.com/gpu
operator: Exists
effect: NoSchedule
EOF
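If a GPU is allocated, the vectoradd sample should run to completion. A quick check, assuming the gpu-pod name from the example above:
kubectl get pod gpu-pod
kubectl logs gpu-pod # the CUDA vectorAdd sample prints "Test PASSED" on success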
Kubernetes v1.6 and v1.7
The alpha.kubernetes.io/nvidia-gpu resource was removed in v1.10; use nvidia.com/gpu on newer versions.
Using GPUs in Kubernetes v1.6 and v1.7 requires some preparation:
Install the NVIDIA drivers on all Nodes, including the NVIDIA CUDA Toolkit, cuDNN, etc.
Enable the feature gate on kube-apiserver and kubelet:
--feature-gates="Accelerators=true"
Configure kubelet to use the docker container engine (the default); other container engines do not yet support this feature.
Use the resource name alpha.kubernetes.io/nvidia-gpu to specify the number of GPUs requested, for example:
apiVersion: v1
kind: Pod
metadata:
name: tensorflow
spec:
restartPolicy: Never
containers:
- image: gcr.io/tensorflow/tensorflow:latest-gpu
name: gpu-container-1
command: ["python"]
env:
- name: LD_LIBRARY_PATH
value: /usr/lib/nvidia
args:
- -u
- -c
- from tensorflow.python.client import device_lib; print device_lib.list_local_devices()
resources:
limits:
alpha.kubernetes.io/nvidia-gpu: 1 # requests one GPU
volumeMounts:
- mountPath: /usr/local/nvidia/bin
name: bin
- mountPath: /usr/lib/nvidia
name: lib
- mountPath: /usr/lib/x86_64-linux-gnu/libcuda.so
name: libcuda-so
- mountPath: /usr/lib/x86_64-linux-gnu/libcuda.so.1
name: libcuda-so-1
- mountPath: /usr/lib/x86_64-linux-gnu/libcuda.so.375.66
name: libcuda-so-375-66
volumes:
- name: bin
hostPath:
path: /usr/lib/nvidia-375/bin
- name: lib
hostPath:
path: /usr/lib/nvidia-375
- name: libcuda-so
hostPath:
path: /usr/lib/x86_64-linux-gnu/libcuda.so
- name: libcuda-so-1
hostPath:
path: /usr/lib/x86_64-linux-gnu/libcuda.so.1
- name: libcuda-so-375-66
hostPath:
path: /usr/lib/x86_64-linux-gnu/libcuda.so.375.66
$ kubectl create -f pod.yaml
pod "tensorflow" created
$ kubectl logs tensorflow
...
[name: "/cpu:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 9675741273569321173
, name: "/gpu:0"
device_type: "GPU"
memory_limit: 11332668621
locality {
bus_id: 1
}
incarnation: 7807115828340118187
physical_device_desc: "device: 0, name: Tesla K80, pci bus id: 0000:00:04.0"
]
Notes
GPU resources must be requested in resources.limits; setting them in resources.requests alone has no effect
A container can request one or more whole GPUs, not a fraction of a GPU
GPUs cannot be shared between containers
By default, all Nodes are assumed to have GPUs of the same model installed
Using GPUs via Dynamic Resource Allocation (DRA)
Starting with Kubernetes v1.26, GPU resources can also be managed via DRA, which offers more flexible GPU allocation and management than the traditional Device Plugin approach. Note that the DRA API is still evolving (the resource.k8s.io group has moved from v1alpha2 through v1beta1/v1beta2), so the exact fields in the examples below depend on the cluster version.
DRA GPU configuration
1. Create a GPU ResourceClass
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClass
metadata:
name: nvidia-gpu-class
spec:
driverName: gpu.nvidia.com
parameters:
# GPU memory size
memory: "16Gi"
# Compute capability
compute: "7.5"
# v1.33 feature: GPU partitioning support
partitionable: true
maxPartitions: 7 # number of MIG partitions
# Supported CUDA version
cudaVersion: "12.0"
---
# High-performance GPU class for AI/ML workloads
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClass
metadata:
name: high-perf-gpu-class
spec:
driverName: gpu.nvidia.com
parameters:
memory: "80Gi" # A100 GPU
compute: "8.0"
tensorCores: true
nvlink: true
---
# Shared GPU resource class
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClass
metadata:
name: shared-gpu-class
spec:
driverName: gpu.nvidia.com
parameters:
shared: true
maxUsers: 4
timeSlicing: true
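The classes defined above can then be listed like any other API object (a sketch; on clusters with the newer DRA API, the equivalent object to inspect is DeviceClass):
kubectl get resourceclass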
2. Create a GPU ResourceClaim
# Exclusive GPU resource claim
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
name: exclusive-gpu-claim
namespace: ml-training
spec:
resourceClassName: nvidia-gpu-class
allocationMode: WaitForFirstConsumer
---
# v1.33 feature: prioritized list - try multiple GPU types
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
name: flexible-gpu-claim
namespace: ml-training
spec:
# Try different GPU types in priority order
resourceClassNames:
- high-perf-gpu-class # prefer high-performance GPUs
- nvidia-gpu-class # fall back to standard GPUs
- shared-gpu-class # finally, try shared GPUs
allocationMode: WaitForFirstConsumer
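Whether a claim has been allocated can be inspected before wiring it into a Pod (a sketch, assuming the ml-training namespace used above):
# Check the allocation state of the claims
kubectl get resourceclaim -n ml-training
kubectl describe resourceclaim exclusive-gpu-claim -n ml-training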
3. Pods using DRA GPUs
# Machine learning training job
apiVersion: v1
kind: Pod
metadata:
name: ml-training-pod
namespace: ml-training
spec:
containers:
- name: trainer
image: tensorflow/tensorflow:latest-gpu
command: ["python", "train.py"]
env:
- name: NVIDIA_VISIBLE_DEVICES
value: "all"
resources:
claims:
- name: gpu-resource
limits:
memory: "32Gi"
cpu: "8"
resourceClaims:
- name: gpu-resource
source:
resourceClaimName: exclusive-gpu-claim
---
# Inference service using a shared GPU
apiVersion: v1
kind: Pod
metadata:
name: inference-pod
namespace: ml-inference
spec:
containers:
- name: inference-server
image: tensorrt-inference:latest
ports:
- containerPort: 8080
resources:
claims:
- name: shared-gpu
limits:
memory: "4Gi"
cpu: "2"
resourceClaims:
- name: shared-gpu
source:
resourceClaimName: shared-gpu-claim
New DRA GPU features in v1.33
1. GPU partitioning (MIG support)
# ResourceClass supporting NVIDIA MIG partitioning
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClass
metadata:
name: mig-gpu-class
spec:
driverName: gpu.nvidia.com
parameters:
# MIG configuration
migEnabled: true
migProfile: "1g.5gb" # 1/7 of the GPU + 5GB memory
partitionable: true
---
# Pod requesting a MIG partition
apiVersion: v1
kind: Pod
metadata:
name: mig-workload
spec:
containers:
- name: light-ml-task
image: pytorch/pytorch:latest
resources:
claims:
- name: mig-partition
resourceClaims:
- name: mig-partition
source:
resourceClaimName: mig-gpu-claim
2. GPU taints and tolerations
# Mark a GPU as being under maintenance
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceSlice
metadata:
name: gpu-node-maintenance
spec:
driverName: gpu.nvidia.com
devices:
- name: gpu-0
basic:
capacity:
memory: "16Gi"
# GPU taint: marked as under maintenance
taints:
- key: "maintenance"
value: "scheduled"
effect: "NoSchedule"
- key: "thermal-throttling"
value: "detected"
effect: "PreferNoSchedule"
---
# Pod tolerating GPU taints
apiVersion: v1
kind: Pod
metadata:
name: maintenance-tolerant-gpu-pod
spec:
containers:
- name: monitoring-task
image: gpu-monitor:latest
resources:
claims:
- name: gpu-resource
resourceClaims:
- name: gpu-resource
source:
resourceClaimName: maintenance-gpu-claim
# Tolerate the GPU device taints
tolerations:
- key: "resource.kubernetes.io/device.maintenance"
operator: "Equal"
value: "scheduled"
effect: "NoSchedule"
- key: "resource.kubernetes.io/device.thermal-throttling"
operator: "Equal"
value: "detected"
effect: "PreferNoSchedule"
3. Administrator access control
# Namespace with DRA admin access enabled
apiVersion: v1
kind: Namespace
metadata:
name: gpu-admin-namespace
labels:
# v1.33 feature: admin-access label
resource.kubernetes.io/admin-access: "enabled"
---
# ResourceClaim that can only be created in an admin namespace
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
name: admin-gpu-claim
namespace: gpu-admin-namespace
spec:
resourceClassName: high-perf-gpu-class
# Administrator-level configuration
parameters:
# Allow overcommit
overcommit: true
# Required node affinity
requiredNodeAffinity:
nodeSelectorTerms:
- matchExpressions:
- key: gpu.nvidia.com/class
operator: In
values: ["A100", "H100"]
DRA GPU monitoring and debugging
# GPU usage monitoring Pod
apiVersion: v1
kind: Pod
metadata:
name: gpu-monitor
spec:
containers:
- name: nvidia-smi-exporter
image: mindprince/nvidia_gpu_prometheus_exporter:0.1
ports:
- containerPort: 9445
name: metrics
securityContext:
capabilities:
add: ["SYS_ADMIN"]
volumeMounts:
- name: dev
mountPath: /dev
- name: proc-driver-nvidia
mountPath: /proc/driver/nvidia
readOnly: true
resources:
claims:
- name: monitor-gpu
resourceClaims:
- name: monitor-gpu
source:
resourceClaimName: monitoring-gpu-claim
volumes:
- name: dev
hostPath:
path: /dev
- name: proc-driver-nvidia
hostPath:
path: /proc/driver/nvidia
Multiple GPU models
If the Nodes in a cluster have different GPU models installed, Node Affinity can be used to schedule Pods onto Nodes with a specific GPU model.
First, when initializing the cluster, label the Nodes with their GPU model:
# Label your nodes with the accelerator type they have.
kubectl label nodes <node-with-k80> accelerator=nvidia-tesla-k80
kubectl label nodes <node-with-p100> accelerator=nvidia-tesla-p100
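The labels can be confirmed with the -L flag, which prints the label value as an extra column:
kubectl get nodes -L accelerator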
Then, constrain the Pod to matching Nodes when creating it (shown here with nodeSelector; an equivalent nodeAffinity sketch follows the example):
apiVersion: v1
kind: Pod
metadata:
name: cuda-vector-add
spec:
restartPolicy: OnFailure
containers:
- name: cuda-vector-add
# https://github.com/kubernetes/kubernetes/blob/v1.7.11/test/images/nvidia-cuda/Dockerfile
image: "k8s.gcr.io/cuda-vector-add:v0.1"
resources:
limits:
nvidia.com/gpu: 1
nodeSelector:
accelerator: nvidia-tesla-p100 # or nvidia-tesla-k80 etc.
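The nodeSelector above is the simplest form; the same constraint can be written as required node affinity, which also allows matching several GPU models at once. A sketch using the accelerator label defined earlier, placed in the Pod spec instead of nodeSelector:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: accelerator
          operator: In
          values:
          - nvidia-tesla-p100
          - nvidia-tesla-k80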
Using CUDA libraries
The NVIDIA CUDA Toolkit, cuDNN, etc. must be pre-installed on all Nodes. To access /usr/lib/nvidia-375, the CUDA libraries need to be passed to the container as hostPath volumes:
apiVersion: batch/v1
kind: Job
metadata:
name: nvidia-smi
labels:
name: nvidia-smi
spec:
template:
metadata:
labels:
name: nvidia-smi
spec:
containers:
- name: nvidia-smi
image: nvidia/cuda
command: ["nvidia-smi"]
imagePullPolicy: IfNotPresent
resources:
limits:
alpha.kubernetes.io/nvidia-gpu: 1
volumeMounts:
- mountPath: /usr/local/nvidia/bin
name: bin
- mountPath: /usr/lib/nvidia
name: lib
volumes:
- name: bin
hostPath:
path: /usr/lib/nvidia-375/bin
- name: lib
hostPath:
path: /usr/lib/nvidia-375
restartPolicy: Never
$ kubectl create -f job.yaml
job "nvidia-smi" created
$ kubectl get job
NAME DESIRED SUCCESSFUL AGE
nvidia-smi 1 1 14m
$ kubectl get pod -a
NAME READY STATUS RESTARTS AGE
nvidia-smi-kwd2m 0/1 Completed 0 14m
$ kubectl logs nvidia-smi-kwd2m
Fri Jun 16 19:49:53 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66 Driver Version: 375.66 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 0000:00:04.0 Off | 0 |
| N/A 74C P0 80W / 149W | 0MiB / 11439MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Appendix: installing CUDA
Install CUDA:
# Check for CUDA and try to install.
if ! dpkg-query -W cuda; then
# The 16.04 installer works with 16.10.
curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
dpkg -i ./cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
apt-get update
apt-get install cuda -y
fi
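To confirm the toolkit is installed, check the compiler version (nvcc ships with the CUDA Toolkit):
nvcc --version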
Install cuDNN:
First register at https://developer.nvidia.com/cudnn, download cuDNN v5.1, and then install it with:
tar zxvf cudnn-8.0-linux-x64-v5.1.tgz
ln -s /usr/local/cuda-8.0 /usr/local/cuda
sudo cp -P cuda/include/cudnn.h /usr/local/cuda/include
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
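The installed cuDNN version can be read back from the header, a common check for cuDNN 5.x-era installs:
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2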
After installation completes, run nvidia-smi to check the status of the GPU devices:
$ nvidia-smi
Fri Jun 16 19:33:35 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66 Driver Version: 375.66 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 0000:00:04.0 Off | 0 |
| N/A 74C P0 80W / 149W | 0MiB / 11439MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Gateway management for AI/ML inference workloads
For AI/ML inference services running on GPUs, the Gateway API Inference Extension can be used for intelligent routing and load balancing.
Gateway API Inference Extension configuration
# Define a GPU inference service pool
apiVersion: gateway.networking.x-k8s.io/v1alpha1
kind: InferencePool
metadata:
name: llama2-gpu-pool
spec:
deployment:
replicas: 3
template:
spec:
containers:
- name: vllm-server
image: vllm/vllm-openai:latest
resources:
limits:
nvidia.com/gpu: 1
memory: "16Gi"
cpu: "4"
requests:
memory: "8Gi"
cpu: "2"
env:
- name: MODEL_NAME
value: "meta-llama/Llama-2-7b-chat-hf"
- name: GPU_MEMORY_UTILIZATION
value: "0.9"
nodeSelector:
accelerator: nvidia-tesla-v100
---
# Define the model endpoint
apiVersion: gateway.networking.x-k8s.io/v1alpha1
kind: InferenceModel
metadata:
name: llama2-7b-chat
spec:
poolRef:
name: llama2-gpu-pool
routing:
# Priority routing: high-priority requests are served first
priority: high
# Intelligent load balancing based on GPU utilization
loadBalancing:
strategy: gpu-aware
metrics:
- name: gpu_utilization
target: 80
- name: memory_utilization
target: 85
# Canary release
trafficSplit:
- weight: 90
version: stable
poolRef:
name: llama2-gpu-pool
- weight: 10
version: canary
poolRef:
name: llama2-gpu-pool-canary
---
# Gateway configuration
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
name: ai-inference-gateway
spec:
gatewayClassName: inference-gateway-class
listeners:
- name: https
port: 443
protocol: HTTPS
tls:
mode: Terminate
certificateRefs:
- name: inference-tls-cert
---
# HTTPRoute routing requests to the inference model
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: llama2-inference-route
spec:
parentRefs:
- name: ai-inference-gateway
hostnames:
- "api.ai-platform.example.com"
rules:
- matches:
- path:
type: PathPrefix
value: /v1/chat/completions
- headers:
- name: x-model-name
value: llama2-7b-chat
backendRefs:
- group: gateway.networking.x-k8s.io
kind: InferenceModel
name: llama2-7b-chat
filters:
# Routing based on request priority
- type: ExtensionRef
extensionRef:
group: gateway.networking.x-k8s.io
kind: PriorityFilter
name: inference-priority
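With the route above in place, clients reach the model through the Gateway using the OpenAI-compatible API served by vLLM. A sketch of a request (the hostname and header match the HTTPRoute above; the payload is illustrative):
curl -k https://api.ai-platform.example.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-model-name: llama2-7b-chat" \
  -d '{"model": "meta-llama/Llama-2-7b-chat-hf", "messages": [{"role": "user", "content": "Hello"}]}'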
Performance benefits
Managing GPU inference workloads with the Gateway API Inference Extension offers the following advantages:
Intelligent routing: routing decisions based on real-time metrics such as GPU utilization and memory usage
Lower latency: significantly reduced latency, especially under high query rates
Higher GPU utilization: more effective resource allocation and load balancing
Model version management: safe canary releases and A/B testing
Request priority: important requests get processed first
Monitoring GPU inference services
# ServiceMonitor for GPU inference services
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: inference-gpu-metrics
spec:
selector:
matchLabels:
app: inference-model
endpoints:
- port: metrics
interval: 30s
path: /metrics
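With these metrics in Prometheus, GPU saturation can be queried directly. A sketch using metric names exposed by the nvidia_gpu_prometheus_exporter used earlier (the exact metric names are an assumption and vary by exporter):
# Average GPU duty cycle (utilization) per pod over the last 5 minutes
avg by (pod) (avg_over_time(nvidia_gpu_duty_cycle[5m]))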