资源不足,集群内所有的 Node 都不满足该 Pod 请求的 CPU、内存、GPU 或者临时存储空间等资源。解决方法是删除集群内不用的 Pod 或者增加新的 Node。
HostPort 端口已被占用,通常推荐使用 Service 对外开放服务端口
Pod 一直处于 Waiting 或 ContainerCreating 状态
首先还是通过 kubectl describe pod <pod-name> 命令查看到当前 Pod 的事件
$kubectl-nkube-systemdescribepodnginx-podEvents:TypeReasonAgeFromMessage-------------------------NormalScheduled1mdefault-schedulerSuccessfullyassignednginx-podtonode1 Normal SuccessfulMountVolume 1m kubelet, gpu13 MountVolume.SetUp succeeded for volume "config-volume"
Normal SuccessfulMountVolume 1m kubelet, gpu13 MountVolume.SetUp succeeded for volume "coredns-token-sxdmc"
WarningFailedSync2s (x4 over46s) kubelet, gpu13 Error syncing pod Normal SandboxChanged 1s (x4 over 46s) kubelet, gpu13 Pod sandbox changed, it will be killed and re-created.
可以发现,该 Pod 的 Sandbox 容器无法正常启动,具体原因需要查看 Kubelet 日志:
$journalctl-ukubelet...Mar 14 04:22:04 node1 kubelet[29801]: E0314 04:22:04.649912 29801 cni.go:294] Error adding network: failed to set bridge addr: "cni0" already has an IP address different from 10.244.4.1/24
Mar 14 04:22:04 node1 kubelet[29801]: E0314 04:22:04.649941 29801 cni.go:243] Error while adding to cni network: failed to set bridge addr: "cni0" already has an IP address different from 10.244.4.1/24
Mar 14 04:22:04 node1 kubelet[29801]: W0314 04:22:04.891337 29801 cni.go:258] CNI failed to retrieve network namespace path: Cannot find network namespace for the terminated container "c4fd616cde0e7052c240173541b8543f746e75c17744872aa04fe06f52b5141c"
Mar 14 04:22:05 node1 kubelet[29801]: E0314 04:22:05.965801 29801 remote_runtime.go:91] RunPodSandbox from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "nginx-pod" network: failed to set bridge addr: "cni0" already has an IP address different from 10.244.4.1/24
发现是 cni0 网桥配置了一个不同网段的 IP 地址导致,删除该网桥(网络插件会自动重新创建)即可修复
$kubectldescribepodmypod...Events:TypeReasonAgeFromMessage------------------------- Normal Scheduled 36s default-scheduler Successfully assigned sh to k8s-agentpool1-38622806-0
Normal SuccessfulMountVolume 35s kubelet, k8s-agentpool1-38622806-0 MountVolume.SetUp succeeded for volume "default-token-n4pn6"
NormalPulling17s (x2 over33s) kubelet, k8s-agentpool1-38622806-0 pulling image "a1pine" Warning Failed 14s (x2 over 29s) kubelet, k8s-agentpool1-38622806-0 Failed to pull image "a1pine": rpc error: code = Unknown desc = Error response from daemon: repository a1pine not found: does not exist or no pull access
WarningFailed14s (x2 over29s) kubelet, k8s-agentpool1-38622806-0 Error: ErrImagePull Normal SandboxChanged 4s (x7 over 28s) kubelet, k8s-agentpool1-38622806-0 Pod sandbox changed, it will be killed and re-created.
Normal BackOff 4s (x5 over 25s) kubelet, k8s-agentpool1-38622806-0 Back-off pulling image "a1pine"
WarningFailed1s (x6 over25s) kubelet, k8s-agentpool1-38622806-0 Error: ImagePullBackOff