Overview
Troubleshooting Kubernetes clusters and applications primarily involves
With kube-copilot, which is powered by OpenAI, you can automatically diagnose problems in your cluster and interact with the cluster in natural language.
When troubleshooting, kubectl is the primary tool and usually the starting point for locating a fault. The sections that follow cover the commands needed most often.
Check Pod status and the nodes Pods run on
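A minimal sketch of this first check. The function name pod_overview is a hypothetical wrapper chosen for illustration; the kubectl flags it passes are standard, and -o wide is what adds the node column to the listing.

```shell
# pod_overview is a hypothetical helper name, not part of kubectl.
# Usage: pod_overview [namespace]   (no argument means all namespaces)
pod_overview() {
  if [ -n "${1:-}" ]; then
    # -o wide adds the NODE and IP columns to the default listing
    kubectl get pods --namespace "$1" -o wide
  else
    kubectl get pods --all-namespaces -o wide
  fi
}
```

For example, pod_overview kube-system lists the system Pods together with the node each one is scheduled on.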
Inspect Pod events
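One way to look at the events of a single Pod, sketched under the assumption that you know the Pod's name and namespace. pod_events is a hypothetical helper name; kubectl describe and the involvedObject field selectors are standard kubectl features.

```shell
# pod_events is a hypothetical helper; supply your own Pod name and namespace.
# Usage: pod_events <pod> [namespace]   (namespace defaults to "default")
pod_events() {
  pod="$1"
  ns="${2:-default}"
  # describe prints the Pod's spec and status, with recent events at the end
  kubectl describe pod "$pod" --namespace "$ns"
  # List only the events that reference this Pod, sorted by last occurrence
  kubectl get events --namespace "$ns" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$pod" \
    --sort-by=.lastTimestamp
}
```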
Check Node status
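A short sketch of the node check. node_status is a hypothetical helper name; the node name passed to it is whatever kubectl get nodes reports in your cluster.

```shell
# node_status is a hypothetical helper name, not part of kubectl.
# Usage: node_status [node]
node_status() {
  # Ready/NotReady state, roles, version, and addresses of every node
  kubectl get nodes -o wide
  if [ -n "${1:-}" ]; then
    # Conditions, capacity, allocated resources, and events for one node
    kubectl describe node "$1"
  fi
}
```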
kube-apiserver logs
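A sketch of fetching the API server logs with kubectl. It assumes the Pod carries the label component=kube-apiserver, which is the convention kubeadm uses; other installers may label the Pod differently. apiserver_logs is a hypothetical helper name.

```shell
# apiserver_logs is a hypothetical helper. The label component=kube-apiserver
# is an assumption (the kubeadm convention); adjust it for your installer.
# Usage: apiserver_logs [tail-lines]   (defaults to the last 100 lines)
apiserver_logs() {
  # Look up the name of the first matching Pod
  pod=$(kubectl -n kube-system get pods -l component=kube-apiserver \
    -o jsonpath='{.items[0].metadata.name}')
  kubectl -n kube-system logs "$pod" --tail="${1:-100}"
}
```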
Reading these logs with kubectl assumes the control plane runs as Kubernetes static Pods. If kube-apiserver is managed by systemd instead, log in to the master node and run journalctl -u kube-apiserver to view its logs.
kube-controller-manager logs
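A sketch that reads the controller manager logs by label selector rather than by Pod name. The label component=kube-controller-manager is an assumption based on the kubeadm convention, and controller_manager_logs is a hypothetical helper name.

```shell
# controller_manager_logs is a hypothetical helper. The label value is an
# assumption (kubeadm convention) and may differ in your cluster.
# Usage: controller_manager_logs [tail-lines]
controller_manager_logs() {
  # logs -l reads from every Pod that matches the selector
  kubectl -n kube-system logs -l component=kube-controller-manager \
    --tail="${1:-100}"
}
```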
As with kube-apiserver, reading these logs with kubectl assumes the control plane runs as Kubernetes static Pods. If kube-controller-manager is managed by systemd, log in to the master node and run journalctl -u kube-controller-manager to view its logs.
kube-scheduler logs
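A sketch that limits the scheduler logs to a recent time window, which can help when the problem started at a known time. The label component=kube-scheduler is an assumption based on the kubeadm convention, and scheduler_logs is a hypothetical helper name.

```shell
# scheduler_logs is a hypothetical helper. The label value is an assumption
# (kubeadm convention) and may differ in your cluster.
# Usage: scheduler_logs [duration]   (for example 30m or 2h; defaults to 1h)
scheduler_logs() {
  # --tail=-1 lifts the 10-line default that applies when a selector is used
  kubectl -n kube-system logs -l component=kube-scheduler \
    --since="${1:-1h}" --tail=-1
}
```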
Again, reading these logs with kubectl assumes the control plane runs as Kubernetes static Pods. If kube-scheduler is managed by systemd, log in to the master node and run journalctl -u kube-scheduler to view its logs.
kube-dns logs
kube-dns is usually deployed as an add-on, and each of its Pods contains three containers. The most important logs come from the kubedns container.
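A sketch of reading the kubedns container's logs. It assumes the add-on's Pods carry the label k8s-app=kube-dns, which is the usual labeling but worth confirming in your cluster; kubedns_logs is a hypothetical helper name.

```shell
# kubedns_logs is a hypothetical helper. The label k8s-app=kube-dns is an
# assumption about how the add-on is labeled in your cluster.
# Usage: kubedns_logs [tail-lines]
kubedns_logs() {
  pod=$(kubectl -n kube-system get pods -l k8s-app=kube-dns \
    -o jsonpath='{.items[0].metadata.name}')
  # -c selects the kubedns container out of the three in the Pod
  kubectl -n kube-system logs "$pod" -c kubedns --tail="${1:-100}"
}
```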
Kubelet logs
Kubelet is typically managed by systemd, so reading its logs requires a shell on the Node, where journalctl -u kubelet shows them. Rather than assigning a public IP address to every node for SSH, consider using the kubectl-node-shell plugin to open that shell.
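A sketch of the node-shell route. It assumes the kubectl-node-shell plugin is already installed on your workstation; kubelet_shell is a hypothetical helper name, and the node name is whatever kubectl get nodes reports.

```shell
# kubelet_shell is a hypothetical helper. It assumes the kubectl-node-shell
# plugin is installed, which makes "kubectl node-shell" available.
# Usage: kubelet_shell <node>
kubelet_shell() {
  # Opens an interactive root shell on the given node
  kubectl node-shell "$1"
}

# Inside the shell that opens, read the kubelet logs with:
#   journalctl -l -u kubelet
```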
Kube-proxy logs
Kube-proxy is usually deployed as a DaemonSet, so its logs can be queried directly with kubectl.
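A sketch of querying the kube-proxy logs, either across the whole DaemonSet or for the Pod on one node. The label k8s-app=kube-proxy is an assumption (it is what kubeadm applies; some distributions use a different label), and kube_proxy_logs is a hypothetical helper name.

```shell
# kube_proxy_logs is a hypothetical helper. The label k8s-app=kube-proxy is
# an assumption (kubeadm convention); check the labels in your cluster.
# Usage: kube_proxy_logs [node]
kube_proxy_logs() {
  if [ -n "${1:-}" ]; then
    # Find the kube-proxy Pod scheduled on the given node
    pod=$(kubectl -n kube-system get pods -l k8s-app=kube-proxy \
      --field-selector "spec.nodeName=$1" \
      -o jsonpath='{.items[0].metadata.name}')
    kubectl -n kube-system logs "$pod" --tail=100
  else
    # --prefix labels each line with the Pod and container it came from
    kubectl -n kube-system logs -l k8s-app=kube-proxy --tail=100 --prefix
  fi
}
```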
Further Reading
The hjacobs/kubernetes-failure-stories repository collects public accounts of Kubernetes failures.
https://docs.microsoft.com/en-us/azure/aks/troubleshooting offers general guidance on troubleshooting AKS.
https://cloud.google.com/kubernetes-engine/docs/troubleshooting describes general strategies for troubleshooting issues in GKE.