Troubleshooting Tools


Last updated 1 year ago

This chapter introduces the tools most commonly used when troubleshooting Kubernetes.

Essential Tools

  • kubectl: Inspects the status of Kubernetes clusters and containers, for example kubectl describe pod <pod-name>.

  • journalctl: Views the logs of Kubernetes components, for example journalctl -u kubelet -l.

  • iptables and ebtables: Help determine whether a Service is working. For example, iptables -t nat -nL lists the NAT rules, which shows whether the iptables rules configured by kube-proxy are in place.

  • tcpdump: Captures packets to troubleshoot container network issues, for example tcpdump -nn host 10.240.0.8.

  • perf: A performance analysis tool that ships with the Linux kernel, often used to troubleshoot performance issues such as the one described in Container Isolation Gone Wrong.
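
The iptables check can be narrowed to a single Service. The following is a minimal sketch, assuming kube-proxy runs in iptables mode and tags each rule with a comment that starts with "<namespace>/<service>". The function name service_nat_rules is hypothetical, not part of any tool.

```shell
#!/usr/bin/env bash
# Print the nat-table rules that mention one Service.
# Reads `iptables-save -t nat` output on stdin.
# Assumption: kube-proxy comments look like
#   --comment "default/kubernetes:https cluster IP"
service_nat_rules() {
  local namespace="$1" service="$2"
  # Require ':', ' ' or '"' after the name so that "kube" does not
  # also match "kubernetes".
  grep -E -- "--comment \"${namespace}/${service}[: \"]"
}
```

On a node, sudo iptables-save -t nat | service_nat_rules default kubernetes prints the matching rules. Empty output may mean that kube-proxy has not programmed the Service yet, or that it runs in a mode other than iptables.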

kubectl-node-shell

To check the logs of system components such as the kubelet, CNI, and the kernel, you first need a shell on the Node, typically over SSH. Rather than assigning a public IP address to every Node, it is recommended to use the kubectl-node-shell plugin.

# Install the plugin
curl -LO https://github.com/kvaps/kubectl-node-shell/raw/master/kubectl-node_shell
chmod +x ./kubectl-node_shell
sudo mv ./kubectl-node_shell /usr/local/bin/kubectl-node_shell

# Open a shell on the node, then view the kubelet logs from inside it
kubectl node-shell <node>
journalctl -l -u kubelet
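
The kubelet log is usually long, so it can help to keep only the error lines. The following is a minimal sketch, assuming the kubelet writes its default klog text format, where each message starts with a severity letter followed by the date (for example E0102 15:04:05.123456). The function name kubelet_errors is hypothetical.

```shell
#!/usr/bin/env bash
# Keep only log lines whose klog severity is Error (E) or Fatal (F).
# Reads journalctl output on stdin.
kubelet_errors() {
  grep -E ': [EF][0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+ '
}
```

Inside the node shell, journalctl -l -u kubelet --no-pager | kubelet_errors prints the matching lines. Kubelets configured for structured (JSON) logging would need a different pattern.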

sysdig

sysdig is a troubleshooting tool for containers and comes in both open-source and commercial editions. For regular troubleshooting, the open-source version will suffice.

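Aside from sysdig, two other auxiliary tools can be used:

  • csysdig: This is automatically installed with sysdig and offers an interactive terminal interface.

  • sysdig-inspect: This provides a graphical (non-real-time) interface for trace files saved by sysdig, for example with sudo sysdig -w filename.scap.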

Installation

# On Ubuntu
curl -s https://s3.amazonaws.com/download.draios.com/DRAIOS-GPG-KEY.public | apt-key add -
curl -s -o /etc/apt/sources.list.d/draios.list http://download.draios.com/stable/deb/draios.list
apt-get update
apt-get -y install linux-headers-$(uname -r)
apt-get -y install sysdig

# On RHEL
rpm --import https://s3.amazonaws.com/download.draios.com/DRAIOS-GPG-KEY.public
curl -s -o /etc/yum.repos.d/draios.repo http://download.draios.com/stable/rpm/draios.repo
rpm -i http://mirror.us.leaseweb.net/epel/6/i386/epel-release-6-8.noarch.rpm
yum -y install kernel-devel-$(uname -r)
yum -y install sysdig

# On macOS
brew install sysdig

Examples

# Refer to https://www.sysdig.org/wiki/sysdig-examples/.
# View the top network connections
sudo sysdig -pc -c topconns
# View the top network connections within the wordpress1 container
sudo sysdig -pc -c topconns container.name=wordpress1

# Show the network data exchanged with the host 192.168.0.1
sudo sysdig fd.ip=192.168.0.1
sudo sysdig -s2000 -A -c echo_fds fd.cip=192.168.0.1

# List all incoming connections that are not served by Apache.
sudo sysdig -p"%proc.name %fd.name" "evt.type=accept and proc.name!=httpd"

# View the CPU/Network/IO usage of processes running within a container.
sudo sysdig -pc -c topprocs_cpu container.id=2e854c4525b8
sudo sysdig -pc -c topprocs_net container.id=2e854c4525b8
sudo sysdig -pc -c topfiles_bytes container.id=2e854c4525b8

# See the files where Apache spends most of its I/O time
sudo sysdig -c topfiles_time proc.name=httpd

# Show all interactive commands executed within a certain container.
sudo sysdig -pc -c spy_users container.id=2e854c4525b8

# Show every time a file is opened under /etc.
sudo sysdig evt.type=open and fd.name contains /etc

# View the list of processes with container context
sudo csysdig -pc

For more samples and usage methods, check out the Sysdig User Guide.
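
Several of the examples narrow the output by combining filter terms with "and". The following is a minimal sketch of a helper that builds such an expression. The function name sysdig_filter is hypothetical and not part of sysdig.

```shell
#!/usr/bin/env bash
# Join filter terms with " and " so they can be passed to sysdig
# as a single quoted expression.
sysdig_filter() {
  local expr="" term
  for term in "$@"; do
    if [ -z "$expr" ]; then
      expr="$term"
    else
      expr="$expr and $term"
    fi
  done
  printf '%s\n' "$expr"
}
```

For example, sudo sysdig -p"%proc.name %fd.name" "$(sysdig_filter evt.type=accept 'proc.name!=httpd' container.name=wordpress1)" would restrict the earlier Apache example to the wordpress1 container.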

Weave Scope

Weave Scope is another container monitoring and troubleshooting tool, and it focuses on visualization. It lacks the powerful CLI that sysdig offers, but it provides an easy-to-use interactive interface, automatically maps the topology of the entire cluster, and can be extended with plugins. According to its official site, its features include:

  • Interactive topology interface

  • Graphical mode and table mode

  • Filtering feature

  • Search feature

  • Real-time metrics

  • Container troubleshooting

  • Custom plugins

Weave Scope is made up of two parts, the App and the Probe, which carry out different tasks:

  • The Probe collects information about the containers and hosts and sends it to the App.

  • The App processes this information, generates reports, and presents them in an interactive UI.

Installation

kubectl apply -f "https://cloud.weave.works/k8s/scope.yaml?k8s-version=$(kubectl version | base64 | tr -d '\n')&k8s-service-type=LoadBalancer"

Viewing the UI

After installation is complete, you can reach the interactive UI through the weave-scope-app Service:

kubectl -n weave get service weave-scope-app
kubectl -n weave port-forward service/weave-scope-app :80

Clicking on a Pod shows real-time status and metrics for all the containers in that Pod.

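Known Issues

When --probe.ebpf.connections is enabled on Ubuntu with kernel 4.4.0 (it is enabled by default), the Node might repeatedly restart due to kernel issues: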

[ 263.736006] CPU: 0 PID: 6309 Comm: scope Not tainted 4.4.0-119-generic #143-Ubuntu
[ 263.736006] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
[...]

There are two solutions for this problem:

  • Disable eBPF detection with --probe.ebpf.connections=false.

  • Upgrade the kernel, for example, to 4.13.0.
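
To decide which of the two options applies to a given Node, the kernel release can be checked first. The following is a minimal sketch that treats 4.13.0 as the cutoff. That threshold is taken from the example above and is an assumption, not a documented minimum. The function name kernel_older_than_4_13 is hypothetical.

```shell
#!/usr/bin/env bash
# Return 0 (true) when the kernel release is older than 4.13.
# Accepts a release string such as "4.4.0-119-generic"; defaults to
# the running kernel as reported by `uname -r`.
kernel_older_than_4_13() {
  local release="${1:-$(uname -r)}"
  local version="${release%%-*}"   # "4.4.0-119-generic" -> "4.4.0"
  local major minor rest
  IFS=. read -r major minor rest <<<"$version"
  major="${major%%[!0-9]*}"        # drop any non-numeric suffix
  minor="${minor%%[!0-9]*}"
  (( ${major:-0} < 4 || ( ${major:-0} == 4 && ${minor:-0} < 13 ) ))
}

# Usage:
#   if kernel_older_than_4_13; then
#     echo "consider --probe.ebpf.connections=false or a kernel upgrade"
#   fi
```

Run on the Node itself (for example from a node shell), calling the function without arguments checks the running kernel.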

References

  • Overview of kubectl

  • Monitoring Kubernetes with sysdig