Node
最后更新于
最后更新于
A Node is the actual host where a Pod runs. This could be a physical or a virtual machine. For Pod management, each Node must at least run a container runtime (like docker
or rkt
), kubelet
, and the kube-proxy
service.
Unlike other resources such as Pods and Namespace, Kubernetes doesn't create a Node. It only manages resources on a Node. Although you can create a Node object using a Manifest (as shown in the yaml below), Kubernetes merely checks if such a Node indeed exists. If the check fails, Pod scheduling doesn't proceed.
The Node Controller conducts this check. The Node Controller is responsible for:
Maintaining Node status
Synchronizing Node with Cloud Provider
Assigning container CIDR to Node
Deleting Pods on the Node with NoExecute
taint
By default, kubelet registers itself with the master during startup and creates the Node resource.
Each Node includes the following status information:
Address: Including hostname, public IP, and private IP
Conditions: They include OutOfDisk, Ready, MemoryPressure, and DiskPressure
Capacity: Available resources on the Node, including CPU, memory, and the total number of Pods
Info: It includes kernel version, container engine version, OS type, etc.
Taints and Tolerations ensure Pods are not scheduled on inappropriate Nodes. Taint is applied to a Node, and toleration is applied to a Pod (Toleration is optional).
For example, you can use the taint command to add taints to node1:
For the specific usage of Taints and Tolerations, please refer to the scheduler section.
Marking a Node as unschedulable does not affect the Pods running on it. This feature is very useful when maintaining a Node:
When ShutdownGracePeriod
and ShutdownGracePeriodCriticalPods
are configured, Kubelet will detect Node shutdown status based on systemd events, and automatically terminate the running Pods on it (ShutdownGracePeriodCriticalPods needs to be less than ShutdownGracePeriod). Note, both parameters are configured as 0 by default, which means the graceful shutdown feature is off by default.
For example, if ShutdownGracePeriod is set to 30s, and ShutdownGracePeriodCriticalPods is set to 10s, the Kubelet delays the Node shutdown by 30 seconds. During the shutdown, the first 20 (30-10) seconds are reserved to terminate regular Pods, while the last 10 seconds are saved to terminate critical Pods.
In circumstances where the Node experiences an anomaly, Kubelet may not have a chance to detect and perform a graceful shutdown. In such scenarios, StatefulSet cannot create a new Pod with the same name, if the Pod uses a volume, then VolumeAttachments will not be deleted from the original shutdown Node, hence these Pods' volumes are unable to be mounted on new running Nodes.
Forced Node shutdown is specially designed to solve these problems. Users can manually add node.kubernetes.io/out-of-service
taint to a Node with NoExecute
or NoSchedule
effect, marking it as incapable of providing services. If the NodeOutOfServiceVolumeDetach
feature is enabled on kube-controller-manager, and respective toleration is not set on Pods, these Pods will be forcibly deleted and volume detachment operation will immediately proceed for terminated Pods on that Node. As a result, Pods that were on incapable Nodes can quickly recover on other Nodes.