StatefulSet
StatefulSets are designed for stateful applications, whereas Deployments and ReplicaSets are designed for stateless services. Their use cases include:
  • Stable persistent storage: a Pod can still reach the same persisted data after being rescheduled, implemented with PVCs
  • Stable network identity: a Pod keeps the same PodName and HostName after being rescheduled, implemented with a Headless Service (a Service without a Cluster IP)
  • Ordered deployment and ordered scaling: Pods are ordered and are deployed or scaled strictly in the defined order (from 0 to N-1; all preceding Pods must be Running and Ready before the next one starts), implemented with init containers
  • Ordered scale-down and ordered deletion (from N-1 to 0)
From these use cases it follows that a StatefulSet is made up of the following parts:
  • a Headless Service that defines the network identity (DNS domain)
  • volumeClaimTemplates that create PersistentVolumes
  • the StatefulSet itself, which defines the application
Each Pod in a StatefulSet gets a DNS name of the form statefulSetName-{0..N-1}.serviceName.namespace.svc.cluster.local, where
  • serviceName is the name of the Headless Service
  • 0..N-1 is the Pod's ordinal, running from 0 to N-1
  • statefulSetName is the name of the StatefulSet
  • namespace is the namespace the Service lives in; the Headless Service and the StatefulSet must be in the same namespace
  • .cluster.local is the cluster domain
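For example, for the web StatefulSet created below (Headless Service nginx, namespace default), the two Pods are addressable as:
web-0.nginx.default.svc.cluster.local
web-1.nginx.default.svc.cluster.local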

API version table

Kubernetes version    StatefulSet apiVersion
v1.5-v1.15            apps/v1beta1
v1.8-v1.15            apps/v1beta2
v1.9+                 apps/v1

A simple example

Take a simple nginx service, web.yaml, as an example:
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: "nginx"
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
$ kubectl create -f web.yaml
service "nginx" created
statefulset "web" created
# Inspect the headless service and the statefulset that were created
$ kubectl get service nginx
NAME      CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
nginx     None         <none>        80/TCP    1m
$ kubectl get statefulset web
NAME      DESIRED   CURRENT   AGE
web       2         2         2m
# PVCs are created automatically from volumeClaimTemplates
# (on GCE this provisions volumes of type kubernetes.io/gce-pd)
$ kubectl get pvc
NAME        STATUS    VOLUME                                     CAPACITY   ACCESSMODES   AGE
www-web-0   Bound     pvc-d064a004-d8d4-11e6-b521-42010a800002   1Gi        RWO           16s
www-web-1   Bound     pvc-d06a3946-d8d4-11e6-b521-42010a800002   1Gi        RWO           16s
# List the Pods; note that they are created in order
$ kubectl get pods -l app=nginx
NAME      READY     STATUS    RESTARTS   AGE
web-0     1/1       Running   0          5m
web-1     1/1       Running   0          4m
# Use nslookup to check these Pods' DNS records
$ kubectl run -i --tty --image busybox dns-test --restart=Never --rm /bin/sh
/ # nslookup web-0.nginx
Server: 10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local
Name: web-0.nginx
Address 1: 10.244.2.10
/ # nslookup web-1.nginx
Server: 10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local
Name: web-1.nginx
Address 1: 10.244.3.12
/ # nslookup web-0.nginx.default.svc.cluster.local
Server: 10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local
Name: web-0.nginx.default.svc.cluster.local
Address 1: 10.244.2.10
Other operations are also possible:
# Scale up
$ kubectl scale statefulset web --replicas=5
# Scale down
$ kubectl patch statefulset web -p '{"spec":{"replicas":3}}'
# Update the image (direct image updates were not supported at the time; patch is used as a workaround)
$ kubectl patch statefulset web --type='json' -p='[{"op":"replace","path":"/spec/template/spec/containers/0/image","value":"gcr.io/google_containers/nginx-slim:0.7"}]'
# Delete the StatefulSet and the Headless Service
$ kubectl delete statefulset web
$ kubectl delete service nginx
# The PVCs are retained after the StatefulSet is deleted; delete them as well once the data is no longer needed
$ kubectl delete pvc www-web-0 www-web-1

Updating a StatefulSet

Since v1.7, StatefulSets can be updated automatically; the strategy is configured via spec.updateStrategy. Two strategies are currently supported (see the sketch after this list):
  • OnDelete: when .spec.template changes, old Pods are not deleted immediately; new Pods are only created after the user manually deletes the old ones. This is the default strategy and is compatible with the v1.6 behavior
  • RollingUpdate: when .spec.template changes, old Pods are deleted and replaced with new ones automatically. The update proceeds in reverse ordinal order, one Pod at a time: each Pod is deleted, recreated, and must become Ready before the next Pod is updated
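A minimal sketch of configuring the strategy in the manifest of the web StatefulSet above (the fields are the real API fields; picking RollingUpdate here is purely illustrative):
spec:
  updateStrategy:
    type: RollingUpdate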

Partitions

RollingUpdate also supports partitions, configured via .spec.updateStrategy.rollingUpdate.partition. When a partition is set, only Pods with an ordinal greater than or equal to the partition value are rolled when .spec.template changes; all other Pods keep the old version (and are recreated from it even if deleted).
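Declaratively, a partition is set in the manifest like this (a sketch; partition 3 matches the patch below):
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 3
The same can also be applied imperatively: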
# Set the partition to 3
$ kubectl patch statefulset web -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":3}}}}'
statefulset "web" patched
# Update the StatefulSet
$ kubectl patch statefulset web --type='json' -p='[{"op":"replace","path":"/spec/template/spec/containers/0/image","value":"gcr.io/google_containers/nginx-slim:0.7"}]'
statefulset "web" patched
# Verify the update
$ kubectl delete po web-2
pod "web-2" deleted
$ kubectl get po -lapp=nginx -w
NAME      READY     STATUS              RESTARTS   AGE
web-0     1/1       Running             0          4m
web-1     1/1       Running             0          4m
web-2     0/1       ContainerCreating   0          11s
web-2     1/1       Running             0          18s

Pod management policies

Since v1.7, .spec.podManagementPolicy controls how Pods are managed. Two policies are supported:
  • OrderedReady: the default; Pods are created strictly in ordinal order, and each Pod must become Ready before the next one is created
  • Parallel: Pods are created or deleted in parallel (all Pods are started without waiting for earlier ones to become Ready)

Parallel example

The following manifest, webp.yaml, creates the same web StatefulSet with the Parallel policy:
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: "nginx"
  podManagementPolicy: "Parallel"
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: gcr.io/google_containers/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
As shown below, all of the Pods are created in parallel:
$ kubectl create -f webp.yaml
service "nginx" created
statefulset "web" created
$ kubectl get po -lapp=nginx -w
NAME      READY     STATUS              RESTARTS   AGE
web-0     0/1       Pending             0          0s
web-0     0/1       Pending             0          0s
web-1     0/1       Pending             0          0s
web-1     0/1       Pending             0          0s
web-0     0/1       ContainerCreating   0          0s
web-1     0/1       ContainerCreating   0          0s
web-0     1/1       Running             0          10s
web-1     1/1       Running             0          10s

ZooKeeper

Another example that demonstrates the power of StatefulSet more fully is zookeeper.yaml:
---
apiVersion: v1
kind: Service
metadata:
  name: zk-headless
  labels:
    app: zk-headless
spec:
  ports:
  - port: 2888
    name: server
  - port: 3888
    name: leader-election
  clusterIP: None
  selector:
    app: zk
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: zk-config
data:
  ensemble: "zk-0;zk-1;zk-2"
  jvm.heap: "2G"
  tick: "2000"
  init: "10"
  sync: "5"
  client.cnxns: "60"
  snap.retain: "3"
  purge.interval: "1"
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zk-budget
spec:
  selector:
    matchLabels:
      app: zk
  minAvailable: 2
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: zk
spec:
  serviceName: zk-headless
  replicas: 3
  template:
    metadata:
      labels:
        app: zk
      annotations:
        pod.alpha.kubernetes.io/initialized: "true"
        scheduler.alpha.kubernetes.io/affinity: >
          {
            "podAntiAffinity": {
              "requiredDuringSchedulingRequiredDuringExecution": [{
                "labelSelector": {
                  "matchExpressions": [{
                    "key": "app",
                    "operator": "In",
                    "values": ["zk"]
                  }]
                },
                "topologyKey": "kubernetes.io/hostname"
              }]
            }
          }
    spec:
      containers:
      - name: k8szk
        imagePullPolicy: Always
        image: gcr.io/google_samples/k8szk:v1
        resources:
          requests:
            memory: "4Gi"
            cpu: "1"
        ports:
        - containerPort: 2181
          name: client
        - containerPort: 2888
          name: server
        - containerPort: 3888
          name: leader-election
        env:
        - name: ZK_ENSEMBLE
          valueFrom:
            configMapKeyRef:
              name: zk-config
              key: ensemble
        - name: ZK_HEAP_SIZE
          valueFrom:
            configMapKeyRef:
              name: zk-config
              key: jvm.heap
        - name: ZK_TICK_TIME
          valueFrom:
            configMapKeyRef:
              name: zk-config
              key: tick
        - name: ZK_INIT_LIMIT
          valueFrom:
            configMapKeyRef:
              name: zk-config
              key: init
        - name: ZK_SYNC_LIMIT
          valueFrom:
            configMapKeyRef:
              name: zk-config
              key: sync
        - name: ZK_MAX_CLIENT_CNXNS
          valueFrom:
            configMapKeyRef:
              name: zk-config
              key: client.cnxns
        - name: ZK_SNAP_RETAIN_COUNT
          valueFrom:
            configMapKeyRef:
              name: zk-config
              key: snap.retain
        - name: ZK_PURGE_INTERVAL
          valueFrom:
            configMapKeyRef:
              name: zk-config
              key: purge.interval
        - name: ZK_CLIENT_PORT
          value: "2181"
        - name: ZK_SERVER_PORT
          value: "2888"
        - name: ZK_ELECTION_PORT
          value: "3888"
        command:
        - sh
        - -c
        - zkGenConfig.sh && zkServer.sh start-foreground
        readinessProbe:
          exec:
            command:
            - "zkOk.sh"
          initialDelaySeconds: 15
          timeoutSeconds: 5
        livenessProbe:
          exec:
            command:
            - "zkOk.sh"
          initialDelaySeconds: 15
          timeoutSeconds: 5
        volumeMounts:
        - name: datadir
          mountPath: /var/lib/zookeeper
      securityContext:
        runAsUser: 1000
        fsGroup: 1000
  volumeClaimTemplates:
  - metadata:
      name: datadir
      annotations:
        volume.alpha.kubernetes.io/storage-class: anything
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 20Gi
$ kubectl create -f zookeeper.yaml
For detailed usage instructions, see the zookeeper stateful application tutorial.
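Once the Pods are Running, a quick smoke test might look like the following sketch; it assumes the standard ZooKeeper scripts (zkServer.sh, zkCli.sh) are on the PATH inside the sample image, as the manifest's command suggests:
# Check which role each ensemble member holds (leader/follower)
$ kubectl exec zk-0 -- zkServer.sh status
# Write a znode through one member and read it back through another
$ kubectl exec zk-0 -- zkCli.sh create /hello world
$ kubectl exec zk-1 -- zkCli.sh get /hello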

StatefulSet caveats

  1. Recommended for Kubernetes v1.9 or later
  2. All volumes used by the Pods must be PersistentVolumes, or be created by the administrator in advance
  3. To protect data, deleting a StatefulSet does not delete its volumes
  4. A StatefulSet requires a Headless Service to define its DNS domain, and the Service must be created before the StatefulSet