K8s Pod Health Checks: Principles and Practice

Introduction to Pod Health Checks

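By default, the kubelet judges health only from whether the container is running. It cannot see the state of the application inside the container, for example a process that has hung. Such a container can no longer serve requests, and traffic sent to it is lost. Health checks are introduced to make sure containers stay healthy. A Pod uses two kinds of probes to check container health: livenessProbe (liveness probe) and readinessProbe (readiness probe).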

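livenessProbe (liveness probe)

A liveness probe checks whether the application in the container is healthy using HTTP, a shell command, or TCP, and reports the result to the kubelet. If the application is reported as unhealthy, the kubelet restarts the container according to the restartPolicy defined in the Pod manifest.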

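readinessProbe (readiness probe)

A readiness probe likewise uses HTTP, a shell command, or TCP to check whether the application in the container is healthy and able to serve requests. If it is, the container is considered Ready, and only Pods that have reached the Ready state may receive requests.

For Pods managed by a Service, the association between the Service and its Pods is also set according to whether each Pod is ready. After a Pod starts, the application in the container usually needs time to finish initializing, for example loading configuration or data, and some programs also need a warm-up phase. If the Pod receives client requests before this phase is complete, responses will be very slow and the user experience suffers. A Pod should therefore not handle client requests immediately after it starts; it should wait until the container has finished initializing and moved to the Ready state.

If a container or Pod is in the NotReady state, Kubernetes removes the Pod from the Service's backend endpoints.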

How Health Checks Are Implemented

The two probe types introduced above, livenessProbe and readinessProbe, both support the following ways of checking a container's health:

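1. ExecAction: runs a command inside the container; an exit status of 0 means success, that is, the check result is normal

2. HTTPGetAction: sends an HTTP request to the container's IP, port, and path; a status code of at least 200 and below 400 means success

3. TCPSocketAction: performs a TCP check against the container's IP address and a given port; an open port means success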
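As a rough sketch, the three handlers look like this inside a container spec. The file path, endpoint, and port numbers are illustrative assumptions and do not come from the examples in this article.

```yaml
# Three alternative probe definitions (one YAML document each).
# A probe takes exactly one handler: exec, httpGet, or tcpSocket.

# ExecAction: success when the command exits with status 0
livenessProbe:
  exec:
    command: ["cat", "/tmp/healthy"]   # hypothetical marker file
---
# HTTPGetAction: success when the status code is >= 200 and < 400
livenessProbe:
  httpGet:
    path: /healthz                     # hypothetical endpoint
    port: 8080                         # hypothetical port
---
# TCPSocketAction: success when the TCP connection can be opened
livenessProbe:
  tcpSocket:
    port: 8080                         # hypothetical port
```

The same three handlers can be used under readinessProbe in exactly the same way.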

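Each of these check actions can return one of three results:

1. Success: the health check passed

2. Failure: the health check did not pass

3. Unknown: the check action itself failed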

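livenessProbe Examples

livenessProbe ExecAction example

This check runs a user-defined command inside the target container to determine its health: if the command's exit code is 0, the container is considered healthy. The spec.containers.livenessProbe.exec field defines this kind of check. It has a single attribute, command, which specifies the command to run. Below is an example that uses the liveness-exec approach in a manifest:

1. Create the resource manifest

Create a Pod that runs an Nginx container. The container first starts nginx, sleeps for 60 seconds, and then deletes nginx.pid. The livenessProbe's exec command checks whether the nginx.pid file exists; if the probe returns a non-zero result, the container is restarted according to the restart policy. The expectation is that about 60s after the container becomes Ready, nginx.pid is deleted, the exec check takes effect, and the container is restarted according to the restart policy.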

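cat ngx-health.yaml
apiVersion: v1
kind: Pod
metadata:
  name: ngx-health
spec:
  containers:
  - name: ngx-liveness
    image: nginx:latest
    command:
    - /bin/sh
    - -c
    # Start nginx, wait 60 seconds, then delete the PID file
    - /usr/sbin/nginx; sleep 60; rm -rf /run/nginx.pid
    livenessProbe:
      exec:
        # The whole test expression must be one string after -c;
        # as separate list items, "-e" and the path never reach test
        command: [ "/bin/sh", "-c", "test -e /run/nginx.pid" ]
  restartPolicy: Always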
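One thing to keep in mind with the manifest above: /usr/sbin/nginx goes into the background, so the shell is the container's main process, and once the rm finishes the shell exits and the container stops on its own. A minimal sketch of a variation, assuming you want the failing exec probe (rather than the container exit) to trigger the restart, keeps the shell alive with a trailing sleep. The Pod name ngx-health-keepalive and the 3600-second sleep are illustrative assumptions, not part of the original example.

```yaml
# Variation of the exec example: the trailing sleep keeps the container's
# main process alive after the PID file is deleted, so the restart comes
# from the failing liveness probe instead of from the container exiting.
apiVersion: v1
kind: Pod
metadata:
  name: ngx-health-keepalive   # hypothetical name
spec:
  containers:
  - name: ngx-liveness
    image: nginx:latest
    command:
    - /bin/sh
    - -c
    # 3600 is an arbitrary value; it only needs to outlast the probe failures
    - /usr/sbin/nginx; sleep 60; rm -rf /run/nginx.pid; sleep 3600
    livenessProbe:
      exec:
        command: ["/bin/sh", "-c", "test -e /run/nginx.pid"]
  restartPolicy: Always
```

With the default periodSeconds of 10 and failureThreshold of 3, the restart would be expected roughly 30 seconds after the PID file disappears.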

2. Create the Pod

kubectl apply -f ngx-health.yaml

Wait for the Pod to become ready.

3. View the Pod's details

# First look: the container in the Pod started successfully and the events are normal
kubectl describe pods/ngx-health | grep -A Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/ngx-health to k8s-node03
Normal Pulling 12s kubelet, k8s-node03 Pulling image "nginx:latest"
Normal Pulled 6s kubelet, k8s-node03 Successfully pulled image "nginx:latest"
Normal Created 6s kubelet, k8s-node03 Created container ngx-liveness
Normal Started 5s kubelet, k8s-node03 Started container ngx-liveness

# Second look: the container's livenessProbe has failed
kubectl describe pods/ngx-health | grep -A Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/ngx-health to k8s-node03
Normal Pulling 52s kubelet, k8s-node03 Pulling image "nginx:latest"
Normal Pulled 46s kubelet, k8s-node03 Successfully pulled image "nginx:latest"
Normal Created 46s kubelet, k8s-node03 Created container ngx-liveness
Normal Started 45s kubelet, k8s-node03 Started container ngx-liveness
Warning Unhealthy 20s (x3 over 40s) kubelet, k8s-node03 Liveness probe failed:
Normal Killing 20s kubelet, k8s-node03 Container ngx-liveness failed liveness probe, will be restarted

# Third look: the image has been pulled again, and the container has been created and started again
kubectl describe pods/ngx-health | grep -A Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/ngx-health to k8s-node03
Warning Unhealthy 35s (x3 over 55s) kubelet, k8s-node03 Liveness probe failed:
Normal Killing 35s kubelet, k8s-node03 Container ngx-liveness failed liveness probe, will be restarted
Normal Pulling 4s (x2 over 67s) kubelet, k8s-node03 Pulling image "nginx:latest"
Normal Pulled 2s (x2 over 61s) kubelet, k8s-node03 Successfully pulled image "nginx:latest"
Normal Created 2s (x2 over 61s) kubelet, k8s-node03 Created container ngx-liveness
Normal Started 2s (x2 over 60s) kubelet, k8s-node03 Started container ngx-liveness

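The wide output shows the following. In the first listing the Pod has been running for 22s with 0 restarts; in the second it has been running for 76s and has completed one restart.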

kubectl get pods -o wide | grep ngx-health
ngx-health / Running 22s 10.244.5.44 k8s-node03 <none> <none>
kubectl get pods -o wide | grep ngx-health
ngx-health / Running 76s 10.244.5.44 k8s-node03 <none> <none>

The second probe failure and the second restart

kubectl describe pods/ngx-health | grep -A  Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/ngx-health to k8s-node03
Normal Pulled 58s (x2 over 117s) kubelet, k8s-node03 Successfully pulled image "nginx:latest"
Normal Created 58s (x2 over 117s) kubelet, k8s-node03 Created container ngx-liveness
Normal Started 58s (x2 over 116s) kubelet, k8s-node03 Started container ngx-liveness
Warning Unhealthy 31s (x6 over 111s) kubelet, k8s-node03 Liveness probe failed:
Normal Killing 31s (x2 over 91s) kubelet, k8s-node03 Container ngx-liveness failed liveness probe, will be restarted
Normal Pulling 0s (x3 over 2m3s) kubelet, k8s-node03 Pulling image "nginx:latest"
kubectl get pods -o wide | grep ngx-health
ngx-health / Running 2m13s 10.244.5.44 k8s-node03 <none> <none>

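livenessProbe HTTPGetAction example

This check sends an HTTP GET request to the container's IP address, port, and path. If the response status code is greater than or equal to 200 and less than 400, the container is considered healthy. The spec.containers.livenessProbe.httpGet field defines this kind of check; its available settings are:

• host: the host to send the request to; defaults to the Pod IP. It can also be set with a Host: header in httpHeaders

• port: the port to request; required field, range 1-65535

• httpHeaders <[]Object>: custom request headers

• path: the HTTP resource path to request, that is, the URL path

• scheme: the protocol used for the connection; only HTTP or HTTPS, defaults to HTTP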
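For illustration, here is a sketch of an httpGet probe that sets every field except host. The header value www.example.com is a made-up virtual host, assumed only to show the httpHeaders syntax.

```yaml
livenessProbe:
  httpGet:
    # host is omitted, so the request goes to the Pod IP
    path: /index.html
    port: 80
    scheme: HTTP
    httpHeaders:
    - name: Host
      value: www.example.com   # hypothetical virtual host
```

Setting the Host header this way is usually preferable to setting host, because the request still goes to the Pod IP while the web server sees the expected virtual host name.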

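1. Create the resource manifest

Create a Pod that runs an Nginx container. The container first starts nginx, sleeps for 60 seconds, and then deletes nginx.pid. The livenessProbe uses the httpGet method to request index.html in nginx's document root on port 80; the target address defaults to the Pod IP and the protocol is HTTP. If the request fails, the container is restarted according to the restart policy.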

cat ngx-health.yaml
apiVersion: v1
kind: Pod
metadata:
  name: ngx-health
spec:
  containers:
  - name: ngx-liveness
    image: nginx:latest
    command:
    - /bin/sh
    - -c
    # Start nginx, wait 60 seconds, then delete the PID file
    - /usr/sbin/nginx; sleep 60; rm -rf /run/nginx.pid
    livenessProbe:
      httpGet:
        path: /index.html
        port: 80
        scheme: HTTP
  restartPolicy: Always

2. Create the Pod resource object

kubectl apply -f ngx-health.yaml

3. Check the Pod's status

# Container being created
kubectl get pods -o wide | grep ngx-health
ngx-health / ContainerCreating 7s <none> k8s-node02 <none> <none>

# Container running
kubectl get pods -o wide | grep ngx-health
ngx-health / Running 19s 10.244.2.36 k8s-node02 <none> <none>

4. View the Pod's detailed events

The image is pulled and the container starts successfully


kubectl describe pods/ngx-health | grep -A  Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/ngx-health to k8s-node02
Normal Pulling 30s kubelet, k8s-node02 Pulling image "nginx:latest"
Normal Pulled 15s kubelet, k8s-node02 Successfully pulled image "nginx:latest"
Normal Created 15s kubelet, k8s-node02 Created container ngx-liveness
Normal Started 14s kubelet, k8s-node02 Started container ngx-liveness

About 60s after the container becomes ready, the health check takes effect; the events below show that the image is already being pulled again

kubectl describe pods/ngx-health | grep -A  Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/ngx-health to k8s-node02
Normal Pulled 63s kubelet, k8s-node02 Successfully pulled image "nginx:latest"
Normal Created 63s kubelet, k8s-node02 Created container ngx-liveness
Normal Started 62s kubelet, k8s-node02 Started container ngx-liveness
Normal Pulling 1s (x2 over 78s) kubelet, k8s-node02 Pulling image "nginx:latest"

After the image is pulled, the container is created and started once more; the times in the Age column have been reset

kubectl describe pods/ngx-health | grep -A  Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/ngx-health to k8s-node02
Normal Pulling 18s (x2 over 95s) kubelet, k8s-node02 Pulling image "nginx:latest"
Normal Pulled 2s (x2 over 80s) kubelet, k8s-node02 Successfully pulled image "nginx:latest"
Normal Created 2s (x2 over 80s) kubelet, k8s-node02 Created container ngx-liveness
Normal Started 1s (x2 over 79s) kubelet, k8s-node02 Started container ngx-liveness

The wide output shows that the Pod has been restarted once

kubectl get pods -o wide | grep ngx-health
ngx-health / Completed 96s 10.244.2.36 k8s-node02 <none> <none>
k8sops@k8s-master01:~/manifests/pod$ kubectl get pods -o wide | grep ngx-health
ngx-health / Running 104s 10.244.2.36 k8s-node02 <none> <none>

The container log shows the probe requests below; by default a probe is sent every 10 seconds

kubectl logs -f pods/ngx-health
10.244.2.1 - - [/May/::: +] "GET /index.html HTTP/1.1" "-" "kube-probe/1.18" "-"
10.244.2.1 - - [/May/::: +] "GET /index.html HTTP/1.1" "-" "kube-probe/1.18" "-"
10.244.2.1 - - [/May/::: +] "GET /index.html HTTP/1.1" "-" "kube-probe/1.18" "-"
10.244.2.1 - - [/May/::: +] "GET /index.html HTTP/1.1" "-" "kube-probe/1.18" "-"
10.244.2.1 - - [/May/::: +] "GET /index.html HTTP/1.1" "-" "kube-probe/1.18" "-"
10.244.2.1 - - [/May/::: +] "GET /index.html HTTP/1.1" "-" "kube-probe/1.18" "-"

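livenessProbe TCPSocketAction example

This check performs a TCP check against the container's IP address and port; if a TCP connection can be established, the container is considered healthy. Compared with an HTTP-based probe it is more efficient and uses fewer resources, but it is less precise, since a successful connection does not necessarily mean the page content is available. The spec.containers.livenessProbe.tcpSocket field defines this kind of check; it has the following two attributes:

• host: the target IP address to connect to; defaults to the Pod IP

• port: the target port to connect to; required field

Below is an example that uses the liveness-tcp approach in a manifest. It opens a connection to port 80/tcp on the Pod IP and decides the result based on whether the connection is established:

1. Create the resource manifest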

apiVersion: v1
kind: Pod
metadata:
  name: ngx-health
spec:
  containers:
  - name: ngx-liveness
    image: nginx:latest
    command:
    - /bin/sh
    - -c
    # Start nginx, wait 60 seconds, then delete the PID file
    - /usr/sbin/nginx; sleep 60; rm -rf /run/nginx.pid
    livenessProbe:
      tcpSocket:
        port: 80
  restartPolicy: Always

2. Create the resource object

kubectl apply -f ngx-health.yaml

3. View the Pod's events

# Container created and started successfully
kubectl describe pods/ngx-health | grep -A Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/ngx-health to k8s-node02
Normal Pulling 19s kubelet, k8s-node02 Pulling image "nginx:latest"
Normal Pulled 9s kubelet, k8s-node02 Successfully pulled image "nginx:latest"
Normal Created 8s kubelet, k8s-node02 Created container ngx-liveness
Normal Started 8s kubelet, k8s-node02 Started container ngx-liveness

# About 60s after the container became ready, the Pod is already pulling the image again
kubectl describe pods/ngx-health | grep -A Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/ngx-health to k8s-node02
Normal Pulled 72s kubelet, k8s-node02 Successfully pulled image "nginx:latest"
Normal Created 71s kubelet, k8s-node02 Created container ngx-liveness
Normal Started 71s kubelet, k8s-node02 Started container ngx-liveness
Normal Pulling 10s (x2 over 82s) kubelet, k8s-node02 Pulling image "nginx:latest"

# The wide output also shows that the Pod has entered the Completed state; the next step is the restart
kubectl get pods -o wide | grep ngx-health
ngx-health / Completed 90s 10.244.2.37 k8s-node02 <none> <none>

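Health Check Parameters

The sections above covered the two probe types, which act at different stages, and the check methods they support. The following tuning parameters apply to both:

• initialDelaySeconds: how long to wait before the first check, counted from when the container has started

• periodSeconds: the interval between checks; defaults to 10 seconds, minimum 1 second

• successThreshold: the number of consecutive successes required, after a failure, for the check to count as successful again; defaults to 1, and must be 1 for liveness probes

• timeoutSeconds: the check timeout; defaults to 1 second, minimum 1 second

• failureThreshold: the number of consecutive failures required, after a success, for the check to count as failed; defaults to 3, minimum 1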
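As a sketch, the fragment below writes out all five parameters with the default values documented by Kubernetes, so it behaves the same as a probe that sets none of them. The tcpSocket handler on port 80 is only a placeholder assumption.

```yaml
livenessProbe:
  tcpSocket:
    port: 80                 # placeholder handler
  initialDelaySeconds: 0     # default: start probing as soon as the container has started
  periodSeconds: 10          # default: probe every 10 seconds
  timeoutSeconds: 1          # default: each probe times out after 1 second
  successThreshold: 1        # default: 1 success marks the probe as passing again
  failureThreshold: 3        # default: 3 consecutive failures mark the probe as failed
```

With these defaults, a container that stops responding is restarted after roughly periodSeconds x failureThreshold = 30 seconds of consecutive failures.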

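Health Checks in Practice

The following example uses both a readinessProbe and a livenessProbe.

Readiness probe configuration explained:

1. The first readiness check runs 5 seconds after the container starts (initialDelaySeconds). It requests index.html in the container's web root over HTTP; if the check succeeds, the Pod is marked Ready.

2. The check is then repeated at the interval set by periodSeconds. The interval here is 10 seconds, so the check runs again every 10 seconds.

3. Each probe times out after 3 seconds. A single failed probe removes the Pod from the Service's backend endpoints, after which client requests can no longer reach the Pod through the Service.

4. The readiness probe keeps running after the Pod is removed. As soon as the Pod passes 1 probe (the value set by successThreshold), it is added back to the backend endpoints.

Liveness probe configuration explained:

1. The first liveness check runs 15 seconds after the container starts (initialDelaySeconds). It checks port 80 of the container with tcpSocket; the check succeeds if the connection can be established.

2. Liveness probes run every 3 seconds with a timeout of 1 second each. After 2 consecutive failures the container is restarted according to the restart policy.

3. After a failed check the liveness probe keeps running; one successful probe is enough for the Pod to be considered healthy again.

1. Resource manifest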

cat nginx-health.yaml
#create namespace
apiVersion: v1
kind: Namespace
metadata:
  name: nginx-health-ns
  labels:
    resource: nginx-ns
spec:
---
#create deploy and pod
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-health-deploy
  namespace: nginx-health-ns
  labels:
    resource: nginx-deploy
spec:
  replicas: 3
  revisionHistoryLimit:
  selector:
    matchLabels:
      app: nginx-health
  template:
    metadata:
      namespace: nginx-health-ns
      labels:
        app: nginx-health
    spec:
      restartPolicy: Always
      containers:
      - name: nginx-health-containers
        image: nginx:1.17.
        imagePullPolicy: IfNotPresent
        command:
        - /bin/sh
        - -c
        - /usr/sbin/nginx; sleep 60; rm -rf /run/nginx.pid
        readinessProbe:
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
          failureThreshold: 1
          httpGet:
            path: /index.html
            port: 80
            scheme: HTTP
        livenessProbe:
          initialDelaySeconds: 15
          periodSeconds: 3
          successThreshold: 1
          timeoutSeconds: 1
          failureThreshold: 2
          tcpSocket:
            port: 80
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
---
#create service
apiVersion: v1
kind: Service
metadata:
  name: nginx-health-svc
  namespace: nginx-health-ns
  labels:
    resource: nginx-svc
spec:
  clusterIP: 10.106.189.88
  ports:
  - port:
    protocol: TCP
    targetPort:
  selector:
    app: nginx-health
  sessionAffinity: ClientIP
  type: ClusterIP

2. Create the resource objects

kubectl apply -f nginx-health.yaml
namespace/nginx-health-ns created
deployment.apps/nginx-health-deploy created
service/nginx-health-svc created

3. View the created resource objects

k8sops@k8s-master01:/$ kubectl get all -n nginx-health-ns -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/nginx-health-deploy-6bcc8f7f74-6wc6t / Running 24s 10.244.3.50 k8s-node01 <none> <none>
pod/nginx-health-deploy-6bcc8f7f74-cns27 / Running 24s 10.244.5.52 k8s-node03 <none> <none>
pod/nginx-health-deploy-6bcc8f7f74-rsxjj / Running 24s 10.244.2.42 k8s-node02 <none> <none>

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/nginx-health-svc ClusterIP 10.106.189.88 <none> /TCP 25s app=nginx-health

NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
deployment.apps/nginx-health-deploy / 25s nginx-health-containers nginx:1.17. app=nginx-health

NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
replicaset.apps/nginx-health-deploy-6bcc8f7f74 25s nginx-health-containers nginx:1.17. app=nginx-health,pod-template-hash=6bcc8f7f74

4. Check the Pod status: none of the Pods is ready, all are in the Completed state and about to be restarted

k8sops@k8s-master01:/$ kubectl get pods -n nginx-health-ns -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-health-deploy-6bcc8f7f74-6wc6t / Completed 64s 10.244.3.50 k8s-node01 <none> <none>
nginx-health-deploy-6bcc8f7f74-cns27 / Completed 64s 10.244.5.52 k8s-node03 <none> <none>
nginx-health-deploy-6bcc8f7f74-rsxjj / Completed 64s 10.244.2.42 k8s-node02 <none> <none>

5. One Pod has now completed its restart and is ready

kubectl get pods -n nginx-health-ns -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-health-deploy-6bcc8f7f74-6wc6t / Running 73s 10.244.3.50 k8s-node01 <none> <none>
nginx-health-deploy-6bcc8f7f74-cns27 / Running 73s 10.244.5.52 k8s-node03 <none> <none>
nginx-health-deploy-6bcc8f7f74-rsxjj / Running 73s 10.244.2.42 k8s-node02 <none> <none>

6. All three Pods have completed their restart and are ready

kubectl get pods -n nginx-health-ns -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-health-deploy-6bcc8f7f74-6wc6t / Running 85s 10.244.3.50 k8s-node01 <none> <none>
nginx-health-deploy-6bcc8f7f74-cns27 / Running 85s 10.244.5.52 k8s-node03 <none> <none>
nginx-health-deploy-6bcc8f7f74-rsxjj / Running 85s 10.244.2.42 k8s-node02 <none> <none>


05-18 17:04