This article looks at kube-dns high availability and error handling in Kubernetes; the question and solution below may be a useful reference if you are facing the same problem.

Problem Description


I have a Kubernetes cluster with several nodes, and kube-dns is running on 3 of those nodes.

The issue I'm having is that if 1 of those 3 nodes goes down, requests between my pods/containers start to fail roughly 1 in 3 times.

This is because when a container resolves a k8s service hostname, it calls the kube-dns service to resolve that hostname, and the kube-dns k8s service has three endpoints, but one of those three endpoints is invalid because the node is down. K8s does not update the service until it detects that the node is down. (Currently I have that time set to 60 seconds.)
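A quick way to observe the stale endpoint described above (assuming a standard kube-dns deployment in the kube-system namespace) is to inspect the Endpoints object directly; while the node is down but not yet detected, the dead pod's IP still appears in the list:

```sh
# List the endpoint IPs currently backing the kube-dns Service.
kubectl -n kube-system get endpoints kube-dns -o wide
```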

Any ideas on how to mitigate this? Is there any kind of retry that could be configured outside the application? Something in the container or at the k8s level.

Thank you.

Solution

The main contributor to communication between the underlying Kubernetes resources on a particular Node and the kube-apiserver is kubelet; its role can be described as that of a Node agent. kubelet therefore plays a significant part in the cluster life cycle, owing to primary duties such as managing liveness and readiness probes for the nested objects, reporting resource metadata to the kube-apiserver (which persists it in etcd), and periodically refreshing its own health status to the kube-apiserver at the interval specified by the --node-status-update-frequency flag in the kubelet configuration.
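As a hedged sketch of where that interval lives (assuming a kubelet that reads a KubeletConfiguration file, as kubeadm sets up by default; the file path is the kubeadm default and may differ on your cluster):

```yaml
# /var/lib/kubelet/config.yaml (kubeadm default path)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# How often the kubelet posts its node status to the kube-apiserver.
# The default is 10s; smaller values surface failures sooner but add
# load on the control plane.
nodeStatusUpdateFrequency: "10s"
```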

However, there is a specific component in Kubernetes called the Node controller. One of its essential roles is to check the status of the involved workers by monitoring the heartbeats arriving from each kubelet. There are some specific flags that describe this behavior, and by default these flags are included in the kube-controller-manager configuration (a tuning sketch follows the list below):

  • --node-monitor-period - the interval at which the Node controller checks kubelet-reported status (default value 5s);
  • --node-monitor-grace-period - how long the Kubernetes controller manager keeps considering a kubelet healthy after its last status update before marking the node unhealthy (default value 40s);
  • --pod-eviction-timeout - the grace period before pods are deleted from failed nodes (default value 5m).
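As a hedged illustration (assuming a kubeadm-style cluster where kube-controller-manager runs as a static pod; the manifest path is the kubeadm default, and the tightened values are examples, not recommendations), the flags can be adjusted like this:

```yaml
# /etc/kubernetes/manifests/kube-controller-manager.yaml (excerpt)
spec:
  containers:
  - name: kube-controller-manager
    command:
    - kube-controller-manager
    # Check the kubelet-reported node status every 5s (the default).
    - --node-monitor-period=5s
    # Mark a node unhealthy if no status update arrives within 20s
    # (default 40s), so dead nodes are noticed sooner.
    - --node-monitor-grace-period=20s
    # Begin evicting pods 30s after the node is marked unhealthy
    # (default 5m).
    - --pod-eviction-timeout=30s
```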

Whenever you want to mitigate a DNS Pod outage caused by a Node going down, you should consider tuning these options. You can also take a look at the DNS horizontal autoscaler in order to keep a stable replica count for the DNS Pods; however, it brings some additional logic to implement, which can consume more compute resources on the cluster.
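On the "retry outside the application" part of the question, one option at the container level is the pod's dnsConfig field, which injects resolv.conf options so the resolver itself fails fast and retries, giving kube-proxy another chance to route the query to a healthy kube-dns endpoint. A minimal sketch (the pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-dns-retries   # hypothetical name
spec:
  containers:
  - name: app
    image: busybox:1.36        # placeholder image
    command: ["sleep", "3600"]
  dnsConfig:
    options:
    # Give up on an unanswered query after 1s instead of the 5s default.
    - name: timeout
      value: "1"
    # Retry the query up to 3 times before failing the lookup.
    - name: attempts
      value: "3"
```

For the autoscaler route, the standard setup is driven by a ConfigMap read by the cluster-proportional-autoscaler; preventSinglePointFailure keeps at least two DNS replicas on multi-node clusters (the values shown are the documented defaults, adjust to taste):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns-autoscaler
  namespace: kube-system
data:
  # One DNS replica per 16 nodes or per 256 cores, whichever yields more,
  # and never a single replica on a multi-node cluster.
  linear: '{"coresPerReplica":256,"nodesPerReplica":16,"preventSinglePointFailure":true,"min":1}'
```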

