This article looks at what to do when, even after adding another Kubernetes node, the new node sits unused and the error "No nodes are available that match all of the predicates" keeps appearing.

Problem description

We tried to add one more deployment with 2 pods to the existing mix of pods scheduled across a cluster of 4 nodes plus 1 master node. We are getting the following error: No nodes are available that match all of the predicates: Insufficient cpu (4), Insufficient memory (1), PodToleratesNodeTaints (2).
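
As a first diagnostic step it helps to see exactly which predicate fails on which node. A minimal check, assuming kubectl access (the pod name is a placeholder):

    # Scheduler events on the Pending pod show the per-predicate breakdown
    kubectl describe pod <pending-pod-name>

    # Taints and already-allocated resources on each node
    kubectl describe nodes | grep -A 3 "Taints"
    kubectl describe nodes | grep -A 8 "Allocated resources"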

Looking at other threads and the documentation, this would be the case when the existing nodes are exceeding their cpu capacity (on 4 nodes) and memory capacity (on 1 node)...

To solve the resource issue, we added another node and redeployed the bits. But we still see the same issue and see an almost unused node. (See node-5 below being left unused while node-2 and node-4 are over-allocated; node-1 and node-3 would be over-allocated after the addition of the new pods, which are the ones failing.)

Node | CPU requests | CPU limits | Memory requests | Memory limits | Age
node-5 | 0.11 (5.50%) | 0 (0.00%) | 50 Mi (1.26%) | 50 Mi (1.26%) | 3 hours
node-4 | 1.61 (80.50%) | 2.8 (140.00%) | 2.674 Gi (69.24%) | 4.299 Gi (111.32%) | 7 days
node-3 | 1.47 (73.50%) | 1.7 (85.00%) | 2.031 Gi (52.60%) | 2.965 Gi (76.78%) | 7 months
node-2 | 1.33 (66.50%) | 2.1 (105.00%) | 2.684 Gi (69.49%) | 3.799 Gi (98.37%) | 7 months
node-1 | 1.48 (74.00%) | 1.4 (70.00%) | 1.705 Gi (44.15%) | 2.514 Gi (65.09%) | 7 months
master | 0.9 (45.00%) | 0.1 (5.00%) | 350 Mi (8.85%) | 300 Mi (7.59%) | 7 months
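
For reference, a per-node summary like the table above (requests and limits as a percentage of allocatable capacity) comes from the "Allocated resources" section of kubectl's node description; live usage needs a metrics add-on. A hedged sketch:

    kubectl describe nodes | grep -A 8 "Allocated resources"
    kubectl top nodes    # actual usage; requires heapster/metrics-server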

Note that we have autoscaling enabled (with a limit of 8 nodes). (The client version is v1.9.0 while our Kubernetes server version is v1.8.4.) We are using helm to deploy and kops to add the new node.
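
For context, adding a worker node with kops usually means raising the instance group's size and applying the change; a minimal sketch, assuming the default instance-group name "nodes" (cluster name and state-store flags omitted):

    kops get instancegroups            # list instance groups
    kops edit instancegroup nodes      # raise minSize/maxSize
    kops update cluster --yes          # apply the change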

Why are the pods not being scheduled so that each node stays below its capacity? Why are we seeing these errors and one completely unused node?

Recommended answer

Figured out what was going on. Here is what we think happened...

  1. We added a new node (the 5th one) using kops.
  2. The cluster autoscaler we were running at the time was configured with a minimum of 4 and a maximum of 8 nodes. So most likely it decided the new node was not needed and added a taint to it (the taint can be inspected as shown below).
  3. So even though we tried to deploy and redeploy the services, no pods were scheduled onto that node because of this taint.
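
The taint can be confirmed directly on the node; a quick check against the node from the question:

    # Show any taints on the new node
    kubectl describe node node-5 | grep -i -A 3 "Taints"
    kubectl get node node-5 -o jsonpath='{.spec.taints}'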

We then redeployed the autoscaler with new values of min = 5 and max = 8.
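
With the deployment-based cluster-autoscaler that min/max is usually passed through the --nodes flag; a hedged sketch (the cloud provider and node-group name are assumptions, not from the question):

    ./cluster-autoscaler --cloud-provider=aws --nodes=5:8:<worker-node-group-name>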

Then we removed this taint and redeployed, and the issue of the 5th node not being used went away. With enough node resources now available, we no longer got the error we had been seeing.
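
For reference, a taint is removed with kubectl by appending "-" to its key; the key below is a placeholder for whatever taint was actually found on node-5:

    # Remove the taint with the given key from node-5
    kubectl taint nodes node-5 <taint-key>-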

We are not sure why the autoscaler marked the new node with this taint. That is a question for another day, or maybe a bug in the k8s autoscaler. But the issue was fixed by removing that taint from the new node.

