Multiple Prometheus-related errors from gke-metrics-agent

Problem description

I deployed a new app to GKE, and the GKE dashboard now shows thousands of errors from gke-metrics-agent.

It is also consuming a lot of resources.

I checked the logs and saw that all of the errors are related to Prometheus, but I couldn't find a way to troubleshoot them.

Cluster version: 1.18.12-gke.1206

What are these errors, and how can I fix them?

Recommended answer

It looks like some GKE 1.18.12-gke.X versions have a bug where gke-metrics-agent produces a lot of Warning messages.

There is already a Public Issue Tracker ticket for this bug. You can follow updates on the issue there, and you can add a +1 to the ticket to indicate that you are affected by it.

The workaround for this issue is to upgrade to a newer version: 1.18.14-gke.1200 or later.
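If you manage several clusters, a small helper can tell you which ones are still below the workaround version. The following is only a sketch: the function names are made up for illustration, and it assumes version strings always look like `1.18.12-gke.1206`. It only compares version numbers, so it cannot tell you whether a given version actually has the bug; the answer above only says that some 1.18.12 builds do.

```python
import re
from typing import Tuple

# Minimum version named in the answer as the workaround.
WORKAROUND_VERSION = "1.18.14-gke.1200"

# Assumed format: <major>.<minor>.<patch>-gke.<build>, optionally prefixed with "v".
_VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)-gke\.(\d+)$")


def parse_gke_version(version: str) -> Tuple[int, ...]:
    """Split a GKE version string into a tuple of integers for comparison."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"not a GKE version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def has_workaround_version(version: str) -> bool:
    """Return True if `version` is at or above the workaround version."""
    return parse_gke_version(version) >= parse_gke_version(WORKAROUND_VERSION)


if __name__ == "__main__":
    # The version from the question, then the version from the answer.
    for v in ("1.18.12-gke.1206", "1.18.14-gke.1200"):
        print(v, has_workaround_version(v))
```

Tuples compare element by element, so the build number after `-gke.` only matters when the major, minor, and patch numbers are equal.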


08-11 02:59