Question
Heyo,
I've deployed a prometheus, grafana, kube-state-metrics, alertmanager, etc. setup using kubernetes in GKE v1.16.x. I've used https://github.com/do-community/doks-monitoring as a jumping off point for the yaml files.
I've been trying to debug a situation for a few days now and would be very grateful for some help. My prometheus nodes are not getting metrics from cadvisor.
- All the services and pods in the deployments are running. prometheus, kube-state-metrics, node-exporter, all running - no errors.
- The cadvisor targets in prometheus UI appear as "up".
- Prometheus is able to collect other metrics from the cluster, but no pod/container level usage metrics.
- I can see cadvisor metrics when I query kubectl get --raw "/api/v1/nodes/<your_node>/proxy/metrics/cadvisor", but when I look in prometheus for container_cpu_usage or container_memory_usage, there is no data (a quick CLI check for this is sketched after the config below).
- My cadvisor scrape job config in prometheus:
- job_name: kubernetes-cadvisor
  honor_timestamps: true
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics/cadvisor
  scheme: https
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
cribbed from the prometheus/docs/examples.
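For completeness, this is roughly how I've been checking the target state and the queries from the CLI (assuming a kubectl port-forward to the Prometheus service on localhost:9090; the namespace and service name below are just from my setup and may differ in yours):

# Forward the Prometheus API to localhost (adjust namespace/service name to your deployment)
kubectl -n monitoring port-forward svc/prometheus 9090:9090 &

# How Prometheus itself sees the cadvisor targets: health, last error, scrape URL
curl -s http://localhost:9090/api/v1/targets \
  | jq '.data.activeTargets[] | select(.labels.job == "kubernetes-cadvisor") | {health, lastError, scrapeUrl}'

# Instant query for a cadvisor metric (container_cpu_usage_seconds_total is the full name behind container_cpu_usage);
# for me this comes back with an empty result set
curl -s http://localhost:9090/api/v1/query \
  --data-urlencode 'query=container_cpu_usage_seconds_total' \
  | jq '.data.result | length'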
I've tried a whole bunch of different variations on paths and scrape configs, but no luck. Based on the fact that I can query the metrics using kubectl get (they exist), it seems to me the issue is prometheus communicating with the cadvisor target.
If anyone has experience getting this configured I'd sure appreciate some help debugging.
Cheers
Answer
Too frustrating, I've been digging for the past few days.
The issue started after the GKE master was upgraded from 1.15.12-gke.2 to 1.16.13-gke.401.
To confirm this, I did the same in another GKE cluster, and the result was the same.
The above configuration is giving 403 Forbidden.
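To see the 403 directly: with role: node the scrape job talks to the kubelet's secure port (10250) on each node, so the failure can be reproduced with curl from a pod that mounts the same service account token. A rough sketch (the node IP below is just an example; use a real one from kubectl get nodes -o wide):

# Run from inside a pod that mounts the Prometheus service account
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
NODE_IP=10.128.0.2   # example internal IP; substitute a real node IP

# Direct kubelet scrape, which is what the kubernetes-cadvisor job does -> prints 403 after the upgrade
curl -sk -o /dev/null -w '%{http_code}\n' \
  -H "Authorization: Bearer ${TOKEN}" \
  "https://${NODE_IP}:10250/metrics/cadvisor"

# The API server proxy route (run from a workstation with kubectl) still returns the metrics
kubectl get --raw "/api/v1/nodes/<your_node>/proxy/metrics/cadvisor" | head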