How do I make a Prometheus alert description give both the ratio and the absolute numbers?

Problem description

I currently have a Prometheus alert that fires when my success rate drops below 85%.

I would like to add the absolute numbers behind the ratio to the alert description. How do I do that?

My YAML currently looks like this (I removed some extraneous details):

groups:
  - name: recording_rules
    rules:
    - record: number_of_successes_24h
      expr: avg(sum by(instance)(my_status{kubernetes_name="my-prom",timeRange="1d",status=~"success"}))
    - record: number_of_total_24h
      expr: avg(sum by(instance)(my_status{kubernetes_name="my-prom",timeRange="1d"}))
    - record: success_rate_24h
      expr: clamp_max(number_of_successes_24h / number_of_total_24h * 100, 100)

  - name: alerting_rules
    rules:
    - alert: LowSuccessRate24H
      expr: success_rate_24h < 85
      labels:
        severity: critical
      annotations:
        summary: "CRITICAL: Low success rate 24h"
        description: "Success rate in the last 24 hours went below 85% (value: {{ $value }}%)"

My question is: how do I add number_of_successes_24h and number_of_total_24h to the description?

The idea is to use Go templates to build a query (by populating a format string with values from $labels using printf), pipe it into the Prometheus-defined query function, and get back either a single result (which you can handle using with) or multiple results (which you can iterate over using range). You can then print either the time series value directly or one of its labels (e.g. the instance name).
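Applied to the alert from the question, this could look roughly like the sketch below. It assumes the two recording rules each return a single series (they are wrapped in avg(...) with no grouping), so `first` is enough to pick the sample. The `per_instance` annotation and its query are hypothetical and only illustrate the range form; the commented printf line likewise assumes an `instance` label that this particular alert does not carry.

```yaml
groups:
  - name: alerting_rules
    rules:
    - alert: LowSuccessRate24H
      expr: success_rate_24h < 85
      labels:
        severity: critical
      annotations:
        summary: "CRITICAL: Low success rate 24h"
        # Folded block scalar (>-) so the double quotes inside the
        # template actions do not need YAML escaping.
        # `query` runs a PromQL expression at alert evaluation time,
        # `first` takes the first returned sample, `value` extracts its number.
        # `with` skips its body when the query returns no samples.
        description: >-
          Success rate in the last 24 hours went below 85%
          (value: {{ $value }}%,
          successes: {{ with query "number_of_successes_24h" }}{{ . | first | value }}{{ end }},
          total: {{ with query "number_of_total_24h" }}{{ . | first | value }}{{ end }})
        # Hypothetical extra annotation: iterate over several series with range
        # and print one label plus the sample value for each.
        per_instance: >-
          {{ range query "sum by(instance)(my_status{kubernetes_name='my-prom',timeRange='1d'})" }}{{ .Labels.instance }}={{ .Value }} {{ end }}
        # If the alert carried an instance label, the query string could be
        # built from it with printf (assumption, not the case for this alert):
        # {{ with printf "number_of_total_24h{instance='%s'}" $labels.instance | query }}{{ . | first | value }}{{ end }}
```

Querying the recording rules by name keeps the annotation short and ensures the numbers shown come from the same series the ratio was computed from. Note that the annotation queries run when the alert is evaluated, so the values can differ slightly from those that produced `$value`.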
