Problem description
I currently have a Prometheus alert that fires when my success rate drops below 85%.
I would like to add the absolute numbers of the ratio to the alert description. How do I do that?
My YAML currently looks like this (I cleaned up some extraneous details):
groups:
  - name: recording_rules
    rules:
      - record: number_of_successes_24h
        expr: avg(sum by(instance)(my_status{kubernetes_name="my-prom",timeRange="1d",status=~"success"}))
      - record: number_of_total_24h
        expr: avg(sum by(instance)(my_status{kubernetes_name="my-prom",timeRange="1d"}))
      - record: success_rate_24h
        expr: clamp_max(number_of_successes_24h / number_of_total_24h * 100, 100)
  - name: alerting_rules
    rules:
      - alert: LowSuccessRate24H
        expr: success_rate_24h < 85
        labels:
          severity: critical
        annotations:
          summary: "CRITICAL: Low success rate 24h"
          description: "Success rate in the last 24 hours went below 85% (value: {{ $value }}%)"
My question is: how do I add number_of_successes_24h and number_of_total_24h to the description?
The idea is that you use Go templates to generate a query (by populating a template with values from $labels using printf) and then pipe that into the Prometheus-defined query function. You get back either one result (which you can handle using with) or multiple values (which you can iterate over using range). Then you can print either the timeseries value directly or some label (e.g. the instance name).
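As a rough sketch of how this could look for the alert in the question: the rule below looks up the two recorded series from inside the description using query, with, first and value, which are part of Prometheus' template functions. Since the recorded series here carry no labels, the query strings are written out directly and printf with $labels is not needed. The per_instance_totals annotation is a hypothetical name added only to illustrate range; it assumes the underlying my_status series carry an instance label, as the sum by(instance) in the recording rules suggests.

```yaml
groups:
  - name: alerting_rules
    rules:
      - alert: LowSuccessRate24H
        expr: success_rate_24h < 85
        labels:
          severity: critical
        annotations:
          summary: "CRITICAL: Low success rate 24h"
          # Each "with query ..." block runs an instant query at alert
          # evaluation time; "first" takes the first returned sample and
          # "value" prints its value. If the query returns nothing, the
          # "with" block is skipped and nothing is printed.
          description: >-
            Success rate in the last 24 hours went below 85%
            (value: {{ $value }}%,
            successes: {{ with query "number_of_successes_24h" }}{{ . | first | value }}{{ end }},
            total: {{ with query "number_of_total_24h" }}{{ . | first | value }}{{ end }})
          # Hypothetical extra annotation (not in the original rule):
          # iterate over several results with "range" and print a label
          # next to each value.
          per_instance_totals: >-
            {{ range query "sum by(instance)(my_status{kubernetes_name='my-prom',timeRange='1d'})" }}{{ .Labels.instance }}={{ .Value }} {{ end }}
```

The folded block scalar (>-) is used so that the double quotes inside the template do not need escaping; the same template would also work in a quoted string with the inner quotes escaped.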