Problem description
I have a Kubernetes cluster set up on AWS, where I am trying to monitor several pods using cAdvisor + Prometheus + Alertmanager. What I want to do is send an email alert (with the service/container name) if a container/pod goes down, gets stuck in the Error or CrashLoopBackOff state, or gets stuck in any state other than Running.
Recommended answer
Prometheus collects a wide range of metrics. For example, you can monitor restarts with the metric kube_pod_container_status_restarts_total, which will reflect your problem. Note that this metric is exposed by kube-state-metrics rather than cAdvisor, so kube-state-metrics has to be deployed and scraped by Prometheus.
It carries labels that you can use in the alert:
- container=container-name
- namespace=pod-namespace
- pod=pod-name
So all you need to do is add the correct SMTP settings and a receiver to your alertmanager.yaml config, and add an alerting rule to a Prometheus rules file, like this:
# alertmanager.yaml
global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: 'localhost:25'
  smtp_from: '[email protected]'
  smtp_auth_username: 'alertmanager'
  smtp_auth_password: 'password'

receivers:
  - name: 'team-X-mails'
    email_configs:
      - to: '[email protected]'

# Only one default receiver
route:
  receiver: team-X-mails

# Prometheus rules file (loaded through rule_files in prometheus.yml)
# Example group with one alert
groups:
  - name: example-alert
    rules:
      # Alert about restarts
      - alert: RestartAlerts
        # Label names are pod, namespace and container; hyphenated names are not valid.
        # sum() adds up the restart counters, whereas count() would only count series.
        expr: sum by (namespace, pod, container) (kube_pod_container_status_restarts_total) > 5
        for: 10m
        annotations:
          summary: "More than 5 restarts in pod {{ $labels.pod }}"
          description: "{{ $labels.container }} restarted {{ $value }} times in pod {{ $labels.namespace }}/{{ $labels.pod }}"
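
The restart rule above only catches pods that keep restarting. The question also asks about pods stuck in CrashLoopBackOff, Error, or any state other than Running. A minimal sketch of extra rules for those cases follows. It assumes kube-state-metrics is being scraped, so that kube_pod_container_status_waiting_reason, kube_pod_container_status_terminated_reason and kube_pod_status_phase are available. The group name, alert names, `for` durations and the severity label are illustrative choices, not part of the original answer.

```yaml
# Prometheus rules file (sketch; names and durations are assumptions)
groups:
  - name: pod-state-alerts
    rules:
      # Container is waiting with reason CrashLoopBackOff
      - alert: ContainerCrashLooping
        expr: kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"} == 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Container {{ $labels.container }} is in CrashLoopBackOff"
          description: "{{ $labels.container }} in pod {{ $labels.namespace }}/{{ $labels.pod }} has been in CrashLoopBackOff for 5 minutes"

      # Container terminated with reason Error
      - alert: ContainerTerminatedWithError
        expr: kube_pod_container_status_terminated_reason{reason="Error"} == 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.container }} terminated with Error"
          description: "{{ $labels.container }} in pod {{ $labels.namespace }}/{{ $labels.pod }} is in the Error state"

      # Pod is in a phase other than Running or Succeeded
      - alert: PodNotRunning
        expr: sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Failed|Unknown"}) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.pod }} is not running"
          description: "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-running phase for 10 minutes"
```

With a single default receiver, as in the Alertmanager config above, all of these alerts would be emailed to the same address. The `for` clause keeps short-lived states, such as a pod that is briefly Pending while it is scheduled, from triggering mail.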