Alert when a Docker container or pod is in Error or CrashLoopBackOff state in Kubernetes

Problem description

I have a Kubernetes cluster set up on AWS, where I am trying to monitor several pods using cAdvisor + Prometheus + Alertmanager. What I want to do is send an email alert (with the service/container name) if a container or pod goes down, or gets stuck in the Error or CrashLoopBackOff state or in any state other than Running.

Recommended answer

Prometheus collects a wide range of metrics. For example, you can use the metric kube_pod_container_status_restarts_total (exposed by kube-state-metrics) to monitor container restarts, which will reflect your problem.

It carries labels that you can use in the alert:

  • container = container-name
  • namespace = pod-namespace
  • pod = pod-name
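
These labels are also available to Alertmanager, which can use them to group notifications so that each pod gets its own email. A minimal sketch, assuming a receiver named team-X-mails as in the configuration in this answer; the timing values are illustrative choices, not recommendations from the original answer:

```yaml
# Sketch of an Alertmanager route that groups notifications by pod labels.
route:
  receiver: team-X-mails            # must match a name under `receivers`
  group_by: ['namespace', 'pod']    # one notification group per namespace/pod
  group_wait: 30s                   # illustrative: delay before the first email for a new group
  group_interval: 5m                # illustrative: delay before emailing about new alerts in a group
  repeat_interval: 4h               # illustrative: how often to re-send while the alert keeps firing
```

Without group_by, alerts for unrelated pods may be batched into a single email, which makes the affected service harder to spot.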

So all you need to do is configure alertmanager.yaml with the correct SMTP settings and a receiver, and add an alerting rule to Prometheus, like this:

# alertmanager.yaml
global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: 'localhost:25'
  smtp_from: '[email protected]'  # address redacted in the source; use your own sender
  smtp_auth_username: 'alertmanager'
  smtp_auth_password: 'password'

receivers:
- name: 'team-X-mails'
  email_configs:
  - to: '[email protected]'  # address redacted in the source; use your own recipient

# Only one default receiver
route:
  receiver: team-X-mails

# Example group with one alert. Alerting rules live in a Prometheus rules file
# (loaded through rule_files in prometheus.yml), not in alertmanager.yaml.
groups:
- name: example-alert
  rules:
  # Alert about restarts. Label names cannot contain hyphens, so the rule
  # uses the actual labels: namespace, pod, container.
  - alert: RestartAlerts
    expr: sum by (namespace, pod, container) (kube_pod_container_status_restarts_total) > 5
    for: 10m
    annotations:
      summary: "More than 5 restarts in pod {{ $labels.pod }}"
      description: "{{ $labels.container }} restarted {{ $value }} times in pod {{ $labels.namespace }}/{{ $labels.pod }}"
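
The restart rule only catches containers that keep restarting. For the other cases in the question (a container waiting in CrashLoopBackOff, or a pod in a phase other than Running), kube-state-metrics also exposes kube_pod_container_status_waiting_reason and kube_pod_status_phase. A sketch of two further rules, assuming kube-state-metrics is scraped by Prometheus; the group name, alert names, `for` durations, and severity label are illustrative and not part of the original answer:

```yaml
groups:
- name: pod-state-alerts   # hypothetical group name
  rules:
  # Fires when a container has been waiting in CrashLoopBackOff for 5 minutes.
  - alert: ContainerCrashLooping
    expr: kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"} == 1
    for: 5m
    labels:
      severity: critical   # illustrative label, usable for routing in Alertmanager
    annotations:
      summary: "Container {{ $labels.container }} is in CrashLoopBackOff"
      description: "Pod {{ $labels.namespace }}/{{ $labels.pod }}, container {{ $labels.container }}"
  # Fires when a pod has been Pending, Failed or Unknown for 10 minutes.
  # Succeeded is left out because completed Jobs end up in that phase normally.
  - alert: PodNotRunning
    expr: kube_pod_status_phase{phase=~"Pending|Failed|Unknown"} == 1
    for: 10m
    annotations:
      summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is in phase {{ $labels.phase }}"
```

Which waiting reasons are reported can depend on the kube-state-metrics version, so it is worth checking the metric in the Prometheus expression browser first. A rules file like this can be validated with promtool check rules before Prometheus loads it.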
