Problem description
I would like to handle a kind of chained action in Monit:
- check the process and alert immediately;
- restart the process after several cycles.
My tries (so far):
check process myprocess with pidfile /run/my.pid
start program = "/path/to/binary start" with timeout 60 seconds
stop program = "/path/to/binary stop" with timeout 60 seconds
if not exist for 3 cycles then restart
if not exist then alert
if 3 restarts within 3 cycles then timeout
Result: no alert is sent and the status stays "running" while the PID check fails, but the process is restarted after the 3 cycles.
check process myprocess with pidfile /run/my.pid
start program = "/path/to/binary start" with timeout 60 seconds
stop program = "/path/to/binary stop" with timeout 60 seconds
if not exist for 3 cycles then restart
if children < 1 for 1 cycles then alert
if 3 restarts within 3 cycles then timeout
Result: no alert for children < 1, but a restart after 5.
monit.log:
[CEST Aug 1 15:09:30] error : 'myprocess' process is not running
monit summary:
Process 'myprocess' Running
And here is the relevant part of monit -v:
Existence = if does not exist 3 times within 3 cycle(s) then restart else
if succeeded 1 times within 1 cycle(s) then alert
Pid = if changed 1 times within 1 cycle(s) then alert
Ppid = if changed 1 times within 1 cycle(s) then alert
Children = if less than 1 1 times within 1 cycle(s) then alert else if
succeeded 1 times within 1 cycle(s) then alert
Timeout = If restarted 3 times within 3 cycle(s) then unmonitor
So the question: is it possible to send an alert and change the status to 'not running' within 1 cycle and restart after 3?
Recommended answer
EDIT (IMPORTANT): See comments below for newer (as per Feb. 2019) versions of Monit, where this behaviour has been improved.
This line:
if does not exist for 3 cycles then restart
means the following:
Do not perform any action until you have checked 3 times that the service does not exist, then restart it. This behaviour is described in monit's documentation as Failure Tolerance:
By default the action is executed if it matches and the service set in an error state. However, you can require a test to fail more than once before the error event is triggered and the service state changed to failed. This is useful to avoid getting alerts on spurious errors, which can happen, especially with network tests.
Syntax:
FOR <X> CYCLES ...
or:
<X> [TIMES WITHIN] <Y> CYCLES ...
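For the check in the question, the two forms could be written as in the sketch below (pidfile and program paths taken from the question). The equivalence of the short and long form follows from the monit -v output in the question, which renders "for 3 cycles" as "3 times within 3 cycle(s)"; the 3-in-5 window is only an illustrative value for a non-consecutive tolerance, not something the documentation prescribes.

```
check process myprocess with pidfile /run/my.pid
    start program = "/path/to/binary start" with timeout 60 seconds
    stop program  = "/path/to/binary stop" with timeout 60 seconds

    # FOR <X> CYCLES: act only after the test failed on 3 consecutive cycles
    if does not exist for 3 cycles then restart

    # <X> [TIMES WITHIN] <Y> CYCLES: the same rule in its long form
    # if does not exist 3 times within 3 cycles then restart

    # looser tolerance: 3 failures in any window of 5 cycles (illustrative values)
    # if does not exist 3 times within 5 cycles then restart
```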
Accordingly, Monit won't change the service's status until the test has failed for the X cycles. To confirm this, just remove the failure tolerance for this service and use only:
if does not exist then alert
Then stop the service manually and confirm that the command
monit status
now shows the status "Does not exist".
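As a sketch, the check from the question with only the failure tolerance removed would look roughly like this (paths taken from the question; nothing else changed):

```
check process myprocess with pidfile /run/my.pid
    start program = "/path/to/binary start" with timeout 60 seconds
    stop program  = "/path/to/binary stop" with timeout 60 seconds
    # no failure tolerance: the first failed existence check triggers the
    # error event, so the status changes in the same cycle
    if does not exist then alert
```

After editing the control file, running monit reload makes the running daemon re-read it before you stop the process and check monit status.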
So, back to your question:
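- Yes, it is possible to send an alert (by e-mail) within 1 cycle. For that, you need to define the option "if does not exist then alert" for the service and set up e-mail alerts correctly. Assuming you want to use an external e-mail server, you need to define at least two lines (configuration example using Gmail):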
SMTP server configuration:
set mailserver smtp.gmail.com PORT 587 USERNAME "[email protected]" PASSWORD "xxxxx" using TLSV1 with timeout 30 seconds
(Be aware that in Gmail you must enable access for "less secure" apps in order to allow Monit to use the SMTP service.)
and
E-mail recipient:
set alert [email protected]
Both lines go in the file /etc/monit/monitrc. Refer to the official documentation for more information about them.
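- As far as the documentation says, it is not possible to update the service status immediately if a failure tolerance is defined (action executed after X cycles). But you can still define an alert to be sent immediately and restart the service after the desired number of cycles.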
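Putting both points together, the answer's suggestion amounts to one existence rule without failure tolerance (for the alert) and one with it (for the restart), roughly as sketched below. Assumptions: user@example.com is a placeholder recipient, the set mailserver line shown above is still required, and, as the question's first attempt and the EDIT at the top both suggest, whether the immediate alert actually fires alongside the tolerant restart rule may depend on the Monit version.

```
# placeholder recipient; a "set mailserver ..." line as shown above is also needed
set alert user@example.com

check process myprocess with pidfile /run/my.pid
    start program = "/path/to/binary start" with timeout 60 seconds
    stop program  = "/path/to/binary stop" with timeout 60 seconds
    # no failure tolerance: intended to alert in the first cycle the process is missing
    if does not exist then alert
    # failure tolerance: restart only after 3 consecutive failed checks
    if does not exist for 3 cycles then restart
```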
References:
Monit's documentation: https://mmonit.com/monit/documentation/monit.html
Hope it helps!
Regards