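A few days ago I deployed a Zabbix proxy and Zabbix agentd on some overseas UCloud machines. Early the next morning, though, I got an email saying zabbix_proxy had gone down. When I logged in, I found that on one of the two machines both the proxy and agentd had died, while the other machine was fine. Then I checked the logs: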

zabbix_agentd []: [file:'cpustat.c',line:] lock failed: [] Invalid argument
::022001.966 One child process died (PID:,exitcode/signal:). Exiting ...
::022003.967 Zabbix Agent stopped. Zabbix 2.0. (revision ).

zabbix_proxy []: [file:'selfmon.c',line:] lock failed: [] Invalid argument
zabbix_proxy []: [file:'selfmon.c',line:] lock failed: [] Invalid argument
zabbix_proxy []: [file:'selfmon.c',line:] lock failed: [] Invalid argument
::022001.362 One child process died (PID:,exitcode/signal:). Exiting ...
::022003.365 syncing history data...
zabbix_proxy []: [file:'dbcache.c',line:] lock failed: [] Invalid argument
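
One quick way to confirm this kind of failure is to check whether the Zabbix daemons still own a semaphore set: a healthy host should show at least one set owned by the zabbix user, while a host hit by this problem typically shows none. Below is a minimal sketch that filters `ipcs -s` output by owner. It assumes the Linux util-linux column layout (key, semid, owner, perms, nsems, with data rows starting with `0x`); the function name `semids_owned_by` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `ipcs -s` output on stdin and print the semid
# of every semaphore set whose owner column equals $1.
# Assumes the Linux util-linux layout: key semid owner perms nsems,
# where data rows start with "0x" and header lines do not.
semids_owned_by() {
    awk -v owner="$1" '$1 ~ /^0x/ && $3 == owner { print $2 }'
}

# Usage on a live host (empty output means zabbix owns no semaphore set):
#   ipcs -s | semids_owned_by zabbix
```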

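My first hunch was that some crontab job had run a script that deleted something, and sure enough, the cause was deleted semaphores (for background on semaphores, see the expert's blog post introducing ipcs). The Zabbix processes use a System V semaphore set as the lock around their shared memory. Once that set is removed, every lock attempt fails with "Invalid argument", a child process dies, and the main process shuts down, exactly as the log shows. The deletion script was: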

#!/bin/sh
# Cron job as found: removes EVERY semaphore set on the host, zabbix's included.
for semid in `ipcs -s | cut -f2 -d" "`
do
ipcrm -s $semid
done

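Deleting them this bluntly was bound to cause trouble, so I added a condition that skips the semaphore sets owned by zabbix: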

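#!/bin/sh
# Remove System V semaphore sets, but leave the ones owned by zabbix alone.
# grep '^0x' keeps only the data rows of `ipcs -s` (skips the header lines);
# grep -v zabbix drops the rows whose owner is the zabbix user.
for semid in `ipcs -s | grep '^0x' | grep -v zabbix | awk '{print $2}'`
do
    ipcrm -s "$semid"
done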

I ran the script again, and everything is fine now ^_^
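If you want a slightly stricter variant, the sketch below compares the owner column exactly instead of using `grep -v zabbix` (which skips any row containing the string "zabbix" anywhere), and defaults to a dry run that only prints what it would remove. Everything here is an assumption for illustration, not the script from this post: the Linux `ipcs -s` column layout, the `KEEP_OWNER`/`DRY_RUN` variables, and the function names. Note that it still removes the sets of every other user, so whether that is safe depends on what else runs on the host.

```shell
#!/bin/sh
# Hypothetical stricter cleanup: remove semaphore sets NOT owned by KEEP_OWNER.
# Assumes the Linux util-linux `ipcs -s` layout: key semid owner perms nsems.
KEEP_OWNER=${KEEP_OWNER:-zabbix}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print what would be removed

# Read `ipcs -s` output on stdin; print semids of data rows whose owner != $1.
semids_not_owned_by() {
    awk -v keep="$1" '$1 ~ /^0x/ && $3 != keep { print $2 }'
}

cleanup_semaphores() {
    ipcs -s | semids_not_owned_by "$KEEP_OWNER" | while read -r semid; do
        if [ "$DRY_RUN" = 1 ]; then
            echo "would remove semid $semid"
        else
            ipcrm -s "$semid"
        fi
    done
}

# Only act when explicitly asked, e.g.:  DRY_RUN=0 sh cleanup_sem.sh run
if [ "${1:-}" = run ]; then
    cleanup_semaphores
fi
```

Running it first without `DRY_RUN=0` shows which sets the cron job would delete, which is an easy way to catch a filter mistake before it takes a daemon down.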

05-11 13:52