在服务器上,概率(一个月1次)出现,某个服务停摆了,日志上看,没有任何报错,而且服务进程也在,只是无响应了。
问题分析:
1,用strace命令查看,发现当前进程阻塞在系统API上, futext_wait_queue_me
2, 用gdb命令 attach 当前阻塞进程,bt显示 call trace,发现是syslog系统调用出现了死锁。
3,进一步分析,发现,某个函数在外部流程和信号处理函数(定时任务)中都调到了,而这个函数内部,调用了syslog输出日志。
4,结论:服务进程和信号处理函数同时调用syslog,可能导致死锁。
问题验证:
点击(此处)折叠或打开
- #include <errno.h>
- #include <syslog.h>
- #include <sys/types.h>
- #include <sys/socket.h>
- #include <netinet/in.h>
- #include <sys/select.h>
- #include <unistd.h>
- #include <signal.h>
- #include <sys/types.h>
- #include <sys/stat.h>
- #include <fcntl.h>
- #include <stdio.h>
- #include <sys/time.h>
- #include <sys/un.h>
- int idx = 0;
- static void sigroutine(int signo)
- {
- char* logcnt = "this log from sigroutine,************************************************************************************************************************************************************ end";
- switch (signo)
- {
- case SIGALRM:
- idx = 0;
- syslog(LOG_DEBUG, "%s\n", logcnt);
- break;
- }
- return;
- }
- static void srv_heartbeat_start(void)
- {
- struct itimerval value, old_value;
- signal(SIGALRM, sigroutine);
- value.it_value.tv_sec = 0;
- value.it_value.tv_usec = 50000;
- value.it_interval.tv_sec = 0;
- value.it_interval.tv_usec = 50000;
- setitimer(ITIMER_REAL, &value, &old_value);
- }
- int main(int argc, char** argv)
- {
- char* logcnt = "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++";
- srv_heartbeat_start();
- while (1)
- {
- syslog(LOG_INFO, "--> %d\n", idx++);
- syslog(LOG_DEBUG, "%s\n", logcnt);
- }
- }
如上代码,主服务一直循环输入log,起一个定时器,每50ms 发送一个时钟信号,并且在时钟信号处理函数中打印一条日志。以上程序必须syslog死锁的问题。gdb 打印堆栈如下:
#0 __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1 0x00007f73854ec344 in _L_lock_533 () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f73854ebda2 in __GI___vsyslog_chk (pri=, flag=flag@entry=-1, fmt=0x400802 "%s\n", ap=ap@entry=0x7ffcb7e84c38) at ../misc/syslog.c:258
#3 0x00007f73854ec15f in __syslog (pri=, fmt=) at ../misc/syslog.c:117
#4 0x000000000040060f in sigroutine (signo=14) at syslog_lock.c:40
#5
#6 0x00007f73854f30bb in __libc_send (fd=3, buf=0x1a95270, n=33, flags=-2058407747, flags@entry=16384) at ../sysdeps/unix/sysv/linux/x86_64/send.c:27
#7 0x00007f73854ebf31 in __GI___vsyslog_chk (pri=, flag=flag@entry=-1, fmt=0x4008b3 "--> %d\n", ap=ap@entry=0x7ffcb7e853e8) at ../misc/syslog.c:279
#8 0x00007f73854ec15f in __syslog (pri=, fmt=) at ../misc/syslog.c:117
#9 0x00000000004006a2 in main (argc=1, argv=0x7ffcb7e855c8) at syslog_lock.c:65