Question
Yes, this question has been asked before, but reading the answers didn't enlighten me much.
I wrote a C program that crashes after a few days of use. An important point is that it does NOT generate a core file, even though everything is set up so that it should (core_pattern, ulimit -c unlimited, etc.). I can trigger a core dump fine with kill -SIGQUIT.
The program extensively logs what it does, but there's no hint about the crash in the log. The only message displayed at the crash (or before?) is:
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
after 2322 requests (2322 known processed) with 0 events remaining.
So two questions:
- How is it possible for a program to crash (return $?=1) without a core dump?
- What is this error message about, and what can I do about it?
The system is Red Hat Enterprise Linux 6.4.
Edit:
I managed to force a core dump by calling abort() from inside an atexit() callback:
(gdb) bt
#0 0x00bc8424 in __kernel_vsyscall ()
#1 0x0085a861 in raise () from /lib/libc.so.6
#2 0x0085c13a in abort () from /lib/libc.so.6
#3 0x0808f5cf in Unexpected () at MyCode.c:1378
#4 0x0085de9f in exit () from /lib/libc.so.6
#5 0x00c85701 in _XDefaultIOError () from /usr/lib/libX11.so.6
#6 0x00c85797 in _XIOError () from /usr/lib/libX11.so.6
#7 0x00c84055 in _XReply () from /usr/lib/libX11.so.6
#8 0x00c68b8f in XGetImage () from /usr/lib/libX11.so.6
#9 0x004fd6a7 in ?? () from /usr/local/lib/libcvi.so
#10 0x00478ad5 in ?? () from /usr/local/lib/libcvi.so
...
#29 0x001eed9d in ?? () from /usr/local/lib/libcvi.so
#30 0x001eee41 in RunUserInterface () from /usr/local/lib/libcvi.so
#31 0x0808fab4 in main (argc=2, argv=0xbfbdc984) at MyCode.c:1540
Can anyone enlighten me about this X11 problem? libcvi.so is not mine; only MyCode.c is (LabWindows/CVI).
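The atexit() trick used to obtain that backtrace can be sketched roughly as follows. This is a minimal illustration, not the original MyCode.c: the names clean_exit, mark_clean_exit, unexpected_exit and install_exit_trap are hypothetical, and the flag is an added assumption so that deliberate exits do not abort.

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

/* Set to 1 just before an intentional exit; any other exit counts as unexpected. */
static volatile sig_atomic_t clean_exit = 0;

/* Hypothetical helper: call right before a deliberate exit() or return from main. */
void mark_clean_exit(void)
{
    clean_exit = 1;
}

/* atexit() callback: runs for every exit(), including one made inside a library. */
static void unexpected_exit(void)
{
    if (!clean_exit) {
        fputs("unexpected exit(), aborting to get a core dump\n", stderr);
        abort(); /* raises SIGABRT, which dumps core if ulimit -c allows it */
    }
}

/* Register the callback; returns 0 on success, like atexit() itself. */
int install_exit_trap(void)
{
    return atexit(unexpected_exit);
}
```

Calling install_exit_trap() early in main() and mark_clean_exit() before every intended exit means that only an exit() issued by someone else (here, Xlib) reaches abort(), so the core file contains the stack of the code that called exit().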
Edit 2014-12-05:
Here's an even more precise backtrace. Things definitely happen in X11, but I'm no X11 programmer, so looking at the X source code at the lines given tells me only that the X server (?) is temporarily unavailable. Is there any way to simply tell it to ignore this error if it's only temporary?
#4 0x00965eaf in __run_exit_handlers (status=1) at exit.c:78
#5 exit (status=1) at exit.c:100
#6 0x00c356b1 in _XDefaultIOError (dpy=0x88aeb80) at XlibInt.c:1292
#7 0x00c35747 in _XIOError (dpy=0x88aeb80) at XlibInt.c:1498
#8 0x00c340a6 in _XReply (dpy=0x88aeb80, rep=0xbf82fa90, extra=0, discard=0) at xcb_io.c:708
#9 0x00c18c0f in XGetImage (dpy=0x88aeb80, d=27263845, x=0, y=0, width=60, height=20, plane_mask=4294967295, format=2) at GetImage.c:75
#10 0x005f46a7 in ?? () from /usr/local/lib/libcvi.so
Corresponding lines:
XlibInt.c: _XDefaultIOError()
1292: exit(1);
XlibInt.c: _XIOError
1498: _XDefaultIOError(dpy);
xcb_io.c: _XReply()
708: if(!reply) _XIOError(dpy);
GetImage.c: XGetImage()
74: if (_XReply (dpy, (xReply *) &rep, 0, xFalse) == 0 || ...
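The listing above also explains the missing core file: _XDefaultIOError() calls exit(1), which is an ordinary process exit rather than a fatal signal, so the kernel has nothing to dump. A small sketch of that distinction, as seen by a parent process, is given below. The helper names (how_did_it_end, like_xlib, like_crash) are hypothetical and exist only for this illustration.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run fn() in a child process and report how it ended.
 * Returns the exit status for a normal exit, the negated signal number if a
 * signal killed the child, or -1000 if fork() or waitpid() failed. */
int how_did_it_end(void (*fn)(void))
{
    int status;
    pid_t pid = fork();
    if (pid < 0)
        return -1000;
    if (pid == 0) {
        fn();
        _exit(0); /* only reached if fn() returns */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1000;
    if (WIFEXITED(status))
        return WEXITSTATUS(status); /* plain exit(): no core dump possible */
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);   /* killed by a signal: core dump possible */
    return -1000;
}

/* Ends the way _XDefaultIOError() does: a normal exit with status 1. */
void like_xlib(void) { exit(1); }

/* Ends the way a real crash (or abort()) does: killed by a signal. */
void like_crash(void) { abort(); }
```

Here how_did_it_end(like_xlib) reports a normal exit with status 1, matching the $?=1 seen in the question, while how_did_it_end(like_crash) reports termination by SIGABRT, which is the only one of the two cases where a core file can be written.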
Recommended answer
OK, I finally found the cause (thanks to someone at National Instruments), a better diagnostic and a workaround.
The bug is in many versions of libxcb. It is a 32-bit counter rollover problem that has been known for a few years: https://bugs.freedesktop.org/show_bug.cgi?id=71338
Not all versions of libxcb are affected: libxcb-1.9-5 has the bug, libxcb-1.5-1 doesn't. According to the bug report, 64-bit operating systems shouldn't be affected, but I managed to trigger it on at least one version.
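To make the idea of a 32-bit counter rollover concrete, here is a simplified illustration. It is not the libxcb code: the function names are hypothetical, and the assumption is that the counter involved counts requests sent to the server.

```c
#include <stdint.h>

/* Illustration only, not libxcb code: a 32-bit view of a 64-bit request count. */
uint32_t narrow_sequence(uint64_t requests_sent)
{
    return (uint32_t)requests_sent; /* keeps only the low 32 bits */
}

/* Hypothetical naive check: is sequence number a newer than sequence number b? */
int naive_is_newer(uint32_t a, uint32_t b)
{
    return a > b;
}

/* Seconds until a 32-bit counter wraps at a given request rate. */
double seconds_until_wrap(double requests_per_second)
{
    return 4294967296.0 / requests_per_second; /* 2^32 requests */
}
```

Request number 2^32 narrows to 0, so naive_is_newer() wrongly treats it as older than request 2^32 - 1. Code that compares sequence numbers this way can misjudge a reply once the counter wraps. As a rough order of magnitude, an assumed rate of 5 million requests per second wraps a 32-bit counter in about 859 seconds, a little under 15 minutes.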
Which brings me to a better diagnostic. The following program will crash in less than 15 minutes on affected libraries (better than the entire week it previously took):
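// Compile with: gcc test.c -lX11 && time ./a.out
#include <X11/Xlib.h>

int main(void) {
    Display *d = XOpenDisplay(NULL);  // NULL: use the DISPLAY environment variable
    if (!d)
        return 1;                     // could not connect to the X server
    for (;;)
        XNoOp(d);                     // flood the server with no-op requests until Xlib exits
}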
One final thing: the above program compiled and run on a 64-bit system works fine, and compiled and run on an old 32-bit system it also works fine, but if I transfer the 32-bit binary to the 64-bit system, it crashes after a few minutes.