Problem description
I've set breakpoints on exit and _exit, and my program (a multithreaded app running on Linux 2.6.16.46-0.12, SLES10) is somehow still exiting in a way I can't locate:
(gdb) c
...
[New Thread 47513671297344 (LWP 15279)]
[New Thread 47513667103040 (LWP 15280)]
[New Thread 47513662908736 (LWP 15281)]
Program exited with code 0177.
(gdb)
The exit functions reside in libc, so there are no deferred-load shared library issues. Does anybody know of some other mysterious trigger for exit that can't be caught?
EDIT: the problem is now academic only. I tried binary search debugging, backing out a subset of my changes (the problem went away). After I applied them again in sequence, I can no longer repro the problem, even with things restored to the original state.
EDIT2: I found one reason for this sort of error recently, which may have been the original source of this problem. For historical reasons our product uses the evil linker flag -Bsymbolic. Among the side effects of this is that when a symbol is undefined but called, the GLIBC runtime linker will bomb in exactly this way, and you see it in the debugger as a process that exited with 0177. When the runtime linker aborts this way, I'd guess it makes the exit syscall directly (rather than using the C runtime library exit() or _exit()). That would be consistent with the fact that I was unable to catch this with the exit breakpoints in the debugger.
There are two common reasons for an _exit breakpoint to "miss": either GDB didn't set the breakpoint in the right place, or the program performs (a moral equivalent of) syscall(SYS_exit, ...).
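As a rough illustration of the second case, here is a minimal sketch of a process that terminates through a raw system call and never enters libc's exit() or _exit(). The function names exit_via_raw_syscall and observed_exit_status are hypothetical helpers invented for this example; it assumes Linux with glibc and uses SYS_exit_group so that all threads are terminated.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Terminate the whole process with a raw system call. Control never
 * enters libc's exit() or _exit(), so breakpoints on those two symbols
 * do not fire. (Hypothetical helper, not taken from the question.) */
static void exit_via_raw_syscall(int code)
{
    syscall(SYS_exit_group, code);
}

/* Hypothetical helper: run the function above in a forked child and
 * return the exit status the parent observes, or -1 on failure. */
static int observed_exit_status(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        exit_via_raw_syscall(code);   /* does not return */

    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling observed_exit_status(0177) from a main function should yield 127, which is 0177 in octal, the same code GDB printed in the question. If exit_via_raw_syscall were called directly in the process being debugged, a breakpoint on exit or _exit would be expected to stay silent, while a syscall catchpoint on exit_group should still trigger.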
What do info break and disassemble _exit say?
You might be able to convince GDB to set the breakpoint correctly with break *&_exit. Alternatively, GDB-7.0 supports catch syscall. Something like this should work (assuming Linux/x86_64; note that on ix86 the numbers will be different) regardless of how the program exits:
(gdb) catch syscall 60
Catchpoint 3 (syscall 'exit' [60])
(gdb) catch syscall 231
Catchpoint 4 (syscall 'exit_group' [231])
(gdb) c
Catchpoint 4 (call to syscall 'exit_group'), 0x00007ffff7912f3d in _exit () from /lib/libc.so.6
Update:
Your comment indicates that the _exit breakpoint is set correctly, so it's likely that your process just doesn't execute _exit.
That leaves syscall(SYS_exit, ...) and one other possibility (which I missed before): all threads executing pthread_exit. You might want to set a breakpoint on pthread_exit as well (and execute info thread each time you hit it; the last thread to do pthread_exit will cause the process to terminate).
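To make the pthread_exit scenario concrete, the following sketch shows a process in which every thread, including the initial one, finishes with pthread_exit and no code calls exit() or _exit() explicitly. The names worker, finish_all_threads_with_pthread_exit and status_after_all_threads_exit are hypothetical and exist only for this illustration; older glibc versions may need -pthread when compiling.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Worker thread that ends with pthread_exit() instead of returning. */
static void *worker(void *arg)
{
    (void)arg;
    pthread_exit(NULL);
}

/* Every thread, including the calling one, ends in pthread_exit().
 * The process terminates once the last thread is gone. */
static void finish_all_threads_with_pthread_exit(void)
{
    pthread_t tid;
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        _exit(1);               /* could not start the worker */
    pthread_exit(NULL);         /* initial thread leaves too */
}

/* Hypothetical helper: run the scenario in a forked child and return
 * the exit status the parent observes, or -1 on failure. */
static int status_after_all_threads_exit(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        finish_all_threads_with_pthread_exit();   /* does not return */

    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

POSIX specifies an exit status of 0 once the last thread has terminated, so status_after_all_threads_exit() should return 0. That differs from the 0177 reported in the question, which suggests this path alone may not explain that particular exit, although a breakpoint on pthread_exit is still a cheap way to rule it out.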
Edit:
It is also worth noting that you can use mnemonic names rather than syscall numbers, and that you can add multiple syscalls to the catch list at once, like so (the numbers 1 and 252 in this output are the 32-bit x86 values):
(gdb) catch syscall exit exit_group
Catchpoint 2 (syscalls 'exit' [1] 'exit_group' [252])
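The two sessions above print different numbers for the same syscalls (60 and 231 in one, 1 and 252 in the other) because syscall numbering depends on the architecture. As a small sketch, the SYS_exit and SYS_exit_group macros from <sys/syscall.h> can be used to check which numbers apply to the machine a program is compiled for. The function name describe_exit_syscalls is a hypothetical helper for this example.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>

/* Write the exit-related syscall numbers for the architecture this
 * file is compiled for into buf, e.g. "exit=60 exit_group=231" on
 * Linux/x86_64. Returns the value of snprintf(). */
static int describe_exit_syscalls(char *buf, size_t len)
{
    return snprintf(buf, len, "exit=%d exit_group=%d",
                    (int)SYS_exit, (int)SYS_exit_group);
}
```

Using the mnemonic names with catch syscall avoids having to look these numbers up at all, which is probably the more convenient choice when debugging on several architectures.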