Debugging a segmentation fault (core dumped) on Linux

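  • During competitions I often ran into segmentation faults, but I usually fell back on the clumsy approach of printing with printf, which is slow at locating bugs. Today I tried debugging a segmentation fault with gdb instead.
  • A segmentation fault (core dumped) is commonly caused by a bad array index or an out-of-bounds array access, and it is easy to run into when programming on Linux.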
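
As a minimal sketch of the out-of-bounds case (not taken from the original program; the helper name try_get is hypothetical), the C++ below relies on std::vector::at, which checks the index and throws std::out_of_range. By contrast, operator[] performs no check, and an out-of-range index is undefined behaviour that may or may not show up as a segmentation fault.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: copies v[i] into out and returns true,
// or returns false (leaving out untouched) when i is out of range.
bool try_get(const std::vector<int>& v, std::size_t i, int& out) {
    try {
        out = v.at(i);  // at() checks bounds; operator[] does not
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

Calling try_get(v, v.size(), x) returns false instead of reading past the end of the vector, which turns a potential crash into an error the caller can handle.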

The GDB debugging workflow in detail

What is a Segmentation fault (core dumped)

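  • A segmentation fault generally means the program tried to access a memory address it is not allowed to access. Typical causes:
    • dereferencing a null pointer; the system does not allow access to the memory at address 0;
    • dereferencing a pointer that lies outside the memory the process is allowed to access;
    • in a C++ program, a class's vtable (the table of virtual function pointers) gets clobbered and points to the wrong place, so the program tries to execute an address that has no execute permission;
    • unaligned memory accesses can also cause a fault on some architectures.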
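
To make the effect concrete, here is a small sketch, assuming a POSIX system. The function name observe_segfault is hypothetical, and raise(SIGSEGV) stands in for a real invalid access so that the example stays well defined. A parent process forks a child, the child is killed by SIGSEGV, and the parent reads the terminating signal from the wait status, which is the same information a shell uses when it prints "Segmentation fault".

```cpp
#include <csignal>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a child that dies from SIGSEGV and returns the
// signal number the parent observes, or -1 if the child was not killed by a signal.
int observe_segfault() {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        struct rlimit no_core = {0, 0};
        setrlimit(RLIMIT_CORE, &no_core);  // keep this demo from writing a core file
        signal(SIGSEGV, SIG_DFL);          // make sure the default action applies
        raise(SIGSEGV);                    // stand-in for an invalid memory access
        _exit(0);                          // not reached
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

On Linux the returned value should equal SIGSEGV; a real crash caused by a bad pointer is reported to the parent in the same way.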

Debugging with valgrind, a simple tool

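  • valgrind runs a program under its Memcheck tool and reports memory errors together with stack traces. Install the command-line tool first with sudo apt-get install valgrind.
  • Then run valgrind -v <executable> on the misbehaving binary.
  • Below is the output for my program. Note that in this run it was started without its arguments, so it exits early and valgrind reports 0 errors: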
$ valgrind -v ./bin/CodeCraft-2019
==19578== Memcheck, a memory error detector
==19578== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==19578== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==19578== Command: ./bin/CodeCraft-2019
==19578==
--19578-- Valgrind options:
--19578--    -v
--19578-- Contents of /proc/version:
--19578--   Linux version 4.15.0-46-generic (buildd@lgw01-amd64-038) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #49-Ubuntu SMP Wed Feb 6 09:33:07 UTC 2019
--19578--
--19578-- Arch and hwcaps: AMD64, LittleEndian, amd64-cx16-rdtscp-sse3-avx
--19578-- Page sizes: currently 4096, max supported 4096
--19578-- Valgrind library directory: /usr/lib/valgrind
--19578-- Reading syms from /usrdata/applications/huawei2019/03-28-01-coredump/bin/CodeCraft-2019
--19578-- Reading syms from /lib/x86_64-linux-gnu/ld-2.27.so
--19578--   Considering /lib/x86_64-linux-gnu/ld-2.27.so ..
--19578--   .. CRC mismatch (computed 1b7c895e wanted 2943108a)
--19578--   Considering /usr/lib/debug/lib/x86_64-linux-gnu/ld-2.27.so ..
--19578--   .. CRC is valid
--19578-- Reading syms from /usr/lib/valgrind/memcheck-amd64-linux
--19578--   Considering /usr/lib/valgrind/memcheck-amd64-linux ..
--19578--   .. CRC mismatch (computed c25f395c wanted 0a9602a8)
--19578--    object doesn't have a symbol table
--19578--    object doesn't have a dynamic symbol table
--19578-- Scheduler: using generic scheduler lock implementation.
--19578-- Reading suppressions file: /usr/lib/valgrind/default.supp
==19578== embedded gdbserver: reading from /tmp/vgdb-pipe-from-vgdb-to-19578-by-jl-on-???
==19578== embedded gdbserver: writing to   /tmp/vgdb-pipe-to-vgdb-from-19578-by-jl-on-???
==19578== embedded gdbserver: shared mem   /tmp/vgdb-pipe-shared-mem-vgdb-19578-by-jl-on-???
==19578==
==19578== TO CONTROL THIS PROCESS USING vgdb (which you probably
==19578== don't want to do, unless you know exactly what you're doing,
==19578== or are doing some strange experiment):
==19578==   /usr/lib/valgrind/../../bin/vgdb --pid=19578 ...command...
==19578==
==19578== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==19578==   /path/to/gdb ./bin/CodeCraft-2019
==19578== and then give GDB the following command
==19578==   target remote | /usr/lib/valgrind/../../bin/vgdb --pid=19578
==19578== --pid is optional if only one valgrind process is running
==19578==
--19578-- REDIR: 0x401f2f0 (ld-linux-x86-64.so.2:strlen) redirected to 0x58060901 (???)
--19578-- REDIR: 0x401f0d0 (ld-linux-x86-64.so.2:index) redirected to 0x5806091b (???)
--19578-- Reading syms from /usr/lib/valgrind/vgpreload_core-amd64-linux.so
--19578--   Considering /usr/lib/valgrind/vgpreload_core-amd64-linux.so ..
--19578--   .. CRC mismatch (computed 4b63d83e wanted 670599e6)
--19578--    object doesn't have a symbol table
--19578-- Reading syms from /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so
--19578--   Considering /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so ..
--19578--   .. CRC mismatch (computed a4b37bee wanted 8ad4dc94)
--19578--    object doesn't have a symbol table
==19578== WARNING: new redirection conflicts with existing -- ignoring it
--19578--     old: 0x0401f2f0 (strlen              ) R-> (0000.0) 0x58060901 ???
--19578--     new: 0x0401f2f0 (strlen              ) R-> (2007.0) 0x04c32db0 strlen
--19578-- REDIR: 0x401d360 (ld-linux-x86-64.so.2:strcmp) redirected to 0x4c33ee0 (strcmp)
--19578-- REDIR: 0x401f830 (ld-linux-x86-64.so.2:mempcpy) redirected to 0x4c374f0 (mempcpy)
--19578-- Reading syms from /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25
--19578--    object doesn't have a symbol table
--19578-- Reading syms from /lib/x86_64-linux-gnu/libgcc_s.so.1
--19578--    object doesn't have a symbol table
--19578-- Reading syms from /lib/x86_64-linux-gnu/libc-2.27.so
--19578--   Considering /lib/x86_64-linux-gnu/libc-2.27.so ..
--19578--   .. CRC mismatch (computed b1c74187 wanted 042cc048)
--19578--   Considering /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.27.so ..
--19578--   .. CRC is valid
--19578-- Reading syms from /lib/x86_64-linux-gnu/libm-2.27.so
--19578--   Considering /lib/x86_64-linux-gnu/libm-2.27.so ..
--19578--   .. CRC mismatch (computed 7feae033 wanted b29b2508)
--19578--   Considering /usr/lib/debug/lib/x86_64-linux-gnu/libm-2.27.so ..
--19578--   .. CRC is valid
--19578-- REDIR: 0x547bc70 (libc.so.6:memmove) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x547ad40 (libc.so.6:strncpy) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x547bf50 (libc.so.6:strcasecmp) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x547a790 (libc.so.6:strcat) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x547ad70 (libc.so.6:rindex) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x547d7c0 (libc.so.6:rawmemchr) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x547bde0 (libc.so.6:mempcpy) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x547bc10 (libc.so.6:bcmp) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x547ad00 (libc.so.6:strncmp) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x547a800 (libc.so.6:strcmp) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x547bd40 (libc.so.6:memset) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x54990f0 (libc.so.6:wcschr) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x547aca0 (libc.so.6:strnlen) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x547a870 (libc.so.6:strcspn) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x547bfa0 (libc.so.6:strncasecmp) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x547a840 (libc.so.6:strcpy) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x547c0e0 (libc.so.6:memcpy@@GLIBC_2.14) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x547ada0 (libc.so.6:strpbrk) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x547a7c0 (libc.so.6:index) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x547ac70 (libc.so.6:strlen) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x54856c0 (libc.so.6:memrchr) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x547bff0 (libc.so.6:strcasecmp_l) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x547bbe0 (libc.so.6:memchr) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x5499eb0 (libc.so.6:wcslen) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x547b050 (libc.so.6:strspn) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x547bf20 (libc.so.6:stpncpy) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x547bef0 (libc.so.6:stpcpy) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x547d7f0 (libc.so.6:strchrnul) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x547c040 (libc.so.6:strncasecmp_l) redirected to 0x4a2a6e0 (_vgnU_ifunc_wrapper)
--19578-- REDIR: 0x548e330 (libc.so.6:__strrchr_sse2) redirected to 0x4c32790 (__strrchr_sse2)
--19578-- REDIR: 0x5474070 (libc.so.6:malloc) redirected to 0x4c2faa0 (malloc)
--19578-- REDIR: 0x548e620 (libc.so.6:__strlen_sse2) redirected to 0x4c32d30 (__strlen_sse2)
--19578-- REDIR: 0x556cfc0 (libc.so.6:__memcmp_sse4_1) redirected to 0x4c35d50 (__memcmp_sse4_1)
--19578-- REDIR: 0x5486e70 (libc.so.6:__strcmp_sse2_unaligned) redirected to 0x4c33da0 (strcmp)
Begin
--19578-- REDIR: 0x5498440 (libc.so.6:__mempcpy_sse2_unaligned) redirected to 0x4c37130 (mempcpy)
please input args: carPath, roadPath, crossPath, answerPath
--19578-- REDIR: 0x5498870 (libc.so.6:__memset_sse2_unaligned) redirected to 0x4c365d0 (memset)
--19578-- REDIR: 0x5474950 (libc.so.6:free) redirected to 0x4c30cd0 (free)
==19578==
==19578== HEAP SUMMARY:
==19578==     in use at exit: 0 bytes in 0 blocks
==19578==   total heap usage: 2 allocs, 2 frees, 73,728 bytes allocated
==19578==
==19578== All heap blocks were freed -- no leaks are possible
==19578==
==19578== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==19578== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

How to obtain a core dump file

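  • A core dump file is a snapshot of the program's memory at the time it crashed; with it you can debug the program and find where the bug is.
  • When a program hits a segmentation fault, the Linux kernel writes a core dump file to disk, depending on its configuration.
  • ulimit sets resource limits for the current shell and the processes it starts. A change made with ulimit is temporary: it applies only to that shell session and is gone after a reboot:
    • ulimit -c sets the maximum size of a core file, in blocks;
    • ulimit -a shows the current resource limits;
    • ulimit -c unlimited removes the size limit on core files.
  • Reasons a core file may not be produced:
    • there is not enough space to write it;
    • core file creation is disabled (for example, the core size limit is 0);
    • the process has no permission to write files in its current directory.
  • The command sudo sysctl -w kernel.core_pattern=/tmp/core-%e.%p.%h.%t sets the name pattern and location the kernel uses for core files: they go into /tmp and are named with the executable name (%e), PID (%p), hostname (%h), and timestamp (%t).
    • When a program segfaults, the Linux kernel then automatically saves a core file in /tmp.
  • cat /proc/PID/limits also shows the core file size limit of a process.
  • kernel.core_pattern determines where core dump files are placed. It is a kernel parameter that can be viewed and changed with sysctl:
    • sysctl -a lists all kernel parameters; sysctl kernel.core_pattern shows the value of kernel.core_pattern.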
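
The core size limit can also be adjusted from inside a program. The sketch below is only an illustration and not part of the original workflow; raise_core_limit and read_core_pattern are hypothetical helper names. It uses the POSIX getrlimit/setrlimit calls with RLIMIT_CORE, which is the limit that ulimit -c manipulates, and reads the Linux file /proc/sys/kernel/core_pattern, which backs the kernel.core_pattern sysctl.

```cpp
#include <fstream>
#include <string>
#include <sys/resource.h>

// Hypothetical helper: raises the soft core-size limit to the hard limit for
// this process (and children it starts afterwards). Returns true on success.
bool raise_core_limit() {
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0) return false;
    lim.rlim_cur = lim.rlim_max;  // RLIM_INFINITY when the hard limit is unlimited
    return setrlimit(RLIMIT_CORE, &lim) == 0;
}

// Hypothetical helper: returns the current kernel.core_pattern,
// or an empty string if the file cannot be read.
std::string read_core_pattern() {
    std::ifstream in("/proc/sys/kernel/core_pattern");
    std::string pattern;
    std::getline(in, pattern);
    return pattern;
}
```

An unprivileged process can raise its soft limit only up to the hard limit, so if the hard limit is 0 this call succeeds but core files stay disabled.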

Backtracing the generated core file with GDB

  • The command gdb -c my_core_file opens a core file named my_core_file.
  • Debugging my core dump gave the following:
sudo gdb -c /tmp/core-CodeCraft-2019.23637.jl.1554030516
GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
[New LWP 23637]
Core was generated by `./bin/CodeCraft-2019 ../1-map-training-1/car.txt ../1-map-training-1/road.txt .'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000556393da88ad in ?? ()
(gdb)
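  • As shown, the program received a SIGSEGV signal during execution. This signal means the process made an invalid memory reference, i.e. a segmentation fault occurred.
  • Next, run bt in gdb to find the line where the segmentation fault occurred and what actually caused it.
    • bt stands for backtrace; it lists the call stack.
    • Some commonly used gdb commands:
      • attach: debug a process that is already running, e.g. gdb <program> PID;
      • br: set a breakpoint, e.g. br filename:line_num or br namespace::classname::func_name;
      • n: step over; s: step into;
      • finish: run until the current function returns;
      • list: list the next 10 lines of source after the current position; list line_number lists the lines around line_number;
      • info locals: list the local variables of the current function;
      • p var: print the value of the variable var;
      • info breakpoints: list all breakpoints;
      • delete breakpoints: delete all breakpoints;
      • delete breakpoints id: delete the breakpoint numbered id;
      • disable/enable breakpoints id: disable/enable a breakpoint;
      • break ... if ...: set a conditional breakpoint.
  • After running bt on my program I saw many question marks. That is because gdb had not loaded the symbol information for my program and its libraries; the program also needs to be compiled with the -g option: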
(gdb) bt
#0  0x0000556393da88ad in ?? ()
#1  0x00000009b6f194c0 in ?? ()
#2  0x00005563b686d1b0 in ?? ()
#3  0x00005563b688abe0 in ?? ()
#4  0x00007ffe22b8c070 in ?? ()
#5  0x00005563b5f36460 in ?? ()
#6  0x0000000000002bf9 in ?? ()
#7  0x0000000000000004 in ?? ()
#8  0x00005563b718a580 in ?? ()
#9  0x0000000000000020 in ?? ()
#10 0x00007ffe22b8c510 in ?? ()
#11 0x00007ffe22b8bf50 in ?? ()
#12 0x00005563b6a2ffd0 in ?? ()
#13 0x00007ffe22b8bf50 in ?? ()
#14 0x0000000000000008 in ?? ()
#15 0x00005563b6a30004 in ?? ()
#16 0x00007ffe22b8c450 in ?? ()
#17 0x00007ffe22b8c590 in ?? ()
#18 0x0000556393dabd1e in ?? ()
#19 0x00007f1b3f2da1f0 in ?? ()
#20 0x0000556393dabcd2 in ?? ()
#21 0x00007ffe22b8c610 in ?? ()
#22 0x00007ffe22b8bf00 in ?? ()
#23 0x00007ffe22b8c550 in ?? ()
#24 0x00007ffe22f747d0 in ?? ()
#25 0x00007ffe22b8c220 in ?? ()
#26 0x00007ffe22b8c200 in ?? ()
#27 0x0000000000000032 in ?? ()
#28 0x00007ffe22b8c470 in ?? ()
#29 0x00007ffe22b8c530 in ?? ()
#30 0x00007ffe22b8c070 in ?? ()
#31 0x00007ffe22b8c4f0 in ?? ()
#32 0x00007ffe22b8c510 in ?? ()
#33 0x00000000000211e0 in ?? ()
#34 0x00007ffe22b8c5b0 in ?? ()
#35 0x0000000022b8c490 in ?? ()
#36 0x0000000000000198 in ?? ()
#37 0x00007ffe22b8c490 in ?? ()
#38 0x00007ffe22b8c630 in ?? ()
#39 0x00007ffe22b8befc in ?? ()
#40 0x00003d2400000005 in ?? ()
#41 0x0000000000000000 in ?? ()
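  • In gdb, run symbol-file <path to the shared library> to load symbols from that file, so that addresses can be resolved while debugging.
    • The ldd command lists the shared libraries a binary depends on.
    • Use set solib-search-path to tell gdb where to look for those libraries.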
#0  0x0000556393da88ad in ?? ()
#1  0x00000009b6f194c0 in ?? ()
#2  0x00005563b686d1b0 in ?? ()
#3  0x00005563b688abe0 in ?? ()
#4  0x00007ffe22b8c070 in ?? ()
#5  0x00005563b5f36460 in ?? ()
#6  0x0000000000002bf9 in ?? ()
#7  0x0000000000000004 in void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag) [clone .isra.44] ()
#8  0x00007ffe22b8c510 in ?? ()
#9  0x00007ffe22b8bf50 in ?? ()
#10 0x00005563b6a2ffd0 in ?? ()
#11 0x00007ffe22b8bf50 in ?? ()
#12 0x0000000000000008 in void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag) [clone .isra.44] ()
#13 0x0000556393dabd1e in ?? ()
#14 0x00007f1b3f2da1f0 in ?? ()
#15 0x0000556393dabcd2 in ?? ()
#16 0x00007ffe22b8c610 in ?? ()
#17 0x00007ffe22b8bf00 in ?? ()
#18 0x00007ffe22b8c550 in ?? ()
#19 0x00007ffe22f747d0 in ?? ()
#20 0x00007ffe22b8c220 in ?? ()
#21 0x00007ffe22b8c200 in ?? ()
#22 0x0000000000000032 in void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag) [clone .isra.44] ()
#23 0x00000000000211e0 in ?? ()
#24 0x00007ffe22b8c5b0 in ?? ()
#25 0x0000000022b8c490 in ?? ()
Backtrace stopped: Cannot access memory at address 0x195
  • The final gdb session looked like this:
[New LWP 5070]
Core was generated by `8, 6238, 6768, 6414, 5857, 6219, 6774, 5642, 5099, 6080)

(gdb) frame 0
#0  0x00007fa69aa8f17c in ___vsnprintf_chk (s=0x7ffcb1275ffa ", 5347"<error: Cannot access memory at address 0x7ffcb1276000>, maxlen=<optimized out>,
    flags=1, slen=<optimized out>, format=0x55e0aef3a657 ", %d", args=args@entry=0x7ffcb0e28c00) at vsnprintf_chk.c:66
66  in vsnprintf_chk.c
(gdb) frame 1
#1  0x00007fa69aa8f095 in ___snprintf_chk (s=<optimized out>, maxlen=<optimized out>, flags=<optimized out>, slen=<optimized out>,
    format=<optimized out>) at snprintf_chk.c:34
34  snprintf_chk.c: No such file or directory.
(gdb) frame 2
#2  0x000055e0aef2ee70 in writeResult(std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >&, std::unordered_map<int, int, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, int> > >&, char*, int) ()
(gdb) frame 3
#3  0x000055e0aef35e5f in scheduling(std::vector<Vehicle, std::allocator<Vehicle> >&, std::vector<Road, std::allocator<Road> >&, std::vector<Cross, std::allocator<Cross> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) ()
(gdb) frame 4
#4  0x35202c3536303620 in ?? ()
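  • The frames above show that the segmentation fault happened inside writeResult: it called snprintf with the format ", %d", and vsnprintf faulted when the write reached address 0x7ffcb1276000, which the process cannot access.
  • thread apply all bt full prints the full backtrace, including local variables, of every thread.
  • The most important commands in the GDB session were: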
0. gdb core-CodeCraft-2019.5070.jl.1554081713
1. set solib-absolute-prefix /
2. set solib-search-path /
3. file <executable>
4. core-file core-CodeCraft-2019.5070.jl.1554081713
5. frame 2
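
The frames point at an snprintf call that wrote past the end of its destination. The source of writeResult is not shown here, so the following is only a sketch of one defensive pattern, with a hypothetical helper append_int: pass snprintf the space that actually remains in the buffer and check its return value before advancing.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: appends ", <value>" to buf, which currently holds
// `used` characters out of `cap` bytes. Returns the new length, or `used`
// unchanged if the text (plus terminator) would not fit.
std::size_t append_int(char* buf, std::size_t cap, std::size_t used, int value) {
    if (used >= cap) return used;  // no room even for the terminator
    int n = std::snprintf(buf + used, cap - used, ", %d", value);
    if (n < 0 || static_cast<std::size_t>(n) >= cap - used) {
        buf[used] = '\0';          // undo the truncated write
        return used;
    }
    return used + static_cast<std::size_t>(n);
}
```

Because the size argument shrinks as the buffer fills, a long list of integers stops being appended once the buffer is full instead of running off its end; building the line in a std::string would avoid the fixed-size buffer altogether.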
04-01 18:45