This article covers disabling AVX-optimized functions in glibc (LD_HWCAP_MASK, /etc/ld.so.nohwcap) for valgrind & gdb record.

Problem description
Modern x86_64 Linux with glibc detects that the CPU supports the AVX extension and switches many string functions from the generic implementation to AVX-optimized versions (with the help of ifunc dispatchers).

This feature is good for performance, but it prevents several tools, such as valgrind (older libVEX versions, before valgrind-3.8) and gdb's "target record" (Reverse Execution), from working correctly (Ubuntu "Z" 17.04 beta, gdb 7.12.50.20170207-0ubuntu2, gcc 6.3.0-8ubuntu1 20170221, Ubuntu GLIBC 2.24-7ubuntu2):

    $ cat a.c
    #include <string.h>
    #define N 1000
    int main(){
        char src[N], dst[N];
        memcpy(dst, src, N);
        return 0;
    }
    $ gcc a.c -o a -fno-builtin
    $ gdb -q ./a
    Reading symbols from ./a...(no debugging symbols found)...done.
    (gdb) start
    Temporary breakpoint 1 at 0x724
    Starting program: /home/user/src/a
    Temporary breakpoint 1, 0x0000555555554724 in main ()
    (gdb) record
    (gdb) c
    Continuing.
    Process record does not support instruction 0xc5 at address 0x7ffff7b60d31.
    Process record: failed to record execution log.
    Program stopped.
    __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:416
    416 VMOVU (%rsi), %VEC(4)
    (gdb) x/i $pc
    => 0x7ffff7b60d31 <__memmove_avx_unaligned_erms+529>: vmovdqu (%rsi),%ymm4

The error message "Process record does not support instruction 0xc5" comes from gdb's implementation of "target record", because AVX instructions are not supported by the record/replay engine (sometimes the problem is detected in the _dl_runtime_resolve_avx
function): https://sourceware.org/ml/gdb/2016-08/msg00028.html "some AVX instructions are not supported by process record", https://bugs.launchpad.net/ubuntu/+source/gdb/+bug/1573786, https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=836802, https://bugzilla.redhat.com/show_bug.cgi?id=1136403

The solution proposed in https://sourceware.org/ml/gdb/2016-08/msg00028.html is "You can recompile libc (thus ld.so), or hack __init_cpu_features and thus __cpu_features at runtime (see e.g. strcmp)." or to set LD_BIND_NOW=1, but a recompiled glibc still has AVX, and ld bind-now doesn't help.

I heard that there are /etc/ld.so.nohwcap and LD_HWCAP_MASK configurations in glibc. Can they be used to disable ifunc dispatching to AVX-optimized string functions in glibc?

How does glibc (rtld?) detect AVX: using cpuid, with /proc/cpuinfo (probably not), or with the HWCAP aux vector (the command LD_SHOW_AUXV=1 /bin/echo | grep HWCAP gives AT_HWCAP: bfebfbff)?

Solution

There does not seem to be a straightforward runtime method to patch feature detection. This detection happens rather early in the dynamic linker (ld.so).

Binary patching the linker seems the easiest method at the moment. @osgx described one method where a jump is overwritten. Another approach is just to fake the cpuid result. Normally cpuid(eax=0) returns the highest supported function in eax, while the manufacturer ID is returned in registers ebx, ecx and edx. We have this snippet in glibc 2.25 sysdeps/x86/cpu-features.c:

    __cpuid (0, cpu_features->max_cpuid, ebx, ecx, edx);
    /* This spells out "GenuineIntel".
       */
    if (ebx == 0x756e6547 && ecx == 0x6c65746e && edx == 0x49656e69)
      {
        /* feature detection for various Intel CPUs */
      }
    /* another case for AMD */
    else
      {
        kind = arch_kind_other;
        get_common_indeces (cpu_features, NULL, NULL, NULL, NULL);
      }

The __cpuid line translates to these instructions in /lib/ld-linux-x86-64.so.2 (/lib/ld-2.25.so):

    172a8:  31 c0                   xor    eax,eax
    172aa:  c7 44 24 38 00 00 00    mov    DWORD PTR [rsp+0x38],0x0
    172b1:  00
    172b2:  c7 44 24 3c 00 00 00    mov    DWORD PTR [rsp+0x3c],0x0
    172b9:  00
    172ba:  0f a2                   cpuid

So rather than patching branches, we could just as well change the cpuid into a nop instruction, which would result in invocation of the last else branch (as the registers will not contain "GenuineIntel"). Since initially eax=0, cpu_features->max_cpuid will also be 0 and the if (cpu_features->max_cpuid >= 7) check will also be bypassed.

Binary patching cpuid(eax=0) into a nop can be done with this utility (works for both x86 and x86-64):

    #!/usr/bin/env python
    import re
    import sys

    infile, outfile = sys.argv[1:]
    with open(infile, 'rb') as f:
        d = f.read()
    # Match CPUID(eax=0): "xor eax,eax" followed closely by "cpuid",
    # and replace the cpuid (0f a2) with a two-byte nop (66 90).
    # re.DOTALL lets '.' also match a 0x0a byte in the binary data.
    o = re.sub(b'(\x31\xc0.{0,32})\x0f\xa2', b'\\1\x66\x90', d, flags=re.DOTALL)
    # Fail loudly if nothing was patched.
    assert d != o
    with open(outfile, 'wb') as f:
        f.write(o)

That was the easy part. Now, I did not want to replace the system-wide dynamic linker, but to execute only one particular program with this linker.
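Before running anything under the patched linker, the two facts the patch relies on can be checked offline. The sketch below is illustrative only: it decodes the three constants from the glibc snippet to confirm that they spell "GenuineIntel", then applies the utility's pattern (with re.DOTALL, so that '.' matches every byte value) to the instruction bytes from the disassembly listing. The names vendor_string and patch_cpuid are hypothetical helpers, not part of glibc or of the original utility.

```python
import re
import struct


def vendor_string(ebx, ecx, edx):
    """Rebuild the 12-byte vendor ID that cpuid(eax=0) returns.

    The string is stored in the order ebx, edx, ecx, each register
    holding four ASCII bytes in little-endian order.
    """
    return struct.pack('<III', ebx, edx, ecx).decode('ascii')


def patch_cpuid(code):
    """Replace cpuid (0f a2) with a two-byte nop (66 90) when it closely
    follows "xor eax,eax" (31 c0), using the patching utility's pattern."""
    return re.sub(b'(\x31\xc0.{0,32})\x0f\xa2', b'\\1\x66\x90', code,
                  flags=re.DOTALL)


# Constants compared against in the glibc 2.25 snippet.
print(vendor_string(0x756e6547, 0x6c65746e, 0x49656e69))

# Instruction bytes copied from the disassembly listing (172a8 onwards).
original = bytes.fromhex(
    '31c0'                # xor eax,eax
    'c744243800000000'    # mov DWORD PTR [rsp+0x38],0x0
    'c744243c00000000'    # mov DWORD PTR [rsp+0x3c],0x0
    '0fa2'                # cpuid
)
patched = patch_cpuid(original)
print(original.hex())
print(patched.hex())
```

Only the last two bytes should differ between the two hex dumps; a cpuid that is not preceded by "xor eax,eax" is left untouched.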
Sure, running one program under the patched linker can be done with ./ld-linux-x86-64-patched.so.2 ./a, but the naive gdb invocations failed to set breakpoints:

    $ gdb -q -ex "set exec-wrapper ./ld-linux-x86-64-patched.so.2" -ex start ./a
    Reading symbols from ./a...done.
    Temporary breakpoint 1 at 0x400502: file a.c, line 5.
    Starting program: /tmp/a
    During startup program exited normally.
    (gdb) quit
    $ gdb -q -ex start --args ./ld-linux-x86-64-patched.so.2 ./a
    Reading symbols from ./ld-linux-x86-64-patched.so.2...(no debugging symbols found)...done.
    Function "main" not defined.
    Temporary breakpoint 1 (main) pending.
    Starting program: /tmp/ld-linux-x86-64-patched.so.2 ./a
    [Inferior 1 (process 27418) exited normally]
    (gdb) quit

A manual workaround is described in "How to debug program with custom elf interpreter?". It works, but unfortunately it is a manual action using add-symbol-file. It should be possible to automate it a bit using GDB Catchpoints though.

An alternative approach that does not involve binary patching is LD_PRELOADing a library that defines custom routines for memcpy, memmove, etc. These will then take precedence over the glibc routines. The full list of functions is available in sysdeps/x86_64/multiarch/ifunc-impl-list.c. Current HEAD has more symbols compared to the glibc 2.25 release, in total (grep -Po 'IFUNC_IMPL \(i, name, \K[^,]+' sysdeps/x86_64/multiarch/ifunc-impl-list.c):

memchr, memcmp, __memmove_chk, memmove, memrchr, __memset_chk, memset, rawmemchr, strlen, strnlen, stpncpy, stpcpy, strcasecmp, strcasecmp_l, strcat, strchr, strchrnul, strrchr, strcmp, strcpy, strcspn, strncasecmp, strncasecmp_l, strncat, strncpy, strpbrk, strspn, strstr, wcschr, wcsrchr, wcscpy, wcslen, wcsnlen, wmemchr, wmemcmp, wmemset, __memcpy_chk, memcpy, __mempcpy_chk, mempcpy, strncmp, __wmemset_chk
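To decide which routines an LD_PRELOAD library would have to override, the grep command can also be expressed in Python, which is convenient when the list needs further processing. This is a minimal sketch under stated assumptions: ifunc_symbols is a hypothetical helper, and SAMPLE is a hand-written excerpt in the shape the grep pattern implies, not text copied from glibc.

```python
import re

# Python equivalent of: grep -Po 'IFUNC_IMPL \(i, name, \K[^,]+'
# (\K in the grep pattern is expressed here with a capture group).
IFUNC_RE = re.compile(r'IFUNC_IMPL \(i, name, ([^,]+)')


def ifunc_symbols(source_text):
    """Return the symbol names declared via IFUNC_IMPL, in file order,
    without duplicates."""
    seen = []
    for name in IFUNC_RE.findall(source_text):
        if name not in seen:
            seen.append(name)
    return seen


# Hand-written sample (assumption), NOT copied from glibc.
SAMPLE = """
  IFUNC_IMPL (i, name, memchr,
              IFUNC_IMPL_ADD (array, i, memchr, 1, __memchr_sse2))
  IFUNC_IMPL (i, name, memmove,
              IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_sse2_unaligned))
"""

print(', '.join(ifunc_symbols(SAMPLE)))
```

Against a glibc checkout, the same function could be given the contents of sysdeps/x86_64/multiarch/ifunc-impl-list.c instead of SAMPLE. The nested IFUNC_IMPL_ADD lines are not picked up, because the pattern requires a space right after IFUNC_IMPL.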
07-29 14:35