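I am trying to install Torque 6.0.2 on Debian 8.5 running on an Intel Xeon E5-2620v4. However, when I try to start pbs_server, it crashes with a segmentation fault. This is the backtrace from gdb: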

#1  0x0000000000440ab6 in container::item_container<pbsnode*>::unlock (this=0xb5d900 <allnodes>) at ../../src/include/container.hpp:537
#2  0x00000000004b787f in mom_hierarchy_handler::nextNode (this=0x4e610c0 <hierarchy_handler>, iter=0x7fffffff98b8) at mom_hierarchy_handler.cpp:122
#3  0x00000000004b7a7d in mom_hierarchy_handler::make_default_hierarchy (this=0x4e610c0 <hierarchy_handler>) at mom_hierarchy_handler.cpp:149
#4  0x00000000004b898d in mom_hierarchy_handler::loadHierarchy (this=0x4e610c0 <hierarchy_handler>) at mom_hierarchy_handler.cpp:433
#5  0x00000000004b8ae8 in mom_hierarchy_handler::initialLoadHierarchy (this=0x4e610c0 <hierarchy_handler>) at mom_hierarchy_handler.cpp:472
#6  0x000000000045262a in pbsd_init (type=1) at pbsd_init.c:2299
#7  0x00000000004591ff in main (argc=2, argv=0x7fffffffdec8) at pbsd_main.c:1883


dmesg:

traps: pbs_server[22249] general protection ip:7f9c08a7a2c8 sp:7ffe520b5238 error:0 in libpthread-2.19.so[7f9c08a69000+18000]


valgrind:

==22381== Memcheck, a memory error detector
==22381== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==22381== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
==22381== Command: pbs_server
==22381==
==22381==
==22381== HEAP SUMMARY:
==22381==     in use at exit: 18,051 bytes in 53 blocks
==22381==   total heap usage: 169 allocs, 116 frees, 42,410 bytes allocated
==22381==
==22382==
==22382== HEAP SUMMARY:
==22382==     in use at exit: 19,755 bytes in 56 blocks
==22382==   total heap usage: 172 allocs, 116 frees, 44,114 bytes allocated
==22382==
==22381== LEAK SUMMARY:
==22381==    definitely lost: 0 bytes in 0 blocks
==22381==    indirectly lost: 0 bytes in 0 blocks
==22381==      possibly lost: 0 bytes in 0 blocks
==22381==    still reachable: 18,051 bytes in 53 blocks
==22381==         suppressed: 0 bytes in 0 blocks
==22381== Rerun with --leak-check=full to see details of leaked memory
==22381==
==22381== For counts of detected and suppressed errors, rerun with: -v
==22381== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==22383==
==22383== Process terminating with default action of signal 11 (SIGSEGV)
==22383==  General Protection Fault
==22383==    at 0x72192CB: __lll_unlock_elision (elision-unlock.c:33)
==22383==    by 0x4E7E1A: unlock_node(pbsnode*, char const*, char const*, int) (u_lock_ctl.c:268)
==22383==    by 0x4B7A66: mom_hierarchy_handler::make_default_hierarchy() (mom_hierarchy_handler.cpp:164)
==22383==    by 0x4B898C: mom_hierarchy_handler::loadHierarchy() (mom_hierarchy_handler.cpp:433)
==22383==    by 0x4B8AE7: mom_hierarchy_handler::initialLoadHierarchy() (mom_hierarchy_handler.cpp:472)
==22383==    by 0x452629: pbsd_init(int) (pbsd_init.c:2299)
==22383==    by 0x4591FE: main (pbsd_main.c:1883)
==22382== LEAK SUMMARY:
==22382==    definitely lost: 0 bytes in 0 blocks
==22382==    indirectly lost: 0 bytes in 0 blocks
==22382==      possibly lost: 0 bytes in 0 blocks
==22382==    still reachable: 19,755 bytes in 56 blocks
==22382==         suppressed: 0 bytes in 0 blocks
==22382== Rerun with --leak-check=full to see details of leaked memory
==22382==
==22382== For counts of detected and suppressed errors, rerun with: -v
==22382== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==22383==
==22383== HEAP SUMMARY:
==22383==     in use at exit: 325,348 bytes in 186 blocks
==22383==   total heap usage: 297 allocs, 111 frees, 442,971 bytes allocated
==22383==
==22383== LEAK SUMMARY:
==22383==    definitely lost: 134 bytes in 6 blocks
==22383==    indirectly lost: 28 bytes in 3 blocks
==22383==      possibly lost: 524 bytes in 17 blocks
==22383==    still reachable: 324,662 bytes in 160 blocks
==22383==         suppressed: 0 bytes in 0 blocks
==22383== Rerun with --leak-check=full to see details of leaked memory
==22383==
==22383== For counts of detected and suppressed errors, rerun with: -v
==22383== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)


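No other software shows this behavior. I ran the machine at full load for 2 days of testing without any problems, and I have already tried updating the processor microcode. Has anyone seen this behavior with Torque 6.0.2 or in any other situation?

Best regards.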

Best answer

This is not a microcode bug. It is a plain lock-balancing bug in whatever software you are running (and not in glibc/libpthread).

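Do not try to unlock a lock that is already unlocked. That is forbidden behavior, and it is what causes the trap. Your valgrind trace points exactly there: the fault is raised in __lll_unlock_elision, called from unlock_node() (u_lock_ctl.c:268) during mom_hierarchy_handler::make_default_hierarchy().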
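With the default mutex type, unlocking a mutex that is not locked is undefined behavior, so nothing guarantees a clean error. When hunting for this kind of bug it can help to switch the suspect mutex to the POSIX error-checking type, which makes pthread_mutex_unlock() return EPERM instead. The sketch below is a minimal demonstration, not Torque code; the helper names init_errorcheck_mutex and double_unlock_status are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L /* expose PTHREAD_MUTEX_ERRORCHECK */
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: initialise *m as an error-checking mutex.
 * With this type, unlocking a mutex the calling thread does not hold
 * returns EPERM instead of being undefined behaviour.
 * Returns 0 on success or a pthread error code. */
int init_errorcheck_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Hypothetical demo: lock once, unlock twice, and return the status of
 * the second (unbalanced) unlock. Returns -1 if setup itself fails. */
int double_unlock_status(void)
{
    pthread_mutex_t m;
    if (init_errorcheck_mutex(&m) != 0)
        return -1;
    if (pthread_mutex_lock(&m) != 0 || pthread_mutex_unlock(&m) != 0) {
        pthread_mutex_destroy(&m);
        return -1;
    }
    int rc = pthread_mutex_unlock(&m); /* mutex is already unlocked */
    pthread_mutex_destroy(&m);
    return rc;
}
```

Calling double_unlock_status() should yield EPERM, reported as an ordinary return code rather than a crash. Error-checking mutexes are slower, so treat this as a debugging aid rather than a fix; the real fix is to remove the extra unlock.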

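For performance reasons, glibc does not bother to test for this and segfault, so a lot of broken code has gotten away with it for a long time. Hardware implementations of lock elision (Intel TSX, IBM Power 8, S390/X...), OTOH, do cause a trap, so this breakage will soon show up everywhere.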
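One common way to keep locks balanced when more than one code path may release the same lock is to track ownership explicitly, so that each path unlocks at most once. The following is a minimal sketch of that pattern with hypothetical names (struct locked_obj, struct lock_guard, guard_lock, guard_unlock, process); it is not Torque's actual code or its actual fix for unlock_node().

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a locked object such as a node record. */
struct locked_obj {
    pthread_mutex_t mutex;
};

/* Records whether *this* code path currently holds the lock. */
struct lock_guard {
    struct locked_obj *obj;
    bool held;
};

int guard_lock(struct lock_guard *g, struct locked_obj *o)
{
    g->obj = o;
    g->held = false;
    int rc = pthread_mutex_lock(&o->mutex);
    if (rc == 0)
        g->held = true;
    return rc;
}

/* Releases the lock only if it is still held; a repeated call is a
 * no-op instead of an unbalanced pthread_mutex_unlock(). */
int guard_unlock(struct lock_guard *g)
{
    if (!g->held)
        return 0;
    g->held = false;
    return pthread_mutex_unlock(&g->obj->mutex);
}

/* Hypothetical caller: an early branch releases the lock, and the common
 * cleanup path calls guard_unlock() again without double-unlocking. */
int process(struct locked_obj *o, bool early_release)
{
    struct lock_guard g;
    int rc = guard_lock(&g, o);
    if (rc != 0)
        return rc;
    if (early_release) {
        rc = guard_unlock(&g); /* released here ... */
        if (rc != 0)
            return rc;
    }
    /* ... further work on o would go here ... */
    return guard_unlock(&g); /* ... and not released a second time here */
}
```

If process() is run against an error-checking mutex, any unbalanced unlock would surface as a nonzero (EPERM) return, which makes the pattern easy to verify in a test build.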

For a similar question on c++ - pbs_server, E5-2620v4 and general protection, see Stack Overflow: https://stackoverflow.com/questions/38905951/

10-13 03:16