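I have a C++ program that calls many trig functions. It has been running fine for more than a year. I recently installed gcc-4.8 and, at the same time, updated glibc. As a result my program became almost 1000 times slower. Using gdb I found that the cause of the slowdown is a call to std::tan(). When the argument is pi or pi/2, the function takes a very long time to return.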

Here is an MWE that reproduces the problem when compiled without optimization (the real program shows the same problem both with and without the -O2 flag).

#include <cmath>

int main() {
    double pi = 3.141592653589793;
    double approxPi = 3.14159;
    double ret = 0.;

    for(int i = 0; i < 100000; ++i) ret = std::tan(pi); //Very slow
    for(int i = 0; i < 100000; ++i) ret = std::tan(approxPi); //Not slow
}

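Here is a sample backtrace from gdb (obtained after interrupting the program at a random point with Ctrl+C). From the call to tan onward, the backtrace is the same in the MWE and in my real program.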
#0  0x00007ffff7b1d048 in __mul (p=32, z=0x7fffffffc740, y=0x7fffffffcb30, x=0x7fffffffc890) at ../sysdeps/ieee754/dbl-64/mpa.c:458
#1  __mul (x=0x7fffffffc890, y=0x7fffffffcb30, z=0x7fffffffc740, p=32) at ../sysdeps/ieee754/dbl-64/mpa.c:443
#2  0x00007ffff7b1e348 in cc32 (p=32, y=0x7fffffffc4a0, x=0x7fffffffbf60) at ../sysdeps/ieee754/dbl-64/sincos32.c:111
#3  __c32 (x=<optimized out>, y=0x7fffffffcf50, z=0x7fffffffd0a0, p=32) at ../sysdeps/ieee754/dbl-64/sincos32.c:128
#4  0x00007ffff7b1e170 in __mptan (x=<optimized out>, mpy=0x7fffffffd690, p=32) at ../sysdeps/ieee754/dbl-64/mptan.c:57
#5  0x00007ffff7b45b46 in tanMp (x=<optimized out>) at ../sysdeps/ieee754/dbl-64/s_tan.c:503
#6  __tan_avx (x=<optimized out>) at ../sysdeps/ieee754/dbl-64/s_tan.c:488
#7  0x00000000004005b8 in main ()

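I tried running the code (both the MWE and the real program) on four different systems. Two of them are clusters where I run my code. Two are my laptops. The MWE runs without problems on one of the clusters and on one of the laptops. I checked which version of libm.so.6 each system uses, in case that is relevant. The following list shows the system description (taken from cat /etc/*-release), whether the CPU is 32-bit or 64-bit, whether the MWE is slow, and finally the output of running /lib/libc.so.6 and of cat /proc/cpuinfo.
  • SUSE Linux Enterprise Server 11 (x86_64), 64-bit, using libm-2.11.1.so (MWE is fast)

    GNU C Library stable release version 2.11.1 (20100118), by Roland McGrath et al.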
    Copyright (C) 2009 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.
    There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
    PARTICULAR PURPOSE.
    Configured for x86_64-suse-linux.
    Compiled by GNU CC version 4.3.4 [gcc-4_3-branch revision 152973].
    Compiled on a Linux 2.6.32 system on 2012-04-12.
    Available extensions:
            crypt add-on version 2.1 by Michael Glad and others
            GNU Libidn by Simon Josefsson
            Native POSIX Threads Library by Ulrich Drepper et al
            BIND-8.2.3-T5B
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/libc/bugs.html>.
    
    processor       : 0
    vendor_id       : GenuineIntel
    cpu family      : 6
    model           : 63
    model name      : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
    stepping        : 2
    microcode       : 53
    cpu MHz         : 1200.000
    cache size      : 30720 KB
    physical id     : 0
    siblings        : 24
    core id         : 0
    cpu cores       : 12
    apicid          : 0
    initial apicid  : 0
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 15
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid fsgsbase bmi1 avx2 smep bmi2 erms invpcid
    bogomips        : 5000.05
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 46 bits physical, 48 bits virtual
    power management:
    
  • CentOS release 6.7 (Final), 64-bit, using libm-2.12.so (MWE is slow)

    GNU C Library stable release version 2.12, by Roland McGrath et al.
    Copyright (C) 2010 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.
    There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
    PARTICULAR PURPOSE.
    Compiled by GNU CC version 4.4.7 20120313 (Red Hat 4.4.7-16).
    Compiled on a Linux 2.6.32 system on 2015-09-22.
    Available extensions:
            The C stubs add-on version 2.1.2.
            crypt add-on version 2.1 by Michael Glad and others
            GNU Libidn by Simon Josefsson
            Native POSIX Threads Library by Ulrich Drepper et al
            BIND-8.2.3-T5B
            RT using linux kernel aio
    libc ABIs: UNIQUE IFUNC
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/libc/bugs.html>.
    
    processor       : 0
    vendor_id       : GenuineIntel
    cpu family      : 6
    model           : 26
    model name      : Intel(R) Xeon(R) CPU           E5507  @ 2.27GHz
    stepping        : 5
    cpu MHz         : 1596.000
    cache size      : 4096 KB
    physical id     : 0
    siblings        : 4
    core id         : 0
    cpu cores       : 4
    apicid          : 0
    initial apicid  : 0
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 11
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm tpr_shadow vnmi flexpriority ept vpid
    bogomips        : 4533.16
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 40 bits physical, 48 bits virtual
    power management:
    
  • Ubuntu precise (12.04.5 LTS), 64-bit, using libm-2.15.so (my first laptop, MWE is slow)

    GNU C Library (Ubuntu EGLIBC 2.15-0ubuntu10.15) stable release version 2.15, by Roland McGrath et al.
    Copyright (C) 2012 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.
    There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
    PARTICULAR PURPOSE.
    Compiled by GNU CC version 4.6.3.
    Compiled on a Linux 3.2.79 system on 2016-05-26.
    Available extensions:
        crypt add-on version 2.1 by Michael Glad and others
        GNU Libidn by Simon Josefsson
        Native POSIX Threads Library by Ulrich Drepper et al
        BIND-8.2.3-T5B
    libc ABIs: UNIQUE IFUNC
    For bug reporting instructions, please see:
    <http://www.debian.org/Bugs/>.
    
    processor   : 0
    vendor_id   : GenuineIntel
    cpu family  : 6
    model       : 42
    model name  : Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz
    stepping    : 7
    microcode   : 0x1a
    cpu MHz     : 800.000
    cache size  : 4096 KB
    physical id : 0
    siblings    : 4
    core id     : 0
    cpu cores   : 2
    apicid      : 0
    initial apicid  : 0
    fpu     : yes
    fpu_exception   : yes
    cpuid level : 13
    wp      : yes
    flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
    bogomips    : 5387.59
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 36 bits physical, 48 bits virtual
    power management:
    
  • Ubuntu precise (12.04.5 LTS), 32-bit, using libm-2.15.so (my second laptop, MWE is fast)

    GNU C Library (Ubuntu EGLIBC 2.15-0ubuntu10.12) stable release version 2.15, by Roland McGrath et al.
    Copyright (C) 2012 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.
    There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
    PARTICULAR PURPOSE.
    Compiled by GNU CC version 4.6.3.
    Compiled on a Linux 3.2.68 system on 2015-03-26.
    Available extensions:
        crypt add-on version 2.1 by Michael Glad and others
        GNU Libidn by Simon Josefsson
        Native POSIX Threads Library by Ulrich Drepper et al
        BIND-8.2.3-T5B
    libc ABIs: UNIQUE IFUNC
    For bug reporting instructions, please see:
    <http://www.debian.org/Bugs/>.
    
    processor    : 0
    vendor_id    : GenuineIntel
    cpu family    : 6
    model        : 15
    model name    : Intel(R) Core(TM)2 Duo CPU     T5800  @ 2.00GHz
    stepping    : 13
    microcode    : 0xa3
    cpu MHz        : 800.000
    cache size    : 2048 KB
    physical id    : 0
    siblings    : 2
    core id        : 0
    cpu cores    : 2
    apicid        : 0
    initial apicid    : 0
    fdiv_bug    : no
    hlt_bug        : no
    f00f_bug    : no
    coma_bug    : no
    fpu        : yes
    fpu_exception    : yes
    cpuid level    : 10
    wp        : yes
    flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm lahf_lm dtherm
    bogomips    : 3989.79
    clflush size    : 64
    cache_alignment    : 64
    address sizes    : 36 bits physical, 48 bits virtual
    power management:
    

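    I hope I have provided enough background information. These are my questions:
  • Why is std::tan() so slow?
  • Is there a way to get it back to normal speed?

    I would much prefer a solution that does not require installing or replacing a bunch of libraries. That might work on my laptops, but I do not have the necessary privileges on the cluster nodes.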

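    Update #1:
    As Sam Varshavchik explained, I removed my observation about passing a constant to tan. I added the output of running /lib/libc.so.6 to the list of systems, and also added a fourth system. As for timing, this is the output of time ./mwe when running only the pi loop (with the approxPi loop commented out).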
    real    0m11.483s
    user    0m11.465s
    sys 0m0.004s
    

    This is the same for the approxPi loop (with the pi loop commented out).
    real    0m0.011s
    user    0m0.008s
    sys 0m0.000s
    

    Update #2:
    For each system, added whether the CPU is 32-bit or 64-bit and the output of cat /proc/cpuinfo for the first core.

    Best answer

    The accuracy of transcendental functions (such as the trigonometric and exponential functions) has always been a problem.

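    Why some trig function calls are slower than others

    For most arguments to a trigonometric function there is a fast approximation that produces a highly accurate result. For certain arguments, however, that approximation can be quite wrong. In those cases a more accurate method has to be used, and it takes much longer (as you have noticed). The tanMp and __mptan frames in your backtrace look like exactly that slow path: a multi-precision fallback implemented in mpa.c.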

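    Why the new library is now slower

    For a long time Intel made misleading claims about the accuracy of its hardware floating-point trigonometric instructions, stating that they were far more accurate than they really are. So much so that glibc used to implement sin(double) as little more than a wrapper around the x87 fsin instruction. Chances are you have upgraded to a glibc version that corrected this mistake. I can't speak for AMD's libm, but it may still rely on the claimed accuracy of those instructions.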

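    What to do

    If you want speed and are not too concerned about accuracy, use the float version of tan (tanf). Otherwise, if you need the accuracy, you are stuck with the slower method. The best option is probably to cache the results of tan(pi) and tan(pi/2) and use the precomputed values whenever you expect to need them.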
