Why doesn't GCC optimize a*a*a*a*a*a to (a*a*a)*(a*a*a)?

Problem description

I am doing some numerical optimization on a scientific application. One thing I noticed is that GCC will optimize the call pow(a,2) by compiling it into a*a, but the call pow(a,6) is not optimized and will actually call the library function pow, which greatly slows down the performance. (In contrast, Intel C++ Compiler, executable icc, will eliminate the library call for pow(a,6).)

What I am curious about is that when I replaced pow(a,6) with a*a*a*a*a*a using GCC 4.5.1 and options "-O3 -lm -funroll-loops -msse4", it uses 5 mulsd instructions:

movapd  %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13

While if I write (a*a*a)*(a*a*a), it will produce

movapd  %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm13, %xmm13

which reduces the number of multiply instructions to 3. icc has similar behavior.

为什么编译器无法识别这种优化技巧?

Why do compilers not recognize this optimization trick?

Recommended answer

Because Floating Point Math is not Associative. The way you group the operands in floating point multiplication has an effect on the numerical accuracy of the answer.

As a result, most compilers are very conservative about reordering floating point calculations unless they can be sure that the answer will stay the same, or unless you tell them you don't care about numerical accuracy. For example, the -fassociative-math option of gcc allows gcc to reassociate floating point operations, and the -ffast-math option allows even more aggressive tradeoffs of accuracy against speed.
