Why doesn't GCC optimize a*a*a*a*a*a to (a*a*a)*(a*a*a)?

Problem description

I am doing some numerical optimization on a scientific application. One thing I noticed is that GCC will optimize the call pow(a,2) by compiling it into a*a, but the call pow(a,6) is not optimized and will actually call the library function pow, which greatly slows down the performance. (In contrast, Intel C++ Compiler, executable icc, will eliminate the library call for pow(a,6).)

What I am curious about is that when I replaced pow(a,6) with a*a*a*a*a*a using GCC 4.5.1 and options "-O3 -lm -funroll-loops -msse4", it uses 5 mulsd instructions:

movapd  %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13

If I write (a*a*a)*(a*a*a) instead, it produces:

movapd  %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm13, %xmm13

which reduces the number of multiply instructions to 3. icc has similar behavior.

Why do compilers not recognize this optimization trick?

Recommended answer