问题描述
所以我一直在看看一些神奇的 O3
在GCC(实际上我正在使用Clang编译,但它是相同的GCC和我'
So I have been having a look at some of the magic that is O3
in GCC (well actually I'm compiling using Clang but it's the same with GCC and I'm guessing a large part of the optimiser was pulled over from GCC to Clang).
考虑这个C程序:
int foo(int n) {
if (n == 0) return 1;
return n * foo(n-1);
}
int main() {
return foo(10);
}
我第一件事就是WOW- ed at at this question - )是如何 int foo(int)
(一个基本的阶乘函数)编译成一个严格的循环。这是它的ARM程序集:
The first thing I was pretty WOW-ed at (which was also WOW-ed at in this question - http://stackoverflow.com/a/414774/1068248) was how int foo(int)
(a basic factorial function) compiles into a tight loop. This is the ARM assembly for it:
.globl _foo
.align 2
.code 16
.thumb_func _foo
_foo:
mov r1, r0
movs r0, #1
cbz r1, LBB0_2
LBB0_1:
muls r0, r1, r0
subs r1, #1
bne LBB0_1
LBB0_2:
bx lr
Blimey我想。这很有趣!完全紧密循环做因子。哇。这不是尾调用优化,因为,好吧,它不是尾调用。
Blimey I thought. That's pretty interesting! Completely tight looping to do the factorial. WOW. It's not a tail call optimisation since, well, it's not a tail call. But it appears to have done a much similar optimisation.
现在看看 main
:
.globl _main
.align 2
.code 16
.thumb_func _main
_main:
movw r0, #24320
movt r0, #55
bx lr
这只是让我的头脑诚实。它只是完全绕过 foo
并返回 3628800
这是 10!
。
That just blew my mind to be honest. It's just totally bypassing foo
and returning 3628800
which is 10!
.
这使我真正意识到你的编译器通常能做的比你在优化你的代码更好的工作。但它提出了一个问题,它是如何做得这么好的工作?因此,任何人都可以解释(可能通过链接到相关代码)以下优化的工作原理:
This makes me really realise how your compiler can often do a much better job than you can at optimising your code. But it raises the question, how does it manage to do such a good job? So, can anyone explain (possibly by linking to relevant code) how the following optimisations work:
The initial
foo
optimisation to be a tight loop.
优化 main
直接返回结果,而不是实际执行 foo
。
The optimisation where main
just goes and returns the result directly rather than actually executing foo
.
这个问题的另一个有趣的副作用是显示一些更有趣的优化,GCC / Clang可以做。
Also another interesting side effect of this question would be to show some more interesting optimisations which GCC/Clang can do.
推荐答案
如果使用 gcc -O3 -fdump-tree-all
进行编译,您可以看到递归已变成循环的第一个转储是 foo.c.035t.tailr1
。这意味着处理其他尾调用的同样的优化也会处理这种稍微扩展的情况。以 n * foo(...)
或 n + foo(...)
的形式递归很难手动处理(见下文),因为它可以描述如何,编译器可以自动执行优化。
If you compile with gcc -O3 -fdump-tree-all
, you can see that the first dump in which the recursion has been turned into a loop is foo.c.035t.tailr1
. This means the same optimisation that handles other tail calls also handles this slightly extended case. Recursion in the form of n * foo(...)
or n + foo(...)
is not that hard to handle manually (see below), and since it's possible to describe exactly how, the compiler can perform that optimisation automatically.
优化 main
更简单:inlining可以将其转换为 10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1 * 1
,如果乘法的所有操作数都是常量,则可以在编译时执行乘法。
The optimisation of main
is much simpler: inlining can turn this into 10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1 * 1
, and if all the operands of a multiplication are constants, then the multiplication can be performed at compile time.
更新:您可以手动从 foo
中删除递归,这可以自动完成。我不是说这是GCC使用的方法,但它是一个现实的可能性。
Update: Here's how you can manually remove the recursion from foo
, which can be done automatically. I'm not saying this is the method used by GCC, but it's one realistic possibility.
首先,创建一个帮助函数。它的行为完全符合 foo(n)
,除了它的结果乘以一个额外的参数 f
。
First, create a helper function. It behaves exactly as foo(n)
, except that its results are multiplied by an extra parameter f
.
int foo(int n)
{
return foo_helper(n, 1);
}
int foo_helper(int n, int f)
{
if (n == 0) return f * 1;
return f * n * foo(n-1);
}
然后,打开 foo
到 foo_helper
的递归调用,并依赖于factor参数来摆脱乘法。
Then, turn recursive calls of foo
into recursive calls of foo_helper
, and rely on the factor parameter to get rid of the multiplication.
int foo(int n)
{
return foo_helper(n, 1);
}
int foo_helper(int n, int f)
{
if (n == 0) return f;
return foo_helper(n-1, f * n);
}
将此转成循环:
int foo(int n)
{
return foo_helper(n, 1);
}
int foo_helper(int n, int f)
{
restart:
if (n == 0) return f;
{
int newn = n-1;
int newf = f * n;
n = newn;
f = newf;
goto restart;
}
}
最后,inline foo_helper
:
Finally, inline foo_helper
:
int foo(int n)
{
int f = 1;
restart:
if (n == 0) return f;
{
int newn = n-1;
int newf = f * n;
n = newn;
f = newf;
goto restart;
}
}
(当然,这不是最明智的方法手动写入函数。)
(Naturally, this is not the most sensible way to manually write the function.)
这篇关于编译器如何优化这个阶乘函数呢?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!