Why can't icc handle compile-time branch hints in a reasonable way?


Problem description

A developer can use the __builtin_expect builtin to help the compiler understand in which direction a branch is likely to go.

In the future, we may get a standard attribute for this purpose, but as of today at least all of clang, icc and gcc support the non-standard __builtin_expect instead.

However, icc seems to generate oddly terrible code when you use it. That is, code that uses the builtin is strictly worse than the code without it, regardless of which direction the prediction is made.

Take for example the following toy function:

int foo(int a, int b)
{
  do {
     a *= 77;
  } while (b-- > 0);
  return a * 77;
}

Out of the three compilers, icc is the only one that compiles this to the optimal scalar loop of 3 instructions:

foo(int, int):
..B1.2:                         # Preds ..B1.2 ..B1.1
        imul      edi, edi, 77                                  #4.6
        dec       esi                                           #5.12
        jns       ..B1.2        # Prob 82%                      #5.18
        imul      eax, edi, 77                                  #6.14
        ret

Both gcc and clang manage to miss the easy solution and use 5 instructions.

On the other hand, when you use likely or unlikely macros on the loop condition, icc goes totally braindead:

#define likely(x)   __builtin_expect((x), 1)
#define unlikely(x) __builtin_expect((x), 0)

int foo(int a, int b)
{

   do {
     a *= 77;
  } while (likely(b-- > 0));

   return a * 77;
}

This loop is functionally equivalent to the previous loop (since __builtin_expect just returns its first argument), yet icc produces some awful code:

foo(int, int):
        mov       eax, 1                                        #9.12
..B1.2:                         # Preds ..B1.2 ..B1.1
        xor       edx, edx                                      #9.12
        test      esi, esi                                      #9.12
        cmovg     edx, eax                                      #9.12
        dec       esi                                           #9.12
        imul      edi, edi, 77                                  #8.6
        test      edx, edx                                      #9.12
        jne       ..B1.2        # Prob 95%                      #9.12
        imul      eax, edi, 77                                  #11.15
        ret                                                     #11.15

The function has doubled in size to 10 instructions, and (worse yet!) the critical loop has more than doubled to 7 instructions with a long critical dependency chain involving a cmov and other weird stuff.
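
Read back into C, the loop in that listing behaves roughly like the sketch below. This is only a hand-written illustration of the assembly shown above: the name foo_icc_like and the variable flag are made up here, and the comments map each statement to the instructions it stands for.

```c
/* Hypothetical C rendering of the icc output for the hinted loop.
   foo_icc_like and flag are illustrative names, not compiler output. */
int foo_icc_like(int a, int b)
{
    int flag;
    do {
        flag = (b > 0) ? 1 : 0;  /* xor edx,edx / test esi,esi / cmovg edx,eax */
        b--;                     /* dec esi */
        a *= 77;                 /* imul edi, edi, 77 */
    } while (flag != 0);         /* test edx,edx / jne ..B1.2 */
    return a * 77;               /* imul eax, edi, 77 / ret */
}
```

The comparison result is first materialized as a 0/1 value and then tested again, which is where the extra instructions and the longer dependency chain in the loop come from.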

The same is true if you use the unlikely hint and also across all icc versions (13, 14, 17) that godbolt supports. So the code generation is strictly worse, regardless of the hint, and regardless of the actual runtime behavior.
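
Since the hint is only supposed to influence code generation, all three variants should return the same values. Here is a minimal sketch of such a check, assuming a compiler that provides __builtin_expect (gcc, clang or icc). The names foo_plain, foo_likely, foo_unlikely and variants_agree are made up for this illustration, and the input ranges are kept small so that the signed multiplications cannot overflow.

```c
#define likely(x)   __builtin_expect((x), 1)
#define unlikely(x) __builtin_expect((x), 0)

/* Loop without any hint. */
int foo_plain(int a, int b)
{
    do {
        a *= 77;
    } while (b-- > 0);
    return a * 77;
}

/* Same loop, condition marked as likely. */
int foo_likely(int a, int b)
{
    do {
        a *= 77;
    } while (likely(b-- > 0));
    return a * 77;
}

/* Same loop, condition marked as unlikely. */
int foo_unlikely(int a, int b)
{
    do {
        a *= 77;
    } while (unlikely(b-- > 0));
    return a * 77;
}

/* Returns 1 if all three variants agree on a small grid of inputs, else 0.
   The ranges are chosen so that a * 77^(b+2) stays within int. */
int variants_agree(void)
{
    for (int a = -3; a <= 3; a++) {
        for (int b = -2; b <= 2; b++) {
            int expected = foo_plain(a, b);
            if (foo_likely(a, b) != expected || foo_unlikely(a, b) != expected)
                return 0;
        }
    }
    return 1;
}
```

A check like this only confirms that the results match; the difference discussed here shows up in the generated assembly, not in the return values.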

Neither gcc nor clang suffers any degradation when hints are used (at least in the first and subsequent examples I tried).

What's up with that?

Solution

To me this looks like an ICC bug. This code (available on godbolt)

int c;

do
{
    a *= 77;
    c = b--;
}
while (likely(c > 0));

which simply uses an auxiliary local variable c, produces output without the edx = !!(esi > 0) pattern:

foo(int, int):
  ..B1.2:
    mov       eax, esi
    dec       esi
    imul      edi, edi, 77
    test      eax, eax
    jg        ..B1.2

That is still not optimal, though (it could do without eax).
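
For reference, the fragment with the auxiliary variable can be placed in a complete function as sketched below. The name foo_workaround is hypothetical, and whether a particular icc version emits the shorter loop for it has to be checked against that compiler.

```c
#define likely(x) __builtin_expect((x), 1)

/* Same loop as the hinted foo, but the post-decrement is split out into
   an auxiliary variable c, so the hinted expression is a plain comparison.
   foo_workaround is an illustrative name. */
int foo_workaround(int a, int b)
{
    int c;

    do {
        a *= 77;
        c = b--;               /* c holds the value of b before the decrement */
    } while (likely(c > 0));

    return a * 77;
}
```

Because c receives the value of b before the decrement, the loop runs exactly as often as with the original condition likely(b-- > 0).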

I don't know if the official ICC policy about __builtin_expect is full support or just compatibility support.

This question seems better suited for the Official ICC forum.
I've tried posting this topic there, but I'm not sure I've done a good job (I've been spoiled by SO).
If they answer me I'll update this answer.

EDIT
I've got an answer at the Intel Forum: they recorded this issue in their tracking system.
As of today, it seems to be a bug.
