GCC: vectorization difference between two similar loops


Problem description

When compiling with gcc -O3, why does the following loop not vectorize (automatically):

    #define SIZE (65536)

    int a[SIZE], b[SIZE], c[SIZE];

    int foo () {
      int i, j;

      for (i=0; i<SIZE; i++){
        for (j=i; j<SIZE; j++) {
          a[i] = b[i] > c[j] ? b[i] : c[j];
        }
      }
      return a[0];
    }

when the following one does?

    #define SIZE (65536)

    int a[SIZE], b[SIZE], c[SIZE];

    int foov () {
      int i, j;

      for (i=0; i<SIZE; i++){
        for (j=i; j<SIZE; j++) {
          a[i] += b[i] > c[j] ? b[i] : c[j];
        }
      }
      return a[0];
    }

The only difference is whether the result of the expression in the inner loop is assigned to a[i], or added to a[i].
For reference, -ftree-vectorizer-verbose=6 gives the following output for the first (non-vectorizing) loop:

    v.c:8: note: not vectorized: inner-loop count not invariant.
    v.c:9: note: Unknown alignment for access: c
    v.c:9: note: Alignment of access forced using peeling.
    v.c:9: note: not vectorized: live stmt not supported: D.2700_5 = c[j_20];
    v.c:5: note: vectorized 0 loops in function.

The corresponding output for the loop that does vectorize is:

    v.c:8: note: not vectorized: inner-loop count not invariant.
    v.c:9: note: Unknown alignment for access: c
    v.c:9: note: Alignment of access forced using peeling.
    v.c:9: note: vect_model_load_cost: aligned.
    v.c:9: note: vect_model_load_cost: inside_cost = 1, outside_cost = 0 .
    v.c:9: note: vect_model_simple_cost: inside_cost = 1, outside_cost = 1 .
    v.c:9: note: vect_model_reduction_cost: inside_cost = 1, outside_cost = 6 .
    v.c:9: note: cost model: prologue peel iters set to vf/2.
    v.c:9: note: cost model: epilogue peel iters set to vf/2 because peeling for alignment is unknown .
    v.c:9: note: Cost model analysis:
      Vector inside of loop cost: 3
      Vector outside of loop cost: 27
      Scalar iteration cost: 3
      Scalar outside cost: 7
      prologue iterations: 2
      epilogue iterations: 2
      Calculated minimum iters for profitability: 8
    v.c:9: note: Profitability threshold = 7
    v.c:9: note: Profitability threshold is 7 loop iterations.
    v.c:9: note: LOOP VECTORIZED.
    v.c:5: note: vectorized 1 loops in function.

Solution
In the first case, the code overwrites the same memory location, a[i], on every iteration of the inner loop. This inherently sequentializes the loop, because the iterations are not independent.
(In reality, only the final iteration, j = SIZE - 1, is actually needed, so the entire inner loop could be taken out.)
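The parenthetical claim can be checked directly. The sketch below assumes a smaller, hypothetical size N = 1024 in place of SIZE = 65536 so that the quadratic loop runs quickly. It runs the question's first loop and a version whose inner loop is collapsed to its last iteration (j = N - 1), then reports whether both produce the same a[]. The names foo_original, foo_collapsed and collapsed_matches_original are illustrative only.

```c
#include <stdbool.h>

/* Hypothetical smaller size than the question's SIZE (65536), so the
   O(N^2) original loop stays fast in a quick check. */
#define N 1024

/* The question's first loop: a[i] is overwritten on every inner iteration. */
static void foo_original(int a[N], const int b[N], const int c[N]) {
    for (int i = 0; i < N; i++)
        for (int j = i; j < N; j++)
            a[i] = b[i] > c[j] ? b[i] : c[j];
}

/* Only the last inner iteration (j == N - 1) survives, so the inner loop
   collapses to one statement. Each i is now independent of the others. */
static void foo_collapsed(int a[N], const int b[N], const int c[N]) {
    for (int i = 0; i < N; i++)
        a[i] = b[i] > c[N - 1] ? b[i] : c[N - 1];
}

/* Runs both versions on the same deterministic data and reports whether
   they agree element by element. */
bool collapsed_matches_original(void) {
    static int a1[N], a2[N], b[N], c[N];
    for (int i = 0; i < N; i++) {
        b[i] = (i * 37) % 101;   /* arbitrary test data */
        c[i] = (i * 53) % 97;
    }
    foo_original(a1, b, c);
    foo_collapsed(a2, b, c);
    for (int i = 0; i < N; i++)
        if (a1[i] != a2[i])
            return false;
    return true;
}
```

In the collapsed version, each a[i] depends only on b[i] and the loop-invariant c[N - 1]. Whether a given GCC version then vectorizes it is not something this sketch checks.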
In the second case, GCC recognizes the loop as a reduction operation, which it has special-case handling to vectorize.
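In the second loop, a[i] acts as an accumulator that every inner iteration reads and adds to. That is what makes it a reduction. A vectorizer can handle this shape by keeping several independent partial sums and combining them after the loop. The sketch below does this by hand with 4 "lanes". The lane count, the size N = 1024 and the function names are assumptions for illustration; this is not GCC's actual generated code.

```c
#include <stdbool.h>

/* Hypothetical smaller size than the question's SIZE (65536). */
#define N 1024

/* The question's second loop: a[i] is read and updated on every
   inner iteration, i.e. it acts as a sum accumulator. */
static void foov_original(int a[N], const int b[N], const int c[N]) {
    for (int i = 0; i < N; i++)
        for (int j = i; j < N; j++)
            a[i] += b[i] > c[j] ? b[i] : c[j];
}

/* Hand-written illustration of vectorizing the reduction: 4 partial
   sums ("lanes") accumulate independently, then are combined. */
static void foov_partial_sums(int a[N], const int b[N], const int c[N]) {
    for (int i = 0; i < N; i++) {
        int lane[4] = {0, 0, 0, 0};
        int j = i;
        /* Main part: 4 terms per step, one per lane. */
        for (; j + 3 < N; j += 4)
            for (int k = 0; k < 4; k++)
                lane[k] += b[i] > c[j + k] ? b[i] : c[j + k];
        /* Combine the lanes with the starting value of a[i]... */
        int acc = a[i] + lane[0] + lane[1] + lane[2] + lane[3];
        /* ...then finish the leftover terms (fewer than 4) one at a time. */
        for (; j < N; j++)
            acc += b[i] > c[j] ? b[i] : c[j];
        a[i] = acc;
    }
}

/* Runs both versions on the same deterministic data and compares them. */
bool partial_sums_match_original(void) {
    static int a1[N], a2[N], b[N], c[N];
    for (int i = 0; i < N; i++) {
        b[i] = (i * 37) % 101;   /* arbitrary test data */
        c[i] = (i * 53) % 97;
        a1[i] = a2[i] = i % 7;   /* nonzero starting values for a[] */
    }
    foov_original(a1, b, c);
    foov_partial_sums(a2, b, c);
    for (int i = 0; i < N; i++)
        if (a1[i] != a2[i])
            return false;
    return true;
}
```

Reordering the additions this way is valid only because integer addition is associative, assuming no signed overflow (the small test values avoid it). The first loop's plain assignment offers no such accumulator to split.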
Compiler vectorization is often implemented as some sort of "pattern matching": the compiler analyzes the code to see whether it fits a pattern it knows how to vectorize. If it does, it gets vectorized; if it doesn't, it doesn't.
This seems to be a corner case: the first loop doesn't fit any of the pre-coded patterns that GCC can handle, while the second one fits the "vectorizable reduction" pattern.
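The pattern-matching idea can be illustrated with a toy classifier. The StmtInfo fields, the LoopKind values and the set of accepted operators below are hypothetical simplifications, not GCC's data structures or rules. The point is only this: a statement like a[i] = ..., which writes an invariant location but never reads it, matches no pattern, while a[i] += ... matches the reduction shape.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy model of loop-statement patterns; not GCC's. */
typedef enum {
    KIND_ELEMENTWISE,   /* each iteration writes a different location */
    KIND_REDUCTION,     /* acc = acc OP expr with an associative OP */
    KIND_UNSUPPORTED    /* no known pattern matched */
} LoopKind;

/* Simplified description of one inner-loop statement, either
   "dest = expr" or "dest OP= expr". */
typedef struct {
    bool dest_invariant;  /* same location every iteration, e.g. a[i] in the j-loop */
    bool reads_dest;      /* right-hand side uses the previous value of dest */
    char op;              /* combining operator, or 0 for plain assignment */
} StmtInfo;

/* Toy pattern matcher: try each known pattern in turn and give up
   if none fits. */
static LoopKind classify(StmtInfo s) {
    if (!s.dest_invariant)
        return KIND_ELEMENTWISE;
    /* s.op != 0 is checked first because strchr also matches the '\0' terminator. */
    if (s.reads_dest && s.op != 0 && strchr("+*&|^", s.op) != NULL)
        return KIND_REDUCTION;
    return KIND_UNSUPPORTED;
}
```

In this model, foo's statement (invariant destination, not read, no operator) is classified as unsupported, and foov's (invariant destination, read, '+') as a reduction.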
Here is the relevant part of GCC's source code (tree-vect-analyze.c from the GCC 4.2.0 tree in the Open64 repository) that prints the "not vectorized: live stmt not supported: " message:
http://svn.open64.net/svnroot/open64/trunk/osprey-gcc-4.2.0/gcc/tree-vect-analyze.c

    if (STMT_VINFO_LIVE_P (stmt_info))
      {
        ok = vectorizable_reduction (stmt, NULL, NULL);

        if (ok)
          need_to_vectorize = true;
        else
          ok = vectorizable_live_operation (stmt, NULL, NULL);

        if (!ok)
          {
            if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
              {
                fprintf (vect_dump,
                         "not vectorized: live stmt not supported: ");
                print_generic_expr (vect_dump, stmt, TDF_SLIM);
              }
            return false;
          }
      }

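From just the line

    vectorizable_reduction (stmt, NULL, NULL);

it's clear that GCC first checks whether the live statement matches a "vectorizable reduction" pattern. Only if that check fails does it fall back to vectorizable_live_operation. If that also fails, it prints the "live stmt not supported" message and rejects the loop.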
