Question
I want to use
-msse2 -ftree-vectorizer-verbose=2.
I have the following simple code:
int main(){
    int a[2048], b[2048], c[2048];
    int i;
    for (i=0; i<2048; i++){
        b[i]=0;
        c[i]=0;
    }
    for (i=0; i<2048; i++){
        a[i] = b[i] + c[i];
    }
    return 0;
}
Why do I get the following message?
test.cpp:10: note: not vectorized: not enough data-refs in basic block.
Thanks!
Recommended answer
For what it's worth, after adding an asm volatile("": "+m"(a), "+m"(b), "+m"(c)::"memory"); near the end of main, my copy of gcc emits this:
400610: 48 81 ec 08 60 00 00 sub $0x6008,%rsp
400617: ba 00 20 00 00 mov $0x2000,%edx
40061c: 31 f6 xor %esi,%esi
40061e: 48 8d bc 24 00 20 00 lea 0x2000(%rsp),%rdi
400625: 00
400626: e8 b5 ff ff ff callq 4005e0 <memset@plt>
40062b: ba 00 20 00 00 mov $0x2000,%edx
400630: 31 f6 xor %esi,%esi
400632: 48 8d bc 24 00 40 00 lea 0x4000(%rsp),%rdi
400639: 00
40063a: e8 a1 ff ff ff callq 4005e0 <memset@plt>
40063f: 31 c0 xor %eax,%eax
400641: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
400648: c5 f9 6f 84 04 00 20 vmovdqa 0x2000(%rsp,%rax,1),%xmm0
40064f: 00 00
400651: c5 f9 fe 84 04 00 40 vpaddd 0x4000(%rsp,%rax,1),%xmm0,%xmm0
400658: 00 00
40065a: c5 f8 29 04 04 vmovaps %xmm0,(%rsp,%rax,1)
40065f: 48 83 c0 10 add $0x10,%rax
400663: 48 3d 00 20 00 00 cmp $0x2000,%rax
400669: 75 dd jne 400648 <main+0x38>
So it recognised that the first loop was just doing memset to a couple of arrays and the second loop was doing a vector addition, which it appropriately vectorised.
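For illustration, here is a minimal sketch of where that barrier would sit. It assumes GCC or Clang (the extended asm syntax is a compiler extension), and the function name `sum_with_barrier` and the constant `N` are hypothetical, not from the original post. The likely reason the barrier matters is that, without it, nothing ever reads `a`, so the optimizer is free to discard both loops before the vectorizer sees them.

```cpp
#include <cstddef>

constexpr std::size_t N = 2048;  // hypothetical name for the array length

// Hypothetical wrapper around the code from the question.
int sum_with_barrier() {
    int a[N], b[N], c[N];

    // Zero-initialisation loop; the answer's gcc turned this into memset calls.
    for (std::size_t i = 0; i < N; ++i) {
        b[i] = 0;
        c[i] = 0;
    }

    // Element-wise addition; this is the loop the vectorizer should pick up.
    for (std::size_t i = 0; i < N; ++i) {
        a[i] = b[i] + c[i];
    }

    // Empty asm statement (GCC/Clang extension). The "+m" operands tell the
    // compiler the arrays may be read and written here, so it cannot treat
    // the loops above as dead code. No instruction is actually emitted.
    asm volatile("" : "+m"(a), "+m"(b), "+m"(c) : : "memory");

    // The barrier emits nothing, so the arrays still hold zeros.
    return a[0] + a[N - 1];
}
```

Compiling this with `g++ -O3 -ftree-vectorize -march=native -c` and running `objdump -d` on the object file should show whether the addition loop was vectorized; the exact instructions depend on the compiler version and target.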
I'm using gcc version 4.9.0 20140521 (prerelease) (GCC).
An older machine with gcc version 4.7.2 (Debian 4.7.2-5) also vectorises the loop, but in a different way. Your -ftree-vectorizer-verbose=2 setting makes it emit the following output:
Analyzing loop at foo155.cc:10
Vectorizing loop at foo155.cc:10
10: LOOP VECTORIZED.
foo155.cc:1: note: vectorized 1 loops in function.
You probably goofed your compiler flags (I used g++ -O3 -ftree-vectorize -ftree-vectorizer-verbose=2 -march=native foo155.cc -o foo155 to build) or have a really old compiler.
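As an alternative way to check the flags, the sketch below moves the addition loop into a function with external linkage, so the compiler has to keep it even without an asm barrier. The function name `add_arrays` is hypothetical, and `__restrict` is a compiler extension (accepted by GCC, Clang and MSVC) used here on the assumption that the three arrays never overlap.

```cpp
#include <cstddef>

// Hypothetical helper, not from the original post. Because the caller can
// observe the contents of `a`, the loop cannot be removed as dead code.
// __restrict promises the compiler that a, b and c do not alias.
void add_arrays(int* __restrict a,
                const int* __restrict b,
                const int* __restrict c,
                std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = b[i] + c[i];
    }
}
```

Building this with the same flags as above (`-O3 -ftree-vectorize -ftree-vectorizer-verbose=2 -march=native`, adding `-c` since there is no `main`) should print a vectorization note for the loop on compilers of that era. GCC 4.9 and later deprecate `-ftree-vectorizer-verbose` in favour of `-fopt-info-vec`, so on a recent compiler that flag is the one to try.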