Problem description
My C++ code uses SSE, and I want to extend it to use AVX when it is available. I detect whether AVX is available and, if so, call a function that uses AVX instructions. I am using Win7 SP1 and VS2010 SP1 on a CPU with AVX.
To use AVX, it is necessary to include this header:
#include "immintrin.h"
You can then use AVX intrinsics such as _mm256_mul_ps, _mm256_add_ps, etc. The problem is that, by default, VS2010 produces code that runs very slowly and emits a warning.
It seems that VS2010 does not actually use AVX instructions but emulates them instead. I added /arch:AVX to the compiler options and got good results. However, this option tells the compiler to use AVX instructions everywhere it can, so my code may crash on a CPU that does not support AVX!
So the question is: how can I make the VS2010 compiler produce AVX code only where I use AVX intrinsics directly? This works for SSE: I just use SSE intrinsics and the compiler produces SSE code without any compiler options such as /arch:SSE. For AVX it does not work, for some reason.
Recommended answer
The behavior you are seeing is the result of expensive state-switching.
See page 102 of Agner Fog's manual:
Every time you improperly switch ...
When you compile without /arch:AVX, VS2010 generates SSE instructions for ordinary code but still emits AVX instructions wherever you use AVX intrinsics. You therefore get code that mixes SSE and AVX instructions, which incurs those state-switching penalties. (VS2010 knows this, which is why it emits the warning you are seeing.)
Therefore, you should use either all SSE or all AVX. Specifying /arch:AVX tells the compiler to use AVX everywhere.
It sounds like you are trying to build multiple code paths: one for SSE and one for AVX. For this, I suggest separating your SSE and AVX code into two different compilation units (one compiled with /arch:AVX and one without). Then link them together and write a dispatcher that chooses between them based on the hardware the program is running on.
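The dispatcher idea can be sketched roughly as follows. This is an illustration under assumptions, not code from the original post: the names cpu_has_avx, mul_add_sse, mul_add_avx, select_mul_add and mul_add are hypothetical, and the two kernels are scalar stand-ins so that the sketch builds as a single file. In a real project each kernel would sit in its own .cpp file, with only the AVX one compiled with /arch:AVX, while the detection and dispatch code stays in a file compiled without it.

```cpp
// Sketch only: all function names here are hypothetical.
#include <cstddef>

#if defined(_MSC_VER)
  #include <intrin.h>      // __cpuid
  #include <immintrin.h>   // _xgetbv (available from VS2010 SP1)
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  #include <cpuid.h>       // __get_cpuid
#endif

// Returns true only if the CPU reports AVX and the OS saves/restores YMM state.
bool cpu_has_avx() {
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4];
    __cpuid(regs, 1);
    const bool osxsave = (regs[2] & (1 << 27)) != 0;  // OS uses XSAVE/XGETBV
    const bool avx     = (regs[2] & (1 << 28)) != 0;  // CPU supports AVX
    if (!osxsave || !avx) return false;
    // XCR0 bits 1 and 2: XMM and YMM state enabled by the OS.
    return (_xgetbv(0) & 0x6) == 0x6;
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;
    const bool osxsave = (ecx & (1u << 27)) != 0;
    const bool avx     = (ecx & (1u << 28)) != 0;
    if (!osxsave || !avx) return false;
    unsigned lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return (lo & 0x6) == 0x6;
#else
    return false;  // Unknown platform: stay on the baseline path.
#endif
}

typedef void (*mul_add_fn)(const float* a, const float* b,
                           float* out, std::size_t n);

// Stand-in for the kernel built WITHOUT /arch:AVX (SSE intrinsics only).
void mul_add_sse(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] * b[i] + out[i];
}

// Stand-in for the kernel built WITH /arch:AVX (AVX intrinsics only).
void mul_add_avx(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] * b[i] + out[i];
}

// Picks the implementation that matches the detected hardware.
mul_add_fn select_mul_add(bool have_avx) {
    return have_avx ? mul_add_avx : mul_add_sse;
}

// Public entry point: resolves the kernel on first use, then forwards to it.
void mul_add(const float* a, const float* b, float* out, std::size_t n) {
    static const mul_add_fn impl = select_mul_add(cpu_has_avx());
    impl(a, b, out, n);
}
```

Callers only ever see mul_add, so no AVX instruction is reached unless the check passed. Checking the OS bits through XGETBV as well as the CPUID flag matters because an operating system that does not save the YMM registers cannot run AVX code even on a capable CPU.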
If you need to mix SSE and AVX, be sure to use _mm256_zeroupper() or _mm256_zeroall() appropriately to avoid the state-switching penalties.
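As a rough illustration of where such a call might go, here is a minimal sketch of an AVX kernel that clears the upper register halves before returning to callers that may run legacy SSE code. The function name mul_add_avx_kernel is hypothetical. The AVX path is guarded by the __AVX__ macro, which is an assumption about the toolchain: GCC and Clang define it when AVX code generation is enabled, and later Visual C++ releases do so under /arch:AVX. If your compiler does not define it, the guard would have to be removed in the translation unit built with /arch:AVX.

```cpp
// Sketch only: mul_add_avx_kernel is a hypothetical name.
#include <cstddef>
#if defined(__AVX__)
  #include <immintrin.h>
#endif

// Computes out[i] = a[i] * b[i] + c[i] for i in [0, n).
void mul_add_avx_kernel(const float* a, const float* b, const float* c,
                        float* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__AVX__)
    // Process eight floats per iteration using 256-bit registers.
    for (; i + 8 <= n; i += 8) {
        const __m256 va = _mm256_loadu_ps(a + i);
        const __m256 vb = _mm256_loadu_ps(b + i);
        const __m256 vc = _mm256_loadu_ps(c + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(_mm256_mul_ps(va, vb), vc));
    }
    // Zero the upper halves of the YMM registers so that legacy SSE code
    // executed after this point does not pay the transition penalty.
    _mm256_zeroupper();
#endif
    // Scalar tail; also the whole computation when AVX is not enabled.
    for (; i < n; ++i) out[i] = a[i] * b[i] + c[i];
}
```

Placing the call once, after the last 256-bit instruction and before control returns to other code, is usually enough. _mm256_zeroall() would also clear the lower halves, so it only fits when no live values remain in the vector registers.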