Problem description
For one of my OS X programs, I have a few optimized cases that use SSE4.1 instructions. On SSE3-only machines, the non-optimized branch is run:
// SupportsSSE4_1 returns true on CPUs that support SSE4.1, false otherwise
if (SupportsSSE4_1()) {
    // Code that uses _mm_dp_ps, an SSE4 instruction
    ...
    __m128 hDelta = _mm_sub_ps(here128, right128);
    __m128 vDelta = _mm_sub_ps(here128, down128);
    hDelta = _mm_sqrt_ss(_mm_dp_ps(hDelta, hDelta, 0x71));
    vDelta = _mm_sqrt_ss(_mm_dp_ps(vDelta, vDelta, 0x71));
    ...
} else {
    // Equivalent code that uses SSE3 instructions
    ...
}
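For readers unfamiliar with the 0x71 mask, here is a minimal scalar sketch of what _mm_dp_ps computes. The name dp_ps_model is made up for illustration and is not part of any library. The high four bits of the mask select which lanes are multiplied and summed (0x7 means lanes 0 to 2), and the low four bits select which output lanes receive the sum (0x1 means lane 0 only). The hardware instruction may add the products in a different order, so the last bits of rounding can differ from this model.

```c
/* Scalar model of _mm_dp_ps(a, b, mask); illustrative only. */
static void dp_ps_model(const float a[4], const float b[4],
                        unsigned mask, float out[4])
{
    float sum = 0.0f;
    int i;

    /* Bits 4..7 choose which lanes take part in the dot product. */
    for (i = 0; i < 4; ++i) {
        if (mask & (0x10u << i))
            sum += a[i] * b[i];
    }

    /* Bits 0..3 choose which output lanes get the sum; the rest are zero. */
    for (i = 0; i < 4; ++i)
        out[i] = (mask & (1u << i)) ? sum : 0.0f;
}
```

With mask 0x71, lane 0 of the result holds the squared length of the first three components, and the _mm_sqrt_ss call in the question then takes the square root of that lane.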
In order to get the above to compile, I had to set CLANG_X86_VECTOR_INSTRUCTIONS to sse4.1.
However, this seems to tell clang that it's OK to use the ROUNDSD instruction anywhere in my program. As a result, the program crashes on SSE3-only machines with SIGILL: ILL_ILLOPC.
What's the best practice for enabling SSE4.1 for just the lines of code inside the true branch of the SupportsSSE4_1() if block?
Recommended answer
There is currently no way to target different ISA extensions at block or function granularity in clang. You can only do it at file granularity: put your SSE4.1 code into a separate file and compile that file with -msse4.1. If this is an important feature for you, please file a bug report to request it!
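A rough sketch of the file-granularity layout might look like the following. The file names, the dot3_* functions, and the HAVE_SSE41_KERNEL macro are all hypothetical, and the signature of SupportsSSE4_1() is assumed from the question. The idea is that the dispatcher is built with the baseline flags, so the compiler cannot emit SSE4.1 instructions in it, while the one file that calls _mm_dp_ps is built with -msse4.1 and is only reached after the runtime check.

```c
/* dispatch.c: built with the project's baseline flags (no -msse4.1). */

/* Baseline kernel: plain C, safe on every CPU the program targets. */
static float dot3_baseline(const float a[4], const float b[4])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

#ifdef HAVE_SSE41_KERNEL
/* Hypothetical kernel defined in kernel_sse41.c, the only file that
 * is compiled with -msse4.1. Its body could be along these lines:
 *
 *     #include <smmintrin.h>
 *     float dot3_sse41(const float a[4], const float b[4])
 *     {
 *         __m128 va = _mm_loadu_ps(a), vb = _mm_loadu_ps(b);
 *         return _mm_cvtss_f32(_mm_dp_ps(va, vb, 0x71));
 *     }
 */
float dot3_sse41(const float a[4], const float b[4]);

/* The question's runtime check; this signature is an assumption. */
int SupportsSSE4_1(void);
#endif

typedef float (*dot3_fn)(const float a[4], const float b[4]);

/* Pick an implementation once; callers keep the returned pointer. */
static dot3_fn select_dot3(void)
{
#ifdef HAVE_SSE41_KERNEL
    if (SupportsSSE4_1())
        return dot3_sse41;
#endif
    return dot3_baseline;
}
```

Built without HAVE_SSE41_KERNEL, select_dot3() always returns the baseline kernel, so the file compiles and runs on its own. Keeping the SSE4.1 file as small as possible limits how much code the compiler is free to fill with SSE4.1 instructions.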
However, I should note that the actual benefit of DPPS is pretty small in most real scenarios (and using DPPS even slows down some code sequences!). Unless this particular code sequence is critical and you have carefully measured the effect of using DPPS, it may not be worth the hassle to special-case SSE4.1, even if that compiler feature were available.
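To illustrate how little is needed without DPPS, here is one possible way to compute the same three-component length using only SSE1 intrinsics. This is a sketch, not the else-branch elided in the question, and the function name length3 is made up. A scalar fallback is included only so the snippet also builds on non-x86 targets. Whether this is faster or slower than the DPPS version would need to be measured on the target hardware.

```c
#if defined(__SSE__) || defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>   /* SSE1 intrinsics only */

/* Length of the first three components of v, without DPPS. */
static float length3(const float v[4])
{
    __m128 x   = _mm_loadu_ps(v);
    __m128 sq  = _mm_mul_ps(x, x);   /* x*x, y*y, z*z, w*w */
    /* Broadcast lane 1 and lane 2 so they can be added into lane 0. */
    __m128 y2  = _mm_shuffle_ps(sq, sq, _MM_SHUFFLE(1, 1, 1, 1));
    __m128 z2  = _mm_shuffle_ps(sq, sq, _MM_SHUFFLE(2, 2, 2, 2));
    __m128 sum = _mm_add_ss(_mm_add_ss(sq, y2), z2);   /* lane 0 only */
    return _mm_cvtss_f32(_mm_sqrt_ss(sum));
}
#else
#include <math.h>

/* Portable fallback for targets without SSE. */
static float length3(const float v[4])
{
    return sqrtf(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
}
#endif
```

Because this version uses nothing beyond SSE1, it needs no runtime check and no special compiler flags on x86-64.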