Question
I need to check that all vector elements are non-zero. So far I have found the following solution. Is there a better way to do this? I am using gcc 4.8.2 on Linux/x86_64, with instructions up to SSE4.2.
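typedef char ChrVect __attribute__((vector_size(16), aligned(16)));

inline bool testNonzero(ChrVect vect)
{
    const ChrVect vzero = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
    // (vzero == vect) sets all bits in each lane where vect is zero, so the
    // 128-bit result is 0 only if every element of vect is non-zero
    return (0 == (__int128_t)(vzero == vect));
}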
Update: the code above compiles to the following assembly (when compiled as a non-inline function):
movdqa %xmm0, -24(%rsp)
pxor %xmm0, %xmm0
pcmpeqb -24(%rsp), %xmm0
movdqa %xmm0, -24(%rsp)
movq -24(%rsp), %rax
orq -16(%rsp), %rax
sete %al
ret
Recommended answer
With straight SSE intrinsics you might do it like this:
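#include <stdint.h>     // uint32_t
#include <immintrin.h>  // SSE intrinsics

inline bool testNonzero(__m128i v)
{
    // each byte of vcmp is 0xFF where the matching byte of v is zero
    __m128i vcmp = _mm_cmpeq_epi8(v, _mm_setzero_si128());
#if __SSE4_1__ // for SSE 4.1 and later use PTEST
    return _mm_testz_si128(vcmp, vcmp);  // 1 if vcmp is all zeros
#else // for older SSE use PMOVMSKB
    uint32_t mask = _mm_movemask_epi8(vcmp);
    return (mask == 0);
#endif
}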
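As a quick sanity check, the function can be exercised on a few hand-built vectors. The following is only a sketch: it repeats the function above so that it compiles on its own as C, and check_with_byte is a hypothetical helper introduced purely for illustration, not part of the original answer.

```c
#include <stdbool.h>
#include <stdint.h>
#include <emmintrin.h>      /* SSE2: _mm_cmpeq_epi8, _mm_movemask_epi8 */
#ifdef __SSE4_1__
#include <smmintrin.h>      /* SSE4.1: _mm_testz_si128 */
#endif

/* Same logic as the answer's function, repeated to keep the sketch self-contained. */
static inline bool testNonzero(__m128i v)
{
    __m128i vcmp = _mm_cmpeq_epi8(v, _mm_setzero_si128());
#ifdef __SSE4_1__
    return _mm_testz_si128(vcmp, vcmp);
#else
    uint32_t mask = (uint32_t)_mm_movemask_epi8(vcmp);
    return (mask == 0);
#endif
}

/* Hypothetical helper: builds 16 bytes that are all 1, overwrites the byte
   at index pos (0..15) with b, and runs testNonzero on the result. */
static bool check_with_byte(int pos, unsigned char b)
{
    unsigned char bytes[16];
    for (int i = 0; i < 16; ++i)
        bytes[i] = 1;
    bytes[pos] = b;
    /* unaligned load, so the local array needs no special alignment */
    return testNonzero(_mm_loadu_si128((const __m128i *)bytes));
}
```

A single zero byte at any position should make the result false, while bytes with the high bit set (such as 0x80) still count as non-zero.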
I suggest looking at what your compiler currently generates for your existing code, then comparing it with this intrinsics version to see whether there is any significant difference.
With SSE3 (clang -O3 -msse3) I get the following for the above function:
pxor %xmm1, %xmm1
pcmpeqb %xmm1, %xmm0
pmovmskb %xmm0, %ecx
testl %ecx, %ecx
The SSE4 version (clang -O3 -msse4.1) produces:
pxor %xmm1, %xmm1
pcmpeqb %xmm1, %xmm0
ptest %xmm0, %xmm0
Note that the zeroing of xmm1 will typically be hoisted out of any loop containing this function, so the above sequences should be reduced by one instruction when used inside a loop.
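To illustrate the loop case, here is a minimal sketch that scans a byte buffer in 16-byte chunks using the SSE2 (PMOVMSKB) path. The function name all_bytes_nonzero and the requirement that the length be a multiple of 16 are assumptions made for this example. The zero vector is created once before the loop, which is what a compiler would normally do on its own when the function is inlined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: returns true if every byte in buf[0..len) is non-zero.
   Assumes len is a multiple of 16; buf may be unaligned. */
static bool all_bytes_nonzero(const unsigned char *buf, size_t len)
{
    assert(len % 16 == 0);

    /* Zero register set up once, outside the loop. */
    const __m128i zero = _mm_setzero_si128();

    for (size_t i = 0; i < len; i += 16) {
        __m128i v    = _mm_loadu_si128((const __m128i *)(buf + i));
        __m128i vcmp = _mm_cmpeq_epi8(v, zero);   /* 0xFF where a byte is zero */
        if (_mm_movemask_epi8(vcmp) != 0)         /* some byte in this chunk is zero */
            return false;
    }
    return true;
}
```

Inside the loop only the compare and the mask test remain per chunk, which matches the instruction count described above. The early return stops at the first chunk containing a zero byte.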