SIMD/SSE: How to check that all vector elements are non-zero

Problem description

I need to check that all vector elements are non-zero. So far I have found the following solution. Is there a better way to do this? I am using gcc 4.8.2 on Linux/x86_64, with instructions up to SSE4.2 available.

typedef char ChrVect __attribute__((vector_size(16), aligned(16)));

inline bool testNonzero(ChrVect vect)
{
    const ChrVect vzero = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
    return (0 == (__int128_t)(vzero == vect));
}

Update: the code above compiles to the following assembly (when compiled as a non-inline function):

movdqa  %xmm0, -24(%rsp)
pxor    %xmm0, %xmm0
pcmpeqb -24(%rsp), %xmm0
movdqa  %xmm0, -24(%rsp)
movq    -24(%rsp), %rax
orq     -16(%rsp), %rax
sete    %al
ret

Recommended answer

With straight SSE intrinsics you might do it like this:

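#include <stdint.h>     // uint32_t
#include <immintrin.h>  // SSE intrinsics (_mm_cmpeq_epi8, _mm_testz_si128, ...)

inline bool testNonzero(__m128i v)
{
    // Each byte of vcmp is 0xFF where the matching byte of v is zero, else 0x00.
    __m128i vcmp = _mm_cmpeq_epi8(v, _mm_setzero_si128());
#ifdef __SSE4_1__  // for SSE 4.1 and later use PTEST
    return _mm_testz_si128(vcmp, vcmp);  // 1 if vcmp is all zeros, i.e. v has no zero byte
#else              // for older SSE use PMOVMSKB
    uint32_t mask = _mm_movemask_epi8(vcmp);
    return (mask == 0);
#endif
}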
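
As a rough usage sketch (not part of the original answer), the same check can be wrapped so it works directly on 16 bytes in memory. The names `testNonzeroSse2` and `allBytesNonzero16` are hypothetical, and only the SSE2 (PMOVMSKB) path is used so that the snippet builds without `-msse4.1`:

```cpp
#include <emmintrin.h>  // SSE2: _mm_cmpeq_epi8, _mm_movemask_epi8, _mm_loadu_si128
#include <cstdint>

// SSE2-only variant of the check (hypothetical name).
static inline bool testNonzeroSse2(__m128i v)
{
    // 0xFF in every byte lane where v holds a zero byte, 0x00 elsewhere.
    __m128i vcmp = _mm_cmpeq_epi8(v, _mm_setzero_si128());
    // PMOVMSKB gathers the top bit of each lane; 0 means no lane matched zero.
    return _mm_movemask_epi8(vcmp) == 0;
}

// Hypothetical helper: load 16 bytes (no alignment required) and run the check.
static inline bool allBytesNonzero16(const std::uint8_t* p)
{
    return testNonzeroSse2(_mm_loadu_si128(reinterpret_cast<const __m128i*>(p)));
}
```

Calling `allBytesNonzero16` on an array whose 16 bytes are all non-zero should return true, and clearing any single byte should make it return false.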

I suggest looking at what your compiler currently generates for your existing code, then comparing it with this version using intrinsics to see whether there is any significant difference.

With SSE3 (clang -O3 -msse3) I get the following for the above function:

pxor    %xmm1, %xmm1
pcmpeqb %xmm1, %xmm0
pmovmskb    %xmm0, %ecx
testl   %ecx, %ecx

The SSE4 version (clang -O3 -msse4.1) produces:

pxor    %xmm1, %xmm1
pcmpeqb %xmm1, %xmm0
ptest   %xmm0, %xmm0

Note that the zeroing of xmm1 will typically be hoisted out of any loop containing this function, so the above sequences should be reduced by one instruction when used inside a loop.
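
To illustrate the loop case, here is a minimal sketch that scans a byte buffer in 16-byte steps. The function name `bufferHasNoZeroByte` and the scalar tail handling are assumptions made for this example and do not come from the original answer. The zero vector is created once before the loop, which is what a compiler would typically do itself when hoisting the `pxor`:

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns true if none of the n bytes at p is zero.
static bool bufferHasNoZeroByte(const std::uint8_t* p, std::size_t n)
{
    const __m128i zero = _mm_setzero_si128();  // loop-invariant, set up once
    std::size_t i = 0;

    // Main loop: one unaligned load, one compare, one movemask per 16 bytes.
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p + i));
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(v, zero)) != 0)
            return false;  // at least one byte in this block is zero
    }

    // Scalar tail for the remaining (n % 16) bytes.
    for (; i < n; ++i) {
        if (p[i] == 0)
            return false;
    }
    return true;
}
```

With SSE4.1 available, the `movemask` test in the loop body could be swapped for `_mm_testz_si128`, as in the answer's function.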
