Testing whether an AVX register contains equal integers

Question

Consider a 256-bit register containing four 64-bit integers. Is there an efficient way in AVX/AVX2 to test whether any two of these integers are equal?

For example:

a) {43, 17, 25, 8}: the result must be false because no two of the four numbers are equal.

b) {47, 17, 23, 17}: the result must be true because the number 17 occurs twice in the AVX vector register.

I'd like to do this in C++ if possible, but I can drop down to assembly if necessary.
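To make the required semantics concrete, here is a minimal scalar sketch of the check. The name has_dupe_scalar is hypothetical and not part of the original question; it is meant only as a baseline against which a vectorized version could be compared.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical reference helper: true if any two of the four values are equal.
bool has_dupe_scalar(const std::array<std::int64_t, 4>& v)
{
    // Check each of the six unordered pairs (i, j) with i < j.
    for (std::size_t i = 0; i < v.size(); ++i)
        for (std::size_t j = i + 1; j < v.size(); ++j)
            if (v[i] == v[j])
                return true;
    return false;
}
```

With the two examples above, this returns false for {43, 17, 25, 8} and true for {47, 17, 23, 17}.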

Recommended answer

With AVX512 (AVX512VL + AVX512CD), you would use VPCONFLICTQ, which is designed for this purpose.
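A rough sketch of how the VPCONFLICTQ route might look with intrinsics follows. The answer itself gives no code for this case, so the function names, the pointer-based interface, and the use of the GCC/Clang target attribute and __builtin_cpu_supports are all assumptions made for illustration.

```cpp
#include <immintrin.h>
#include <cstdint>

// Hypothetical helper: true if any two of the four 64-bit values at v are equal.
// The target attribute (a GCC/Clang extension) enables the needed instruction
// sets for this function only, so no -mavx512* compiler flags are required.
__attribute__((target("avx512cd,avx512vl")))
bool has_dupe_avx512(const std::int64_t* v)
{
    __m256i x = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(v));
    // VPCONFLICTQ: lane i receives a bit mask of the lower lanes j < i
    // whose value equals lane i.
    __m256i c = _mm256_conflict_epi64(x);
    // Any nonzero lane means at least one duplicate exists.
    return !_mm256_testz_si256(c, c);
}

// Runtime check so callers can avoid executing unsupported instructions.
bool cpu_has_avx512_conflict()
{
    return __builtin_cpu_supports("avx512cd") && __builtin_cpu_supports("avx512vl");
}
```

A caller would check cpu_has_avx512_conflict() first and fall back to an AVX2 or scalar path when it returns false.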

For AVX2:

This version saves a couple of operations by doing fewer redundant comparisons:

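#include <immintrin.h>

// Returns nonzero if any two of the four 64-bit lanes of x are equal.
int test1(__m256i x)
{
    // Permute so that x0 holds (high to low) lanes 1 0 2 3 of x,
    // lined up against x's own lanes            3 2 1 0.
    __m256i x0 = _mm256_permute4x64_epi64(x, 0x4B);
    __m256i e0 = _mm256_cmpeq_epi64(x0, x);   // pairs {1,3}, {0,2}, {1,2}, {0,3}
    // Swap the 64-bit halves of each 128-bit lane: x1 holds lanes 2 3 0 1.
    __m256i x1 = _mm256_shuffle_epi32(x, 0x4E);
    __m256i e1 = _mm256_cmpeq_epi64(x1, x);   // pairs {2,3}, {0,1}
    // A lane of either compare result is all ones where a pair matched.
    __m256i t = _mm256_or_si256(e0, e1);
    // testz yields 1 only when t is all zero, so negate it.
    return !_mm256_testz_si256(t, _mm256_set1_epi32(-1));
}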
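One way to convince yourself that two compares are enough is to model the lane movement in plain C++ and count the distinct pairs that get compared. The sketch below is not from the answer and its helper names are hypothetical. It decodes the 0x4B immediate the way _mm256_permute4x64_epi64 does (two bits per destination lane) and treats _mm256_shuffle_epi32 with 0x4E as a swap of the 64-bit halves within each 128-bit lane.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <set>
#include <utility>

// Source lane for each destination lane of _mm256_permute4x64_epi64(x, imm):
// destination lane i takes source lane (imm >> (2 * i)) & 3.
std::array<int, 4> permute4x64_sources(unsigned imm)
{
    std::array<int, 4> src{};
    for (int i = 0; i < 4; ++i)
        src[i] = static_cast<int>((imm >> (2 * i)) & 3u);
    return src;
}

// _mm256_shuffle_epi32(x, 0x4E) swaps the two 64-bit halves of each
// 128-bit lane, so destination lane i takes source lane i ^ 1.
std::array<int, 4> swap_within_lanes_sources()
{
    return {1, 0, 3, 2};
}

// Record the unordered pairs {i, src[i]} that a lane-wise compare covers.
void add_pairs(const std::array<int, 4>& src, std::set<std::pair<int, int>>& pairs)
{
    for (int i = 0; i < 4; ++i)
        if (i != src[i])
            pairs.insert({std::min(i, src[i]), std::max(i, src[i])});
}

// Number of distinct lane pairs compared by the two-shuffle version.
std::size_t covered_pair_count()
{
    std::set<std::pair<int, int>> pairs;
    add_pairs(permute4x64_sources(0x4B), pairs);
    add_pairs(swap_within_lanes_sources(), pairs);
    return pairs.size();
}
```

Four lanes have six unordered pairs, and covered_pair_count() reaches all six: four from the permute and two from the in-lane swap.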


Previously:

A simple "compare everything with everything" approach can be used with some shuffles, something like this (not tested):

int hasDupe(__m256i x)
{
    // Lane order is written high to low; x itself is 3 2 1 0.
    __m256i x1 = _mm256_shuffle_epi32(x, 0x4E);      // 2 3 0 1
    __m256i x2 = _mm256_permute4x64_epi64(x, 0x4E);  // 1 0 3 2
    __m256i x3 = _mm256_shuffle_epi32(x2, 0x4E);     // 0 1 2 3
    // Between them, the three compares cover all six pairs of lanes.
    __m256i e0 = _mm256_cmpeq_epi64(x1, x);
    __m256i e1 = _mm256_cmpeq_epi64(x2, x);
    __m256i e2 = _mm256_cmpeq_epi64(x3, x);
    __m256i t0 = _mm256_or_si256(_mm256_or_si256(e0, e1), e2);
    // A nonzero t0 means at least one pair matched.
    return !_mm256_testz_si256(t0, _mm256_set1_epi32(-1));
}

GCC 7 compiles this to reasonable code, but Clang does really strange things: it seems to think that vpor has no 256-bit version (which it does). Changing the ORs to additions does roughly the same thing in this case (each compare result is 0 or -1 per lane, so a sum of several of them is zero only if all of them are zero) and doesn't cause trouble with Clang (also not tested):

int hasDupe(__m256i x)
{
    // Lane order is written high to low; x itself is 3 2 1 0.
    __m256i x1 = _mm256_shuffle_epi32(x, 0x4E);      // 2 3 0 1
    __m256i x2 = _mm256_permute4x64_epi64(x, 0x4E);  // 1 0 3 2
    __m256i x3 = _mm256_shuffle_epi32(x2, 0x4E);     // 0 1 2 3
    __m256i e0 = _mm256_cmpeq_epi64(x1, x);
    __m256i e1 = _mm256_cmpeq_epi64(x2, x);
    __m256i e2 = _mm256_cmpeq_epi64(x3, x);
    // "OR" the results by adding them, a workaround for Clang being weird.
    __m256i t0 = _mm256_add_epi64(_mm256_add_epi64(e0, e1), e2);
    // A nonzero t0 means at least one pair matched.
    return !_mm256_testz_si256(t0, _mm256_set1_epi32(-1));
}
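The claim that addition can stand in for OR here can be checked exhaustively on a single 64-bit lane, since each compare result is either 0 or -1. The following is only an illustrative sketch with a hypothetical function name, not something taken from the answer.

```cpp
#include <cstdint>

// Returns true if OR and addition agree on "any compare matched" for every
// combination of three per-lane compare results, each being 0 or -1 (all ones).
bool or_and_add_agree()
{
    const std::int64_t values[2] = {0, -1};
    for (std::int64_t e0 : values)
        for (std::int64_t e1 : values)
            for (std::int64_t e2 : values) {
                const bool any_or  = (e0 | e1 | e2) != 0;
                const bool any_add = (e0 + e1 + e2) != 0;
                if (any_or != any_add)
                    return false;
            }
    return true;
}
```

The sum ranges from 0 down to -3 and is 0 only when all three inputs are 0, which is the same condition under which the OR is 0.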
