Shifting SSE/AVX registers left and right by multiples of 32 bits while shifting in zeros

Question

I want to shift SSE/AVX registers left or right by multiples of 32 bits while shifting in zeros.

Let me be more precise about the shifts I'm interested in. For SSE, I want to do the following shifts of four 32-bit floats:

shift1_SSE: [1, 2, 3, 4] -> [0, 1, 2, 3]
shift2_SSE: [1, 2, 3, 4] -> [0, 0, 1, 2]

For AVX, I want to do the following shifts:

shift1_AVX: [1, 2, 3, 4, 5, 6, 7, 8] -> [0, 1, 2, 3, 4, 5, 6, 7]
shift2_AVX: [1, 2, 3, 4, 5, 6, 7, 8] -> [0, 0, 1, 2, 3, 4, 5, 6]
shift3_AVX: [1, 2, 3, 4, 5, 6, 7, 8] -> [0, 0, 0, 0, 1, 2, 3, 4]
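
As a point of reference, the shifts listed above can be written as a plain scalar loop. This is only a sketch for checking vectorized versions against; the function name `shift_in_zeros` and its signature are made up for illustration and are not part of the question.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference: move every element n positions toward higher indices
   and fill the n vacated low positions with 0.0f.
   in and out must not overlap. */
void shift_in_zeros(const float *in, float *out, size_t len, size_t n)
{
    assert(n <= len);
    for (size_t i = 0; i < len; ++i)
        out[i] = (i < n) ? 0.0f : in[i - n];
}
```

With `len = 4` this models the two SSE shifts (`n = 1`, `n = 2`), and with `len = 8` the three AVX shifts (`n = 1`, `n = 2`, `n = 4`).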

For SSE, I have come up with the following code:

shift1_SSE = _mm_castsi128_ps(_mm_slli_si128(_mm_castps_si128(x), 4));
shift2_SSE = _mm_shuffle_ps(_mm_setzero_ps(), x, 0x40);
//shift2_SSE = _mm_castsi128_ps(_mm_slli_si128(_mm_castps_si128(x), 8));
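
To try the SSE shifts on real data, the `_mm_slli_si128` variant can be wrapped in small functions that load from and store to float arrays. This is a minimal sketch assuming an x86 target with SSE2; the wrapper names `shift1_sse` and `shift2_sse` and the array-based interface are illustrative choices, not something from the original post.

```c
#include <emmintrin.h>  /* SSE2: _mm_slli_si128 and the cast intrinsics */

/* Shift four floats up by one element (4 bytes); element 0 becomes zero. */
void shift1_sse(const float in[4], float out[4])
{
    __m128 x = _mm_loadu_ps(in);
    __m128 y = _mm_castsi128_ps(_mm_slli_si128(_mm_castps_si128(x), 4));
    _mm_storeu_ps(out, y);
}

/* Shift four floats up by two elements (8 bytes); elements 0 and 1 become zero. */
void shift2_sse(const float in[4], float out[4])
{
    __m128 x = _mm_loadu_ps(in);
    __m128 y = _mm_castsi128_ps(_mm_slli_si128(_mm_castps_si128(x), 8));
    _mm_storeu_ps(out, y);
}
```

The casts only change the type seen by the compiler, so each function body is a load, one byte shift, and a store.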

Is there a better way to do this with SSE?

For AVX, I have come up with the following code, which needs AVX2 and is untested. Edit: as explained by Paul R, this code won't work.

shift1_AVX2 = _mm256_castsi256_ps(_mm256_slli_si256(_mm256_castps_si256(x), 4));
shift2_AVX2 = _mm256_castsi256_ps(_mm256_slli_si256(_mm256_castps_si256(x), 8));
shift3_AVX2 = _mm256_castsi256_ps(_mm256_slli_si256(_mm256_castps_si256(x), 12));

How can I best do this with AVX rather than AVX2 (for example, with _mm256_permute or _mm256_shuffle)? Is there a better way to do this with AVX2?

Paul R has informed me that my AVX2 code won't work and that an AVX-only solution is probably not worth it. Instead, for AVX2 I should use _mm256_permutevar8x32_ps along with _mm256_and_ps. I don't have a system with AVX2 (Haswell), so this is hard to test.

Based on Felix Wyss's answer, I came up with some solutions for AVX that need only three intrinsics for shift1_AVX and shift2_AVX and only one intrinsic for shift3_AVX. This is possible because _mm256_permute2f128_ps has a zeroing feature.

shift1_AVX

__m256 t0 = _mm256_permute_ps(x, _MM_SHUFFLE(2, 1, 0, 3));
__m256 t1 = _mm256_permute2f128_ps(t0, t0, 41);
__m256 y = _mm256_blend_ps(t0, t1, 0x11);

shift2_AVX

__m256 t0 = _mm256_permute_ps(x, _MM_SHUFFLE(1, 0, 3, 2));
__m256 t1 = _mm256_permute2f128_ps(t0, t0, 41);
__m256 y = _mm256_blend_ps(t0, t1, 0x33);

shift3_AVX

x = _mm256_permute2f128_ps(x, x, 41);
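
The immediate 41 (0x29) in the snippets above is easier to read once the control byte is decoded. The following scalar model sketches how `_mm256_permute2f128_ps` assembles its result from two 4-bit fields. The function `model_permute2f128` is a hypothetical helper written for illustration, not a library call.

```c
#include <string.h>

/* Scalar model of _mm256_permute2f128_ps(a, b, imm).
   Each 128-bit half of the result is controlled by a 4-bit field of imm
   (bits 3:0 for the low half, bits 7:4 for the high half):
     bits 1:0 of the field select a.low (0), a.high (1), b.low (2), b.high (3);
     bit 3 of the field overrides the selection and zeroes that half.
   out must not overlap a or b. */
void model_permute2f128(const float a[8], const float b[8], int imm, float out[8])
{
    const float *src[4] = { a, a + 4, b, b + 4 };
    for (int half = 0; half < 2; ++half) {
        int field = (imm >> (4 * half)) & 0xF;
        if (field & 0x8)
            memset(out + 4 * half, 0, 4 * sizeof(float));
        else
            memcpy(out + 4 * half, src[field & 0x3], 4 * sizeof(float));
    }
}
```

For 0x29 the low field is 0x9, so the low half is zeroed, and the high field is 0x2, so the high half receives the low half of the second operand. That is exactly the shift3_AVX result when both operands are x.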

Recommended answer

Your SSE implementation is fine, but I suggest you use the _mm_slli_si128 implementation for both shifts. The casts make it look complicated, but it really boils down to just one instruction per shift.

Unfortunately, your AVX2 implementation won't work. Almost all AVX instructions are effectively just two SSE instructions operating in parallel on two adjacent 128-bit lanes. So for your shift1_AVX2 example, with the input [1, 2, 3, 4, 5, 6, 7, 8], you'd get:

0, 1, 2, 3, 0, 5, 6, 7
----------- ----------
 LS lane     MS lane
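
The per-lane behaviour can be reproduced without any SIMD hardware. The sketch below is a scalar model of the 256-bit byte shift, assuming a little-endian layout as on x86; `model_slli_si256` is a hypothetical name used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Scalar model of the byte shift behind _mm256_slli_si256:
   each 16-byte lane is shifted toward higher addresses on its own,
   so nothing crosses from the low lane into the high lane.
   in and out must not overlap. */
void model_slli_si256(const float in[8], float out[8], size_t bytes)
{
    assert(bytes <= 16);
    const unsigned char *src = (const unsigned char *)in;
    unsigned char *dst = (unsigned char *)out;
    memset(dst, 0, 32);
    for (int lane = 0; lane < 2; ++lane)
        memcpy(dst + 16 * lane + bytes, src + 16 * lane, 16 - bytes);
}
```

With the question's input [1, 2, 3, 4, 5, 6, 7, 8] and a 4-byte shift, a zero enters at the bottom of each lane and the top element of each lane (4 and 8) is dropped, rather than 4 moving into the high lane.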

All is not lost, however: one of the few instructions that does work across lanes is _mm256_permutevar8x32_ps. Note that you'll need to combine it with _mm256_and_ps to zero the shifted-in elements. Note also that this is an AVX2 solution. AVX on its own is very limited for anything other than basic arithmetic/logic operations, so I think you'll have a hard time doing this efficiently without AVX2.
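
One way the permute-plus-mask idea might look for shift1_AVX is sketched below. The index and mask constants, the function name `shift1_avx2`, and the array-based interface are assumptions made for illustration. The `target("avx2")` attribute is a GCC/Clang feature that lets the function compile without `-mavx2`, and the caller is still responsible for running it only on an AVX2-capable CPU.

```c
#include <immintrin.h>

/* GCC/Clang-specific: enable AVX2 for this one function (toolchain assumption). */
__attribute__((target("avx2")))
void shift1_avx2(const float in[8], float out[8])
{
    __m256 x = _mm256_loadu_ps(in);
    /* Result element i reads source element idx[i].
       Element 0 reads a dummy index because it is masked off below. */
    const __m256i idx  = _mm256_setr_epi32(0, 0, 1, 2, 3, 4, 5, 6);
    /* All-ones keeps an element, all-zeros clears it. */
    const __m256i keep = _mm256_setr_epi32(0, -1, -1, -1, -1, -1, -1, -1);
    __m256 y = _mm256_permutevar8x32_ps(x, idx);     /* cross-lane permute */
    y = _mm256_and_ps(y, _mm256_castsi256_ps(keep)); /* zero the shifted-in element */
    _mm256_storeu_ps(out, y);
}
```

The other shifts would follow the same pattern with different `idx` and `keep` constants, for example indices (0, 0, 0, 1, 2, 3, 4, 5) with the first two mask elements cleared for shift2_AVX.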
