Question
How do I correctly use load/store intrinsics to byte-swap an array of aligned int16_t values?
void byte_swapping(uint16_t* dest, const uint16_t* src, size_t count) {
    __m128i _s, _d;
    for (uint16_t const * end(dest + count); dest != end; dest += 8, src += 8)
    {
        _s = _mm_load_si128((__m128i*)src);
        _d = _mm_or_si128(_mm_slli_epi16(_s, 8), _mm_srli_epi16(_s, 8));
        _mm_store_si128((__m128i*) dest, _d);
    }
}
Recommended answer
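Your code will fail when count is not a multiple of 8, or when either src or dest is not 16-byte aligned. In the first case dest steps past end in increments of 8 without ever comparing equal to it, so the loop runs off the end of both buffers. In the second case, _mm_load_si128 and _mm_store_si128 require 16-byte aligned addresses and may fault when given anything else.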
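The two failure conditions above can be checked directly. The following is only a minimal sketch: is_aligned_16 and simd_element_count are hypothetical helper names introduced for illustration and are not part of the original question or answer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true when p satisfies the 16-byte alignment
   that _mm_load_si128 / _mm_store_si128 require. */
static bool is_aligned_16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}

/* Hypothetical helper: how many of the count elements can be handled
   by a loop that consumes 8 uint16_t values per iteration. The
   remaining (count % 8) elements need separate handling. */
static size_t simd_element_count(size_t count)
{
    return count - (count % 8);
}
```

If either pointer fails the alignment check, the aligned intrinsics are not safe to use on it; if simd_element_count(count) differs from count, a loop that only advances in steps of 8 cannot cover the whole array on its own.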
Here is a fixed (and tested) version of your code:
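#include <emmintrin.h>  // SSE2 intrinsics
#include <stddef.h>     // size_t
#include <stdint.h>     // uint16_t

void byte_swapping(uint16_t* dest, const uint16_t* src, size_t count)
{
    size_t i;
    // process 8 elements per iteration; the unaligned load/store accept any address
    for (i = 0; i + 8 <= count; i += 8)
    {
        __m128i s = _mm_loadu_si128((const __m128i*)&src[i]);
        // swap the two bytes of each 16-bit lane
        __m128i d = _mm_or_si128(_mm_slli_epi16(s, 8), _mm_srli_epi16(s, 8));
        _mm_storeu_si128((__m128i*)&dest[i], d);
    }
    for ( ; i < count; ++i) // handle residual elements
    {
        uint16_t w = src[i];
        w = (uint16_t)((w >> 8) | (w << 8));
        dest[i] = w;
    }
}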
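If both buffers are known to be 16-byte aligned, the aligned load/store from the question can be kept, as long as the leftover elements are still handled separately. The variant below is only a sketch under that assumption; the name byte_swapping_aligned and the assert-based precondition checks are illustrative additions, not part of the original answer.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant: the caller guarantees that src and dest are
   16-byte aligned. The asserts document and check that assumption
   in debug builds. */
void byte_swapping_aligned(uint16_t *dest, const uint16_t *src, size_t count)
{
    size_t i;

    assert(((uintptr_t)src % 16) == 0);
    assert(((uintptr_t)dest % 16) == 0);

    /* i advances by 8 elements (16 bytes), so &src[i] and &dest[i]
       stay 16-byte aligned on every iteration. */
    for (i = 0; i + 8 <= count; i += 8)
    {
        __m128i s = _mm_load_si128((const __m128i *)&src[i]);
        __m128i d = _mm_or_si128(_mm_slli_epi16(s, 8), _mm_srli_epi16(s, 8));
        _mm_store_si128((__m128i *)&dest[i], d);
    }

    /* Scalar tail for the remaining count % 8 elements. */
    for ( ; i < count; ++i)
    {
        uint16_t w = src[i];
        dest[i] = (uint16_t)((w >> 8) | (w << 8));
    }
}
```

One way to obtain suitably aligned buffers is to declare them with C11 _Alignas(16), for example `_Alignas(16) uint16_t in[11];`. When alignment cannot be guaranteed, the unaligned version above is the safer choice.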