Problem description
MOVMSKB does a really nice job of packing byte fields into bits. However, I want to do the reverse. I have a 16-bit bit field that I want to put into an XMM register, one byte field per bit. Preferably a set bit should set the MSB (0x80) of each byte field, but I can live with a set bit producing 0xFF in the byte field.
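To pin down what is being asked for, here is a minimal scalar sketch in Python. It is only a reference model, not the SSE2 code requested. The function names movmskb and expand_bits and the set_value parameter are made up for this illustration, and a register is modelled as a list of 16 bytes with the lowest byte first.

```python
def movmskb(xmm):
    """Model of PMOVMSKB on 16 bytes: bit i of the result is the MSB of byte i."""
    return sum(((b >> 7) & 1) << i for i, b in enumerate(xmm))


def expand_bits(word, set_value=0xFF):
    """The wanted inverse: byte i is set_value if bit i of word is set, else 0.

    set_value is a hypothetical knob: 0xFF is the acceptable form,
    0x80 the preferred MSB-only form.
    """
    return [set_value if (word >> i) & 1 else 0 for i in range(16)]


# Both forms round-trip through the movmskb model.
assert movmskb(expand_bits(0x00FF)) == 0x00FF
assert movmskb(expand_bits(0xA5C3, set_value=0x80)) == 0xA5C3
```

Either form survives the round trip because MOVMSKB only looks at the top bit of each byte.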
I saw the following option at https://software.intel.com/zh-CN/forums/intel-isa-extensions/topic/298374 :
movd mm0, eax
punpcklbw mm0, mm0
pshufw mm0, mm0, 0x00
pand mm0, [mask8040201008040201h]
pcmpeqb mm0, [mask8040201008040201h]
However, this code only works with MMX registers and cannot be made to work with XMM registers, because pshufw operates only on MMX registers.
I know I can use PSHUFB, but that is SSSE3, and I would like SSE2 code because it needs to work on any AMD64 system.
Is there a way to do this in pure SSE2 code? No intrinsics please, just plain Intel x64 code.
Recommended answer
Luckily pshufd is SSE2; you just need to unpack once more. I believe this should work:
movd xmm0, eax
punpcklbw xmm0, xmm0
punpcklbw xmm0, xmm0
pshufd xmm0, xmm0, 0x50
pand xmm0, [mask]
pcmpeqb xmm0, [mask]
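As a sanity check of the sequence above, here is a small byte-level model in Python. It is a sketch, not the assembly itself: the helper names are invented for this illustration, registers are lists of 16 bytes with the lowest byte first, and the contents of [mask] are an assumption (01 02 04 ... 80 in each 8-byte half), since the answer does not spell them out.

```python
def punpcklbw(dst, src):
    """Interleave the low 8 bytes of dst and src: d0 s0 d1 s1 ... d7 s7."""
    out = []
    for d, s in zip(dst[:8], src[:8]):
        out += [d, s]
    return out


def pshufd(src, imm8):
    """Destination dword i is a copy of source dword (imm8 >> (2 * i)) & 3."""
    out = []
    for i in range(4):
        sel = (imm8 >> (2 * i)) & 3
        out += src[4 * sel:4 * sel + 4]
    return out


# Assumed contents of [mask]: 01 02 04 ... 80 in each 8-byte half, lowest byte first.
MASK = [1 << (i % 8) for i in range(16)]


def expand(eax):
    xmm0 = list((eax & 0xFFFFFFFF).to_bytes(4, "little")) + [0] * 12  # movd xmm0, eax
    xmm0 = punpcklbw(xmm0, xmm0)                  # al al ah ah ...
    xmm0 = punpcklbw(xmm0, xmm0)                  # al x4, ah x4, ...
    xmm0 = pshufd(xmm0, 0x50)                     # al x8 (low half), ah x8 (high half)
    xmm0 = [a & m for a, m in zip(xmm0, MASK)]    # pand xmm0, [mask]
    return [0xFF if a == m else 0 for a, m in zip(xmm0, MASK)]  # pcmpeqb xmm0, [mask]


assert expand(0x00FF) == [0xFF] * 8 + [0] * 8
```

Under that mask assumption, byte i of the result is 0xFF exactly when bit i of the input word is set. The upper 16 bits of eax never reach the result, because pshufd with 0x50 reads only dwords 0 and 1.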
John said:
movd xmm0, eax
punpcklbw xmm0, xmm0
pshufd xmm0, xmm0, 0x00
pand xmm0, [mask]
pcmpeqb xmm0, [mask]
However, this code should not work. Example: assume the input is 0x00FF (a word), that is, we want the low 8 bytes set.
punpcklbw xmm0, xmm0 ; 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FF FF
pshufd xmm0, xmm0, 0x00 ; 00 00 FF FF 00 00 FF FF 00 00 FF FF 00 00 FF FF
pand xmm0, [mask] ; 00 00 20 10 00 00 02 01 00 00 20 10 00 00 02 01
pcmpeqb xmm0, [mask] ; 00 00 FF FF 00 00 FF FF 00 00 FF FF 00 00 FF FF
This is the wrong result, because we wanted 00 00 00 00 00 00 00 00 FF FF FF FF FF FF FF FF. Sure, it does give you 8 set bytes, just not the 8 that correspond to the bits.
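The difference between the two versions comes down to how pshufd reads its immediate. A short Python sketch of that decoding follows; the function name pshufd_selectors is made up for this illustration.

```python
def pshufd_selectors(imm8):
    """For destination dwords 0..3, return which source dword PSHUFD copies."""
    return [(imm8 >> (2 * i)) & 3 for i in range(4)]


print(pshufd_selectors(0x00))  # [0, 0, 0, 0]
print(pshufd_selectors(0x50))  # [0, 0, 1, 1]
```

With 0x00 only source dword 0 is broadcast. After a single punpcklbw that dword holds al al ah ah, so both input bytes are repeated within every 4-byte group. With two unpacks, dword 0 is four copies of al and dword 1 is four copies of ah, and 0x50 places the former in the low 8 bytes and the latter in the high 8 bytes.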