Question
What can you do with SSE4.1 ptest other than testing if a single register is all-zero?
Can you use a combination of ZF and CF to test anything useful about two unknown input registers?
What is PTEST good for? You'd think it would be good for checking the result of a packed-compare (like PCMPEQD or CMPPS), but at least on Intel CPUs, it costs more uops to compare-and-branch using PTEST + JCC than with PMOVMSKB / MOVMSKPS / MOVMSKPD + macro-fused CMP+JCC.
Recommended answer
No, unless I'm missing something clever, ptest with two unknown registers is generally not useful for checking some property about both of them. (Other than obvious stuff you'd already want a bitwise-AND for, like intersection between two bitmaps.)
To test two registers for both being all-zero, OR them together and PTEST that against itself.
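The OR-then-PTEST idea can be sketched with intrinsics. This is a minimal sketch, not a tuned implementation: `both_all_zero` is a hypothetical helper name, and the `target("sse4.1")` attribute assumes GCC or Clang (with other compilers, enable SSE4.1 through their own options instead).

```c
#include <smmintrin.h>  /* SSE4.1: _mm_testz_si128 */
#include <stdbool.h>

/* Hypothetical helper: true only if every bit of both a and b is zero. */
__attribute__((target("sse4.1")))  /* assumption: GCC/Clang attribute syntax */
static bool both_all_zero(__m128i a, __m128i b)
{
    /* POR: a set bit in either input survives into `combined`. */
    __m128i combined = _mm_or_si128(a, b);

    /* PTEST x,x: ZF = 1 exactly when x & x (that is, x) is all-zero. */
    return _mm_testz_si128(combined, combined) != 0;
}
```

This costs one extra POR compared with testing a single register, but still needs only one PTEST and one branch.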
ptest xmm0, xmm1 produces two results:
- ZF = is xmm0 & xmm1 all-zero?
- CF = is (~xmm0) & xmm1 all-zero?
If the second vector is all-zero, the flags don't depend at all on the bits in the first vector.
It may be useful to think of the "is-all-zero" checks as a NOT(bitwise horizontal-OR()) of the AND and ANDNOT results. But probably not, because that's too many steps for my brain to think through easily. That sequence of vertical-AND and then horizontal-OR does maybe make it easier to understand why PTEST, just like the integer TEST instruction, doesn't tell you much about a combination of two unknown registers.
Here's a truth table for a 2-bit ptest a,mask. Hopefully this helps in thinking about mixes of zeros and ones with 128b inputs.
Note that CF(a,mask) == ZF(~a,mask).
a mask ZF CF
00 00 1 1
01 00 1 1
10 00 1 1
11 00 1 1
00 01 1 0
01 01 0 1
10 01 1 0
11 01 0 1
00 10 1 0
01 10 1 0
10 10 0 1
11 10 0 1
00 11 1 0
01 11 0 0
10 11 0 0
11 11 0 1
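The table can be reproduced with a small scalar model. This is only a sketch for reasoning about the flags: `model_zf`, `model_cf`, and `identity_holds` are hypothetical names, and the 2-bit width is hard-coded to match the table rather than the real 128-bit operation.

```c
#include <stdbool.h>

#define WIDTH_MASK 0x3u  /* keep only the low 2 bits, as in the table */

/* ZF: set when (a AND mask) has no bits set. */
static bool model_zf(unsigned a, unsigned mask)
{
    return ((a & mask) & WIDTH_MASK) == 0;
}

/* CF: set when ((NOT a) AND mask) has no bits set. */
static bool model_cf(unsigned a, unsigned mask)
{
    return ((~a & mask) & WIDTH_MASK) == 0;
}

/* Checks CF(a,mask) == ZF(~a,mask) for every 2-bit input pair. */
static bool identity_holds(void)
{
    for (unsigned a = 0; a < 4; a++) {
        for (unsigned mask = 0; mask < 4; mask++) {
            if (model_cf(a, mask) != model_zf(~a & WIDTH_MASK, mask)) {
                return false;
            }
        }
    }
    return true;
}
```

With a fixed mask, ZF and CF together distinguish "none of the masked bits set", "all of them set", and "a mix", which is the information the rows for mask = 11 show.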
Intel's intrinsics guide lists 2 interesting intrinsics for it. Note the naming of the args: a and mask are a clue that they tell you about the parts of a selected by a known AND-mask.
- _mm_test_mix_ones_zeros (__m128i a, __m128i mask): returns (ZF == 0 && CF == 0)
- _mm_test_all_zeros (__m128i a, __m128i mask): returns ZF
There are also the more simply-named versions:
- int _mm_testc_si128 (__m128i a, __m128i b): returns CF
- int _mm_testnzc_si128 (__m128i a, __m128i b): returns (ZF == 0 && CF == 0)
- int _mm_testz_si128 (__m128i a, __m128i b): returns ZF
There are AVX2 __m256i versions of those intrinsics, but the guide only lists the all_zeros and mix_ones_zeros alternate-name versions for __m128i operands.
If you want to test some other condition from C or C++, you should use testc and testz with the same operands, and hope that your compiler realizes that it only needs to do one PTEST, and hopefully even use a single JCC, SETCC, or CMOVCC to implement your logic. (I'd recommend checking the asm, at least for the compiler you care about most.)
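As an illustration of calling testz and testc on the same operands, here is a hedged sketch that classifies the bits of a selected by mask. The enum and the function name `classify_masked_bits` are hypothetical, the `target("sse4.1")` attribute assumes GCC or Clang, and whether the compiler actually emits a single PTEST is not guaranteed, so the generated asm is still worth checking.

```c
#include <smmintrin.h>  /* SSE4.1: _mm_testz_si128, _mm_testc_si128 */

/* Hypothetical classification of the bits of `a` selected by `mask`. */
enum masked_bits { MASKED_NONE_SET, MASKED_ALL_SET, MASKED_MIXED };

__attribute__((target("sse4.1")))  /* assumption: GCC/Clang attribute syntax */
static enum masked_bits classify_masked_bits(__m128i a, __m128i mask)
{
    /* Same operands for both calls, so one PTEST could supply both flags. */
    int zf = _mm_testz_si128(a, mask);  /* 1 when (a & mask) is all-zero  */
    int cf = _mm_testc_si128(a, mask);  /* 1 when (~a & mask) is all-zero */

    if (zf) {
        return MASKED_NONE_SET;  /* also covers an all-zero mask (ZF = CF = 1) */
    }
    if (cf) {
        return MASKED_ALL_SET;
    }
    return MASKED_MIXED;         /* ZF == 0 && CF == 0, same as testnzc */
}
```

The order of the checks is a design choice: testing ZF first means an all-zero mask is reported as "none set" rather than "all set".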
Note that _mm_testz_si128(v, set1(0xff)) is always the same as _mm_testz_si128(v,v), because that's how AND works. But that's not true for the CF result.
You can check for a vector being all-ones using:
bool is_all_ones = _mm_testc_si128(v, _mm_set1_epi8(0xff));
This is probably no faster than a PCMPEQB against a vector of all-ones followed by the usual movemask + cmp, but it is smaller in code size. It doesn't avoid the need for a vector constant.
PTEST does have the advantage that it doesn't destroy either input operand, even without AVX.