问题描述
我想有效地执行以下任务:
I'm trying to efficiently execute the following task:
INPUT VALUE: 01101011
MASK: 00110010
MASK RESULT: --10--1-
AGGREGATED: 00000101
我希望这举例说明清楚我想要实现的。什么是在非天真的方式做到这一点的最好方法是什么?
I hope this examples explains clearly what I'm trying to achieve. What's the best way to do this in a non-naive way?
推荐答案
此操作称为或只是 COM preSS
,这是可怕的适度无需硬件来实现支持。从黑客的喜悦7-4的COM preSS,或广义提取物来实现这个功能的非天真code是
This operation is called compress_right
or just compress
, and it is moderately terrible to implement without hardware support. The non-naive code from Hacker's Delight "7–4 Compress, or Generalized Extract" to implement this function is
unsigned compress(unsigned x, unsigned m) {
unsigned mk, mp, mv, t;
int i;
x = x & m; // Clear irrelevant bits.
mk = ~m << 1; // We will count 0's to right.
for (i = 0; i < 5; i++) {
mp = mk ^ (mk << 1); // Parallel suffix.
mp = mp ^ (mp << 2);
mp = mp ^ (mp << 4);
mp = mp ^ (mp << 8);
mp = mp ^ (mp << 16);
mv = mp & m; // Bits to move.
m = m ^ mv | (mv >> (1 << i)); // Compress m.
t = x & mv;
x = x ^ t | (t >> (1 << i)); // Compress x.
mk = mk & ~mp;
}
return x;
}
BMI2(在Haswell的版本之后)将有指令 PEXT
进行此操作。
如果掩码是一个常数(或不是一个常数,而是多次重复使用),一个比较明显的优化是pre-计算5个值是 MV
需要在循环过程中。 MV
的计算不依赖于 X
,这样就可以independantly计算,这样的(相同的算法上面真的)
If the mask is a constant (or not a constant but reused multiple times), a relatively obvious optimization is pre-calculating the 5 values that mv
takes during the loop. The calculation of mv
does not depend on x
, so that can be calculated independantly, like this (the same algorithm as above really)
mk = ~m << 1;
for (i = 0; i < 5; i++) {
mp = mk ^ (mk << 1);
mp = mp ^ (mp << 2);
mp = mp ^ (mp << 4);
mp = mp ^ (mp << 8);
mp = mp ^ (mp << 16);
mv = mp & m;
mask[i] = mv;
m = m ^ mv | (mv >> (1 << i));
mk = mk & ~mp;
}
看起来依然复杂,但这里的一切是一个常数,因此它可以是pre计算的(如果编译器不能做到这一点,那么的您的就可以了,简单地运行它,然后粘贴结果到code)。在code的实部,实际上已经在运行时运行code是这样的:
Still looks complicated, but everything here is a constant, so it can be pre-computed (if the compiler can't do it, then you can, simply by running it and then pasting the result into the code). The "real part" of the code, the code that actually has to run at runtime is this:
x = x & m;
t = x & mask[0];
x = x ^ t | (t >> 1);
t = x & mask[1];
x = x ^ t | (t >> 2);
t = x & mask[2];
x = x ^ t | (t >> 4);
t = x & mask[3];
x = x ^ t | (t >> 8);
t = x & mask[4];
x = x ^ t | (t >> 16);
(这也是黑客的喜悦,格式稍有不同)
(this is also in Hacker's Delight, formatted a little differently)
许多情况下可以更简单又例如:
Many cases can be simpler again, for example:
- 如果
M = 0
,结果是0
。 - 如果
M = -1
,结果是X
。 - 如果
M = 1
,结果是X'放大器; 1
。 - 如果
M =((1 <,结果是
(X&GT;&GT; k)及米
。 - 如果
M = 0x80000000的
,结果是 X&GT;&GT; 31 。 - 如果
M
是2的其他权力,结果是(X&GT;&GT; numberOfTrailingZeros(M))及1
- 如果
M
是交替的,完美unshuffle算法可以使用。 - 如果
M
由少数团体,感动位组算法可以用来(即面膜组,它转移到地方(第一班,面具秒),或全部转移组在一起,但更复杂的方法存在),这可能是在实践中最重要的案例。 - ...
- if
m = 0
, the result is0
. - if
m = -1
, the result isx
. - if
m = 1
, the result isx & 1
. - if
m = ((1 << n) - 1) << k
, the result is(x >> k) & m
. - if
m = 0x80000000
, the result isx >> 31
. - if
m
is an other power of two, the result is(x >> numberOfTrailingZeros(m)) & 1
- if
m
is alternating, the "perfect unshuffle algorithm" can be used. - if
m
consists of a few "groups", the "bit group moving" algorithm can be used (ie mask a group, shift it into place (or shift first, mask second), OR all shifted groups together, though more sophisticated approaches exist), this is probably the most important case in practice. - ...
例如,从你的问题掩码将属于中的位组移动的情况下,用code是这样的:
For example, the mask from your question would fall in the "bit group moving" case, with code like this:
return ((x >> 1) & 1) | ((x >> 3) & 6);
这篇关于面具和聚合位的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!