Question
Many CPUs have a single assembly opcode that returns the high-order bits of a 32-bit integer multiplication. Multiplying two 32-bit integers normally produces a 64-bit result, but the result is truncated to the low 32 bits if you store it in a 32-bit integer.
For example, on PowerPC, the mulhw opcode returns the high 32 bits of the 64-bit result of a 32x32-bit multiply in one clock. This is exactly what I'm looking for, but in a more portable form. There's a similar opcode, umulhi(), in NVidia CUDA.
In C/C++, is there an efficient way to return the high-order bits of a 32x32 multiply? Currently I compute it by casting to 64 bits, something like:
unsigned int umulhi32(unsigned int x, unsigned int y)
{
    unsigned long long xx = x;
    xx *= y;
    return (unsigned int)(xx >> 32);
}
but this is over 11 times slower than a regular 32x32 multiply, because I'm using overkill 64-bit math even for the multiply.
Is there a faster way to compute the high-order bits?
This is clearly not best solved with a BigInteger library, which is overkill and would add huge overhead.
SSE seems to have PMULHUW, a 16x16 -> top 16 bits version of this, but not the 32x32 -> top 32 bits version I'm looking for.
Recommended answer
gcc 4.3.2, at -O1 optimisation or higher, translates your function exactly as you showed it into IA32 assembly like this:
umulhi32:
    pushl %ebp
    movl %esp, %ebp
    movl 12(%ebp), %eax
    mull 8(%ebp)
    movl %edx, %eax
    popl %ebp
    ret
This just does a single 32-bit mull and puts the high 32 bits of the result (from %edx) into the return value.
That's what you wanted, right? It sounds like you just need to turn up the optimisation level on your compiler ;) You may also be able to push the compiler in the right direction by eliminating the intermediate variable:
unsigned int umulhi32(unsigned int x, unsigned int y)
{
    return (unsigned int)(((unsigned long long)x * y) >> 32);
}
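As a minimal sketch of the same idea with explicit widths: the version below uses the fixed-width types from <stdint.h>, so the 32-bit and 64-bit sizes do not depend on how wide unsigned int and unsigned long long happen to be on a given platform. The name umulhi32_fixed is made up for this illustration and is not part of the original answer. Whether a compiler reduces it to a single widening multiply still depends on the compiler and its optimisation settings.

```c
#include <stdint.h>

/* Hypothetical helper (name chosen for this sketch): returns the high
 * 32 bits of the full 64-bit product of two unsigned 32-bit values. */
static uint32_t umulhi32_fixed(uint32_t x, uint32_t y)
{
    /* Widen one operand first so the multiply is carried out in 64 bits;
     * the other operand is converted to uint64_t automatically. */
    uint64_t product = (uint64_t)x * y;

    /* Shift the upper half down and truncate back to 32 bits. */
    return (uint32_t)(product >> 32);
}
```

For example, umulhi32_fixed(0x80000000u, 2u) yields 1, because the full product is 2^32, while umulhi32_fixed(3u, 5u) yields 0, because the product fits entirely in the low 32 bits.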