How to multiply two 128-bit hexadecimal numbers in assembly

Problem description

I have two 128-bit numbers in memory, shown here as hexadecimal bytes (little endian):

x:0x12 0x45 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
y:0x36 0xa1 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

I have to perform an unsigned multiplication of these two numbers, so my new number will be:

z:0xcc 0xe3 0x7e 0x2b 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

Now, I'm aware that I can move half of x and half of y into the rax and rbx registers, do the mul operation, and then do the same with the other halves. The problem is that by doing so I lose the carry-over, and I have no idea how to avoid that. I've been stuck on this problem for about 4 hours, and the only solution I can see is converting to binary (and <-> shl,1).

Can you give me some input on this problem?
I think the best solution is to take one byte at a time.

Recommended answer

As usual, ask a compiler how to do something efficiently: GNU C on 64-bit platforms supports __int128_t and __uint128_t.

__uint128_t mul128(__uint128_t a, __uint128_t b) { return a*b; }

compiles to (gcc 6.2 -O3 on Godbolt):

    imul    rsi, rdx        # tmp94, b
    mov     rax, rdi  # tmp93, a
    imul    rcx, rdi        # tmp95, a
    mul     rdx       # b
    add     rcx, rsi  # tmp96, tmp94
    add     rdx, rcx  #, tmp96
    ret

Since this is targeting the x86-64 System V calling convention, a is in RSI:RDI (high half in RSI, low half in RDI), while b is in RCX:RDX. The result is returned in RDX:RAX.

Pretty nifty that it only takes one MOV instruction: since only the low 128 bits of the product are returned, gcc doesn't need the high-half result of a_upper * b_lower or of a_lower * b_upper. It can destroy the high halves of the inputs with the faster 2-operand form of IMUL since they're only used once.
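The same decomposition can be written out in C as a rough sketch. The function name mul128_by_parts and the hi/lo variable names are made up for illustration, and it assumes GNU C's __uint128_t for the one widening 64x64 -> 128 multiply, which is what the single MUL instruction provides.

```c
#include <stdint.h>

/* Hypothetical helper: low 128 bits of a*b, built from 64-bit halves
   the same way the compiler output above does it. */
static __uint128_t mul128_by_parts(__uint128_t a, __uint128_t b)
{
    uint64_t a_lo = (uint64_t)a, a_hi = (uint64_t)(a >> 64);
    uint64_t b_lo = (uint64_t)b, b_hi = (uint64_t)(b >> 64);

    /* Widening multiply: full 128-bit product of the low halves (MUL). */
    __uint128_t lo_prod = (__uint128_t)a_lo * b_lo;

    /* The cross products only affect bits 64..127, so their low 64 bits
       are enough (the two IMULs); unsigned wrap-around is intentional. */
    uint64_t cross = a_hi * b_lo + a_lo * b_hi;

    uint64_t res_lo = (uint64_t)lo_prod;
    uint64_t res_hi = (uint64_t)(lo_prod >> 64) + cross;

    return ((__uint128_t)res_hi << 64) | res_lo;
}
```

No carry is lost here because the only carry that matters, out of the low 64 bits, is already contained in the high half of the widening multiply. The a_hi * b_hi term is dropped entirely since it only contributes to bits 128 and above; a full 256-bit result would need that term plus the high halves of the cross products, with the carries propagated between limbs.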

With -march=haswell to enable BMI2, gcc uses MULX to avoid even the one MOV.

Sometimes compiler output isn't perfect, but very often the general strategy is a good starting point for optimizing by hand.

Of course, if what you really wanted in the first place was 128-bit multiplies in C, just use the compiler's built-in support for it. That lets the optimizer do its job, often giving better results than if you'd written a couple of parts in inline asm. (https://gcc.gnu.org/wiki/DontUseInlineAsm).
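For the byte arrays from the question, a minimal sketch using the built-in type might look like this. The function name mul128_bytes is hypothetical, and copying with memcpy assumes a little-endian host such as x86-64, so that the in-memory byte order matches the question's layout.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: z = x * y (mod 2^128), each operand stored as
   16 little-endian bytes. Assumes a little-endian host (e.g. x86-64). */
static void mul128_bytes(const unsigned char x[16],
                         const unsigned char y[16],
                         unsigned char z[16])
{
    __uint128_t a, b, r;
    memcpy(&a, x, sizeof a);   /* memcpy avoids alignment/aliasing issues */
    memcpy(&b, y, sizeof b);
    r = a * b;                 /* the compiler picks the multiply sequence */
    memcpy(z, &r, sizeof r);
}
```

Called with the x and y bytes from the question, this fills z with 0xcc 0xe3 0x7e 0x2b followed by twelve zero bytes.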


08-24 11:01