Problem description
Yeah, thanks, that works, @PeterCordes. __int128 works too. One more thing: as you said, I tried the multiprecision arithmetic intrinsic _addcarry_u64 in C, using the header file immintrin.h. I have the following code:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <immintrin.h>
unsigned char _addcarry_u64(unsigned char c_in, uint64_t src1, uint64_t src2,uint64_t *sum);
int main()
{
unsigned char carry;
uint64_t sum;
long long int c1=0,c2=0;
uint64_t a=0x0234BDFA12CD4379,b=0xA8DB4567ACE92B38;
carry = _addcarry_u64(0,a,b,&sum);
printf("sum is %lx and carry value is %u n",sum,carry);
return 0;
}
Can you please point out the error? I'm getting an undefined reference to _addcarry_u64. A quick Google search didn't answer the problem: is there another header file I should use, or is the intrinsic not compatible with gcc, and if so, why?
Initially I had this code for adding two 64 bit numbers:
static __inline int is_digit_lessthan_ct(digit_t x, digit_t y)
{ // Is x < y?
return ( int)((x ^ ((x ^ y) | ((x - y) ^ y))) >> (RADIX-1));
}
#define ADDC(carryIn, addend1, addend2, carryOut, sumOut) \
{ digit_t tempReg = (addend1) + (int)(carryIn); \
(sumOut) = (addend2) + tempReg; \
(carryOut) = (is_digit_lessthan_ct(tempReg, (int)(carryIn)) | is_digit_lessthan_ct((sumOut), tempReg)); \
}
I have since learned that this implementation can be sped up using assembly language, so I am trying to do something similar. However, I cannot access or return the carry. Here is my code:
#include<stdio.h>
#include<stdlib.h>
#include<stdint.h>
uint64_t add32(uint64_t a,uint64_t b)
{
uint64_t d=0,carry=0;
__asm__("mov %1,%%rax\n\t"
"adc %2,%%rax\n\t"
"mov %%rax,%0\n\t"
:"=r"(d)
:"r"(a),"r"(b)
:"%rax"
);
return d;
}
int main()
{
uint64_t a=0xA234BDFA12CD4379,b=0xA8DB4567ACE92B38;
printf("Sum = %lx \n",add32(a,b));
return 0;
}
The result of this addition should be 14B100361BFB66EB1, where the leading 1 is the carry. I want to save that carry in another register. I tried jc, but I keep getting one error or another. Even setc gave me an error, maybe because I'm not sure of the syntax. Can anyone tell me how to save the carry in another register, or return it, by modifying this code?
As usual, inline asm is not strictly necessary. https://gcc.gnu.org/wiki/DontUseInlineAsm. But currently compilers kinda suck for actual extended-precision addition, so you might want asm for this.
There's an Intel intrinsic for adc: _addcarry_u64. But gcc and clang may make slow code, unfortunately. In GNU C on a 64-bit platform, you could just use unsigned __int128.
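As a rough illustration of the unsigned __int128 route, here is a minimal sketch. The helper name adc_u128 is hypothetical (it is not from the answer), and it assumes a GNU C compiler on a 64-bit target. The sum is computed in 128 bits, so the carry-out is simply bit 64 of the result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the original answer.
 * Adds two 64-bit limbs plus a carry-in of 0 or 1.
 * Stores the low 64 bits in *sum and returns the carry-out (0 or 1).
 * Relies on the GNU C extension unsigned __int128. */
static unsigned adc_u128(uint64_t *sum, uint64_t x, uint64_t y, unsigned carry_in)
{
    assert(carry_in <= 1);
    /* Widen first so the addition cannot wrap. */
    unsigned __int128 wide = (unsigned __int128)x + y + carry_in;
    *sum = (uint64_t)wide;          /* low 64 bits */
    return (unsigned)(wide >> 64);  /* bit 64 is the carry-out */
}
```

With the operands from the question (0xA234BDFA12CD4379 and 0xA8DB4567ACE92B38, carry-in 0), this should give a low limb of 0x4B100361BFB66EB1 and a carry-out of 1, matching the expected 14B100361BFB66EB1.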
Compilers usually manage to make pretty good code when checking for carry-out from addition using the idiom that carry_out = (x+y) < x, where < is an unsigned compare. For example:
struct long_carry { unsigned long res; unsigned carry; };
struct long_carry add_carryout(unsigned long x, unsigned long y) {
unsigned long retval = x + y;
unsigned carry = (retval < x);
return (struct long_carry){ retval, carry };
}
gcc7.2 -O3 emits this (and clang emits similar code):
mov rax, rdi # because we need return value in a different register
xor edx, edx # set up for setc
add rax, rsi # generate carry
setc dl # save carry.
ret # return with rax=sum, edx=carry (SysV ABI struct packing)
There's no way you can do better than this with inline asm; this function already looks optimal for modern CPUs. (Well, I guess if mov wasn't zero latency, doing the add first would shorten the latency to carry being ready. But on Intel CPUs, it's supposed to be better to overwrite mov-elimination results right away, so it's better to mov first and then add.)
Clang will even use adc to use the carry-out from an add as the carry-in to another add, but only for the first limb. Perhaps that's because (update) this function is broken: carry_out = (x+y) < x doesn't work when there's carry-in. With carry_out = (x+y+c_in) < x, y+c_in can wrap to zero and give you (x+0) < x (false) even though there was carry.
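One way to avoid that wrap-around problem in portable C is to check for carry after each of the two additions separately. The following is only a sketch under that assumption. The name adc_checked is hypothetical, and nothing here says compilers will turn it into a single adc instruction.

```c
#include <assert.h>

/* Hypothetical corrected variant of the broken idiom, not from the original
 * answer. Adds x + y + carry_in (carry_in must be 0 or 1), stores the
 * wrapped sum in *sum and returns the carry-out (0 or 1). */
static unsigned adc_checked(unsigned long *sum, unsigned long x,
                            unsigned long y, unsigned carry_in)
{
    assert(carry_in <= 1);
    unsigned long partial = x + y;             /* first addition */
    unsigned carry = (partial < x);            /* did x + y wrap? */
    unsigned long total = partial + carry_in;  /* second addition */
    carry |= (total < partial);                /* did adding carry_in wrap? */
    *sum = total;
    return carry;  /* at most one of the two additions can wrap */
}
```

For the failing case described above (carry-in 1 and y with all bits set), x + y wraps to x - 1 and sets the carry in the first check, so the carry is no longer lost.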
Notice that clang's cmp / adc reg,0 exactly implements the behaviour of the C, which isn't the same as another adc there.
Anyway, gcc doesn't even use adc the first time, when it is safe. (So use unsigned __int128 for code that doesn't suck, and asm for integers even wider than that.)
// BROKEN with carry_in=1 and y=~0UL
static
unsigned adc_buggy(unsigned long *sum, unsigned long x, unsigned long y, unsigned carry_in) {
*sum = x + y + carry_in;
unsigned carry = (*sum < x);
return carry;
}
// *x += *y
void add256(unsigned long *x, unsigned long *y) {
unsigned carry;
carry = adc_buggy(x, x[0], y[0], 0);
carry = adc_buggy(x+1, x[1], y[1], carry);
carry = adc_buggy(x+2, x[2], y[2], carry);
carry = adc_buggy(x+3, x[3], y[3], carry);
}
mov rax, qword ptr [rsi]
add rax, qword ptr [rdi]
mov qword ptr [rdi], rax
mov rax, qword ptr [rdi + 8]
mov r8, qword ptr [rdi + 16] # hoisted
mov rdx, qword ptr [rsi + 8]
adc rdx, rax # ok, no memory operand but still adc
mov qword ptr [rdi + 8], rdx
mov rcx, qword ptr [rsi + 16] # r8 was loaded earlier
add rcx, r8
cmp rdx, rax # manually check the previous result for carry. /facepalm
adc rcx, 0
...
This sucks, so if you want extended-precision addition, you still need asm. But for getting the carry-out into a C variable, you don't.
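For the extended-precision case, a GNU C inline-asm version might look roughly like the sketch below. It is an illustration under assumptions, not the answer's own code. The function name add256_carry is hypothetical, the asm path is only used on x86-64 with a GNU-compatible compiler, and a plain C fallback (with a two-step carry check) is included so the block still builds elsewhere. It also shows setc capturing the final carry flag into a C variable, which is what the question was attempting.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not from the original answer.
 * Computes x[0..3] += y[0..3] (least-significant limb first)
 * and returns the carry-out of the top limb (0 or 1). */
static unsigned add256_carry(uint64_t x[4], const uint64_t y[4])
{
    assert(x && y);
#if defined(__x86_64__) && defined(__GNUC__)
    unsigned char carry;
    __asm__("addq %[y0], %[x0]\n\t"   /* lowest limb: plain add sets CF   */
            "adcq %[y1], %[x1]\n\t"   /* each adc consumes and produces CF */
            "adcq %[y2], %[x2]\n\t"
            "adcq %[y3], %[x3]\n\t"
            "setc %[c]"               /* copy the final CF into a byte reg */
            /* "+&r": read-write, early-clobber, so no input shares a
               register with a limb that is written before it is read. */
            : [x0] "+&r"(x[0]), [x1] "+&r"(x[1]),
              [x2] "+&r"(x[2]), [x3] "+&r"(x[3]),
              [c] "=r"(carry)
            : [y0] "r"(y[0]), [y1] "r"(y[1]),
              [y2] "r"(y[2]), [y3] "r"(y[3])
            : "cc");
    return carry;
#else
    /* Portable fallback: check for wrap after each of the two additions. */
    unsigned carry = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t partial = x[i] + y[i];
        unsigned c = (partial < x[i]);
        uint64_t total = partial + carry;
        c |= (total < partial);
        x[i] = total;
        carry = c;
    }
    return carry;
#endif
}
```

Keeping the whole add/adc chain inside one asm statement matters here: the compiler is free to clobber the flags between separate asm statements, so the carry cannot be passed from one statement to the next through CF.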