Adding two floating-point numbers

Problem description

I would like to compute the sum, rounded up, of two IEEE 754 binary64 numbers. To that end I wrote the C99 program below:

    #include <stdio.h>
    #include <fenv.h>

    #pragma STDC FENV_ACCESS ON

    int main(int c, char *v[]){
      fesetround(FE_UPWARD);
      printf("%a\n", 0x1.0p0 + 0x1.0p-80);
    }

However, if I compile and run my program with various compilers:

    $ gcc -v
    …
    gcc version 4.2.1 (Apple Inc. build 5664)
    $ gcc -Wall -std=c99 add.c && ./a.out
    add.c:3: warning: ignoring #pragma STDC FENV_ACCESS
    0x1p+0
    $ clang -v
    Apple clang version 1.5 (tags/Apple/clang-60)
    Target: x86_64-apple-darwin10
    Thread model: posix
    $ clang -Wall -std=c99 add.c && ./a.out
    add.c:3:14: warning: pragma STDC FENV_ACCESS ON is not supported, ignoring pragma [-Wunknown-pragmas]
    #pragma STDC FENV_ACCESS ON
                 ^
    1 warning generated.
    0x1p+0

It doesn't work!
(I expected the result 0x1.0000000000001p0).

Indeed, the computation was done at compile time in the default round-to-nearest mode:

    $ clang -Wall -std=c99 -S add.c && cat add.s
    add.c:3:14: warning: pragma STDC FENV_ACCESS ON is not supported, ignoring pragma [-Wunknown-pragmas]
    #pragma STDC FENV_ACCESS ON
                 ^
    1 warning generated.
    …
    LCPI1_0:
        .quad 4607182418800017408
    …
        callq _fesetround
        movb $1, %cl
        movsd LCPI1_0(%rip), %xmm0
        leaq L_.str(%rip), %rdx
        movq %rdx, %rdi
        movb %cl, %al
        callq _printf
    …
    L_.str:
        .asciz "%a\n"

Yes, I did see the warning emitted by each compiler. I understand that turning the applicable optimizations on or off at the scale of the line may be tricky. I would still like, if that were at all possible, to turn them off at the scale of the file, which would be enough to resolve my question.

My question is: what command-line option(s) should I use with GCC or Clang so as to compile a C99 compilation unit that contains code intended to be executed with an FPU rounding mode other than the default?

Digression

While researching this question, I found this GCC C99 compliance page, containing the entry below, that I will just leave here in case someone else finds it funny. Grrrr.

    floating-point      |     |
    environment access  | N/A | Library feature, no compiler support required.
    in <fenv.h>         |     |

Solution

I couldn't find any command-line options that would do what you wanted. However, I did find a way to rewrite your code so that even with maximum optimizations (even architectural optimizations), neither GCC nor Clang computes the value at compile time.
Instead, this forces them to output code that will compute the value at runtime.

C:

    #include <fenv.h>
    #include <stdio.h>

    #pragma STDC FENV_ACCESS ON

    // add with rounding up
    double __attribute__ ((noinline)) addrup (double x, double y) {
      int round = fegetround ();

      fesetround (FE_UPWARD);
      double r = x + y;
      fesetround (round);   // restore old rounding mode
      return r;
    }

    int main(int c, char *v[]){
      printf("%a\n", addrup (0x1.0p0, 0x1.0p-80));
    }

This results in these outputs from GCC and Clang, even when using maximum and architectural optimizations:

gcc -S -x c -march=corei7 -O3 (Godbolt GCC):

    addrup:
            push    rbx
            sub     rsp, 16
            movsd   QWORD PTR [rsp+8], xmm0
            movsd   QWORD PTR [rsp], xmm1
            call    fegetround
            mov     edi, 2048
            mov     ebx, eax
            call    fesetround
            movsd   xmm1, QWORD PTR [rsp]
            mov     edi, ebx
            movsd   xmm0, QWORD PTR [rsp+8]
            addsd   xmm0, xmm1
            movsd   QWORD PTR [rsp], xmm0
            call    fesetround
            movsd   xmm0, QWORD PTR [rsp]
            add     rsp, 16
            pop     rbx
            ret
    .LC2:
            .string "%a\n"
    main:
            sub     rsp, 8
            movsd   xmm1, QWORD PTR .LC0[rip]
            movsd   xmm0, QWORD PTR .LC1[rip]
            call    addrup
            mov     edi, OFFSET FLAT:.LC2
            mov     eax, 1
            call    printf
            xor     eax, eax
            add     rsp, 8
            ret
    .LC0:
            .long   0
            .long   988807168
    .LC1:
            .long   0
            .long   1072693248

clang -S -x c -march=corei7 -O3 (Godbolt Clang):

    addrup:                                 # @addrup
            push    rbx
            sub     rsp, 16
            movsd   qword ptr [rsp], xmm1       # 8-byte Spill
            movsd   qword ptr [rsp + 8], xmm0   # 8-byte Spill
            call    fegetround
            mov     ebx, eax
            mov     edi, 2048
            call    fesetround
            movsd   xmm0, qword ptr [rsp + 8]   # 8-byte Reload
            addsd   xmm0, qword ptr [rsp]       # 8-byte Folded Reload
            movsd   qword ptr [rsp + 8], xmm0   # 8-byte Spill
            mov     edi, ebx
            call    fesetround
            movsd   xmm0, qword ptr [rsp + 8]   # 8-byte Reload
            add     rsp, 16
            pop     rbx
            ret
    .LCPI1_0:
            .quad   4607182418800017408     # double 1
    .LCPI1_1:
            .quad   4246894448610377728     # double 8.2718061255302767E-25
    main:                                   # @main
            push    rax
            movsd   xmm0, qword ptr [rip + .LCPI1_0] # xmm0 = mem[0],zero
            movsd   xmm1, qword ptr [rip + .LCPI1_1] # xmm1 = mem[0],zero
            call    addrup
            mov     edi, .L.str
            mov     al, 1
            call    printf
            xor     eax, eax
            pop     rcx
            ret
    .L.str:
            .asciz  "%a\n"

In both listings the addsd instruction sits between the two calls to fesetround, so the addition really happens at runtime under FE_UPWARD (2048 is the value passed for FE_UPWARD here).

Now for
the more interesting part: why does that work?

Well, when they (GCC and/or Clang) compile code, they try to find values that can be computed at compile time and replace them with the result. This is known as constant propagation. If you had simply written another function, constant propagation would cease to occur, since it isn't supposed to cross functions.

However, if they see a function whose code they could, in theory, substitute in place of the function call, they may do so. This is known as function inlining. If function inlining will work on a function, we say that that function is (surprise) inlinable.

If a function always returns the same results for a given set of inputs, then it is considered pure. We also say that it has no side effects (meaning it makes no changes to the environment).

Now, if a function is fully inlinable (meaning that it doesn't make any calls to external libraries, excluding a few defaults included in GCC and Clang: libc, libm, etc.) and is pure, then they will apply constant propagation to the function.

In other words, if we don't want them to propagate constants through a function call, we can do one of two things:

1. Make the function appear impure:
   - Use the filesystem
   - Do some bullshit magic with some random input from somewhere
   - Use the network
   - Use some syscall of some sort
   - Call something from an external library unknown to GCC and/or Clang
2. Make the function not fully inlinable:
   - Call something from an external library unknown to GCC and/or Clang
   - Use __attribute__ ((noinline))

Now, that last one is the easiest. As you may have surmised, __attribute__ ((noinline)) marks the function as non-inlinable. Since we can take advantage of this, all we have to do is make another function that does whatever computation we want, mark it with __attribute__ ((noinline)), and then call it.

When the code is compiled, the compilers will honor the noinline marking and, by extension, will not propagate constants through the call. The value is therefore computed at runtime, with the appropriate rounding mode set.
07-15 21:31