Floating-point arithmetic and machine epsilon


Question

I'm trying to compute an approximation of the epsilon value for the float type (and I know it's already in the standard library).
The epsilon values on this machine are (printed with some approximation):
FLT_EPSILON = 1.192093e-07
DBL_EPSILON = 2.220446e-16
LDBL_EPSILON = 1.084202e-19
FLT_EVAL_METHOD is 2, so every floating-point expression is evaluated in long double precision; float, double and long double are 32, 64 and 96 bits wide.
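For reference, values like these can be read from <float.h>. The following is only a minimal sketch: the function name print_float_environment is made up for this illustration, and what it prints depends on the compiler and the target.

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical helper (not from the original post): prints the evaluation
   method, the library epsilon values and the storage sizes, then returns
   FLT_EVAL_METHOD so the caller can inspect it. */
int print_float_environment(void)
{
    printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
    printf("FLT_EPSILON  = %e\n", (double)FLT_EPSILON);
    printf("DBL_EPSILON  = %e\n", DBL_EPSILON);
    printf("LDBL_EPSILON = %Le\n", LDBL_EPSILON);
    printf("sizes in bytes: float %zu, double %zu, long double %zu\n",
           sizeof(float), sizeof(double), sizeof(long double));
    return (int)FLT_EVAL_METHOD;
}
```

Called from main, it prints five lines; on the machine described here the first one would read FLT_EVAL_METHOD = 2.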
I tried to get an approximation of the value by starting from 1 and dividing it by 2 until it becomes too small, doing all operations with the float type:
#include <stdio.h>

int main(void)
{
    float floatEps = 1;

    /* halve until adding half of it to 1 no longer changes the sum */
    while (1 + floatEps / 2 != 1)
        floatEps /= 2;

    printf("float epsilon = %e\n", floatEps);
}
The output is not what I was looking for:
float epsilon = 1.084202e-19
Intermediate operations are done with the greatest precision (due to the value of FLT_EVAL_METHOD), so this result seems legit.
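The effect can be reproduced on any compiler by widening the operands by hand. This is only a sketch of what an implementation with FLT_EVAL_METHOD equal to 2 is allowed to do implicitly; the function name is hypothetical, and the comment about the result assumes long double is a plain binary format, as on x86.

```c
#include <float.h>

/* Hypothetical illustration: the variable is a float, but the sum and the
   comparison are carried out explicitly in long double, which is what
   FLT_EVAL_METHOD == 2 permits the compiler to do on its own. */
float float_loop_evaluated_in_long_double(void)
{
    float eps = 1.0f;

    while ((long double)1 + (long double)eps / 2 != 1)
        eps /= 2;   /* exact: eps is always a power of two */

    return eps;     /* stops at the long double epsilon, not the float one */
}
```

The loop only stops once half of eps is too small to change 1 in long double, so the value it returns matches LDBL_EPSILON rather than FLT_EPSILON.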
However, this:
// 2.0 is a double literal
while ((float) (1 + floatEps / 2.0) != 1)
    floatEps /= 2;
gives this output, which is the right one:
float epsilon = 1.192093e-07
but this one:
// no double literals
while ((float) (1 + floatEps / 2) != 1)
    floatEps /= 2;
again leads to the wrong result, like the first one:
float epsilon = 1.084202e-19
These last two versions should be equivalent on this platform. Is this a compiler bug? If not, what's happening?
Code is compiled with:
gcc -O0 -std=c99 -pedantic file.c
The gcc version is pretty old, but I'm at university and I can't update it:
$ gcc -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.4.5-8' --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.4 --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4 --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --enable-targets=all --with-arch-32=i586 --with-tune=generic --enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu
Thread model: posix
gcc version 4.4.5 (Debian 4.4.5-8)
The current version of gcc, 4.7, behaves correctly on my home computer. There are also comments saying that different versions give different results.
After some answers and comments that clarified what is behaving as expected and what is not, I changed the question a little to make it clearer.
Answer
The compiler is allowed to evaluate float expressions at any higher precision it likes, so it looks like the first expression is evaluated in long double precision. In the second expression, the cast forces the result to be scaled back down to float.
In answer to some of your additional questions and the discussion below: you are basically looking for the smallest non-zero difference from 1 for some floating-point type. Depending on the setting of FLT_EVAL_METHOD, a compiler may decide to evaluate all floating-point expressions at a higher precision than the types involved. On a Pentium, the internal registers of the floating-point unit are traditionally 80 bits wide, and it is convenient to use that precision for all the smaller floating-point types. So in the end your test depends on the precision of the != comparison. In the absence of an explicit cast, the precision of this comparison is determined by your compiler, not by your code. With the explicit cast you scale the comparison down to the type you desire.
As you confirmed, your compiler has set FLT_EVAL_METHOD to 2, so it uses the highest precision for any floating-point calculation.
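One common way to take that choice away from the compiler is to store the intermediate sum in a float object before comparing it. The sketch below is not from the original discussion: it assumes an IEEE 754 binary32 float with round-to-nearest, and the function name is hypothetical. The volatile qualifier is there so that the store, and with it the rounding to float, really happens even on a compiler that keeps excess precision across a cast.

```c
#include <float.h>

/* Hypothetical workaround: every candidate sum is written to a volatile
   float, so the value that gets compared has float precision no matter
   how the expression itself was evaluated. */
float estimate_float_epsilon(void)
{
    float eps = 1.0f;

    for (;;) {
        volatile float sum = 1.0f + eps / 2.0f;  /* store rounds to float */
        if (sum == 1.0f)
            break;
        eps /= 2.0f;
    }
    return eps;
}
```

Under the stated assumptions the returned value compares equal to FLT_EPSILON from <float.h>.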
As a conclusion to the discussion below, we are confident in saying that there is a bug in the implementation of the FLT_EVAL_METHOD=2 case in gcc prior to version 4.5, and that it is fixed as of at least version 4.6. If the integer constant 2 is used in the expression instead of the floating-point constant 2.0, the cast to float is omitted in the generated assembly. It is also worth noting that from optimization level -O1 upward the right results are produced on these older compilers, but the generated assembly is quite different and contains only a few floating-point operations.
