Question
I am writing a protocol that uses RFC 7049 as its binary representation. The standard states that the protocol may use the 32-bit floating-point representation of numbers if their numeric value is equivalent to the respective 64-bit numbers. The conversion must not lead to loss of precision.
- Which 32-bit float numbers can be bigger than 64-bit integers and numerically equivalent to them?
- Is comparing
float x; uint64_t y; (float)x == (float)y
enough to ensure that the values are equivalent? Will this comparison ever be true?
Recommended answer
The following is based on Julia's method for comparing floats and integers. It does not require access to 80-bit long doubles or floating-point exceptions, and should work under any rounding mode. I believe it should work for any C float type (IEEE754 or not) without causing any undefined behaviour.
UPDATE: technically this assumes a binary float format, and that the float exponent range is large enough to represent 2^64. This is certainly true for the standard IEEE754 binary32 (which you refer to in your question), but not for, say, binary16.
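The two assumptions in the update can be checked at compile time. The following is only a sketch, assuming a C11 compiler (for _Static_assert) and the standard <float.h> macros; it is not part of the original answer. FLT_MAX_EXP is the largest integer e such that FLT_RADIX^(e-1) is a representable finite float, so representing 2^64 requires FLT_MAX_EXP >= 65.

```c
#include <float.h>

/* Assumption 1: float uses a binary (radix-2) format. */
_Static_assert(FLT_RADIX == 2,
               "float must use a binary floating-point format");

/* Assumption 2: 2^64 is a representable finite float.
   2^(FLT_MAX_EXP - 1) is representable, so we need FLT_MAX_EXP - 1 >= 64. */
_Static_assert(FLT_MAX_EXP >= 65,
               "float exponent range must be able to represent 2^64");
```

On a platform with IEEE754 binary32 floats, FLT_MAX_EXP is 128, so both assertions pass.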
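#include <stdio.h>
#include <stdint.h>

/* Returns 1 if x and y hold exactly the same numeric value, 0 otherwise. */
int cmp_flt_uint64(float x, uint64_t y) {
    return (x == (float)y) && (x != 0x1p64f) && ((uint64_t)x == y);
}

int main(void) {
    float x = 0x1p64f;                /* 2^64, one more than UINT64_MAX */
    uint64_t y = 0xffffffffffffffff;  /* UINT64_MAX = 2^64 - 1 */
    if (cmp_flt_uint64(x, y))
        printf("true\n");
    else
        printf("false\n");
    return 0;
}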
The logic here is as follows:
- The first equality can be true only if x is a non-negative integer in the interval [0, 2^64].
- The second check ensures that x (and hence (float)y) is not 2^64: if it is, then y cannot be represented exactly by a float, and so the comparison is false.
- Any remaining value of x can be converted exactly to a uint64_t, so we cast and compare.
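To see why the extra checks matter, the comparison from the question can be placed next to the three-step check. This is a minimal sketch: the function names naive_equal and exact_equal are hypothetical, and the behaviour noted in the comments assumes IEEE754 binary32 floats and the default round-to-nearest mode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Comparison from the question. Converting y to float may round it,
   so this can report equality for values that actually differ. */
bool naive_equal(float x, uint64_t y) {
    return x == (float)y;
}

/* Three-step check described in the list above. */
bool exact_equal(float x, uint64_t y) {
    return (x == (float)y)      /* x is a non-negative integer in [0, 2^64] */
        && (x != 0x1p64f)       /* rule out 2^64, which no uint64_t can hold */
        && ((uint64_t)x == y);  /* the cast is now exact and well defined */
}
```

For example, with x = 16777216.0f (2^24) and y = 16777217, converting y to float rounds it down to 2^24, so naive_equal returns true while exact_equal returns false. The same happens for x = 0x1p64f and y = UINT64_MAX. The && operators short-circuit, so the cast to uint64_t is only evaluated once x is known to be in range.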