Some floating-point precision and numeric limits questions

Question

I know that there are tons of questions like this one, but I couldn't find my answers. Please read before voting to close (:

  • According to one source:

The numeric coprocessor has eight floating point registers.
Each register holds 80 bits of data.
Floating point numbers are always stored as 80-bit
extended precision numbers in these registers.

How is that possible when sizeof shows something different? For example, on the x64 architecture, sizeof(double) is 8, which is far from 80 bits.

  • Why does std::numeric_limits<long double>::max() give me 1.18973e+4932?! This is a huuuuuuuuuuge number. If this is not the way to get the maximum of a floating-point type, then why does it compile at all, and moreover, why does it return a value?

  • What does this mean:

Double precision magnitudes can range from approximately 10^−308 to 10^308

These are huge numbers. How can you store them in 8 bytes, or even in 16 bytes (which is extended precision, and that is only 128 bits)?

Obviously, I'm missing something. Actually, obviously, a lot of things.

Solution

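1) sizeof is the size in memory, not in a register. sizeof is in bytes, so 8 bytes = 64 bits. When doubles are calculated in the x87 floating-point registers (on this architecture), they get an extra 16 bits for more precise intermediate calculations. When the value is copied back to memory, the extra 16 bits are lost. Note that x86-64 compilers normally do double arithmetic in SSE2 registers, which hold plain 64-bit doubles, so there the 80-bit format mostly matters when long double maps to it.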
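To see the difference between storage size and precision on a given machine, a small sketch like the following may help. The helper names storage_bits, significand_bits and report are made up for illustration, the output depends on the compiler and target, and FLT_EVAL_METHOD is only a hint about whether intermediate results are kept in a wider format.

```cpp
#include <cfloat>   // FLT_EVAL_METHOD
#include <climits>  // CHAR_BIT
#include <cstddef>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: bits a T occupies in memory (what sizeof measures).
template <typename T>
constexpr std::size_t storage_bits() {
    return sizeof(T) * CHAR_BIT;
}

// Hypothetical helper: significand bits of T, counting the leading bit.
template <typename T>
constexpr int significand_bits() {
    return std::numeric_limits<T>::digits;
}

// Hypothetical helper: one line per type, plus the evaluation mode.
inline std::string report() {
    std::ostringstream out;
    out << "double:      " << storage_bits<double>() << " bits stored, "
        << significand_bits<double>() << " significand bits\n"
        << "long double: " << storage_bits<long double>() << " bits stored, "
        << significand_bits<long double>() << " significand bits\n"
        // 0: evaluate in the operand type; 2: evaluate in long double
        // (the value typically seen when x87 registers are used).
        << "FLT_EVAL_METHOD = " << FLT_EVAL_METHOD << '\n';
    return out.str();
}
```

On a typical x86-64 Linux build with GCC or Clang, double would be expected to report 64 stored bits and 53 significand bits, while long double often reports 128 stored bits (padding included) but only 64 significand bits. Other platforms differ, so treat the numbers as something to check rather than assume.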

2) Why do you think long double doesn't go up to 1.18973e+4932?
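One way to convince yourself that such a maximum is plausible: the largest finite value is just under 2^max_exponent, so its decimal exponent is roughly max_exponent * log10(2). A rough sketch follows, where decimal_exponent_of_max is a hypothetical helper and the 80-bit figures assume long double is the x87 extended format.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: estimate the decimal exponent of the largest finite T
// from its binary exponent range, since max() is just below 2^max_exponent.
template <typename T>
int decimal_exponent_of_max() {
    const double log10_of_2 = std::log10(2.0);
    return static_cast<int>(
        std::floor(std::numeric_limits<T>::max_exponent * log10_of_2));
}
```

For double, max_exponent is 1024, giving 1024 × 0.30103 ≈ 308.25, hence about 10^308. If long double is the 80-bit x87 format, max_exponent is 16384 and 16384 × 0.30103 ≈ 4932.08, which matches the 1.18973e+4932 in the question.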

3) Why can't you store 10^308 in 8 bytes? I only need 13 bits: 4 to store the 10, and 9 to store the 308.
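A double is laid out along the same lines: a handful of bits hold an exponent (base 2 rather than 10) and the rest hold the significand. The following sketch pulls the 11-bit exponent field out of a double. The names exponent_field and unbiased_exponent are illustrative, and the code assumes an IEEE 754 binary64 double, which the static_asserts check.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559,
              "sketch assumes IEEE 754 doubles");
static_assert(sizeof(double) == sizeof(std::uint64_t),
              "sketch assumes a 64-bit double");

// Hypothetical helper: raw 11-bit exponent field (bits 52..62) of a double.
inline unsigned exponent_field(double value) {
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // well-defined way to read the bits
    return static_cast<unsigned>((bits >> 52) & 0x7FF);
}

// Hypothetical helper: the power of two the value is scaled by (bias is 1023).
// Only meaningful for normal numbers, not zero, subnormals, infinity or NaN.
inline int unbiased_exponent(double value) {
    return static_cast<int>(exponent_field(value)) - 1023;
}
```

Under these assumptions, unbiased_exponent(1e308) is 1023, because 1e308 is about 1.11 × 2^1023. Eleven exponent bits are enough to reach 10^308; the 52 fraction bits buy precision, not range.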
