Problem description
I was reading today about researchers discovering that NVIDIA's PhysX libraries use x87 floating point rather than SSE2. Obviously this is suboptimal for parallel datasets where speed trumps precision. However, the article's author goes on to say:
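Intel started discouraging the use of x87 with the introduction of the P4 in late 2000. AMD deprecated x87 since the K8 in 2003, as x86-64 is defined with SSE2 support; VIA's C7 has supported SSE2 since 2005. In 64-bit versions of Windows, x87 is deprecated for user-mode, and prohibited entirely in kernel-mode. Pretty much everyone in the industry has recommended SSE over x87 since 2005 and there are no reasons to use x87, unless software has to run on an embedded Pentium or 486.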
I wondered about this. I know that x87 uses 80-bit extended doubles internally to compute values, and SSE2 doesn't. Does this not matter to anyone? It seems surprising to me. I know that when I do computations on points, lines and polygons in a plane, values can be surprisingly wrong after subtractions, and areas can collapse and lines can alias one another due to lack of precision. Using 80-bit values instead of 64-bit values could help, I would imagine.
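To make the "areas can collapse" concern concrete, here is a minimal sketch in Python (whose `float` is an IEEE 754 64-bit double). The function name `cross` and the three points are made up for illustration. One of the products needs 55 significant bits, so it gets rounded in a 53-bit double significand; it would fit in the 64-bit significand of the x87 extended format.

```python
from fractions import Fraction

def cross(ax, ay, bx, by, cx, cy):
    """Twice the signed area of triangle a, b, c (orientation determinant)."""
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)

# Hypothetical, nearly collinear points; every coordinate is exactly
# representable as a double (2**27 == 134217728).
pts = (0.0, 0.0,
       134217729.0, 134217728.0,   # b = (2**27 + 1, 2**27)
       134217730.0, 134217729.0)   # c = (2**27 + 2, 2**27 + 1)

# 64-bit doubles: (2**27 + 1)**2 is rounded, and the difference cancels to zero.
area2_double = cross(*pts)

# Exact rational arithmetic on the very same inputs, used as the reference.
area2_exact = cross(*(Fraction(p) for p in pts))

print(area2_double)  # 0.0, so the triangle looks degenerate
print(area2_exact)   # 1, so the true area is 1/2
```

This only shows one hand-picked case where the extra significand bits would have been enough. With slightly larger coordinates the product stops fitting in 64 bits too, so extended precision moves the problem further out rather than removing it.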
Is this incorrect? If not, what can we use to perform extended double FP operations if x87 is phased out?
Recommended answer
The biggest problem with x87 is basically that, by default, all register operations are done in 80 bits, whereas most of the time people only use 64-bit floats (i.e. double-precision floats). What happens is, you load a 64-bit float onto the x87 stack, and it gets converted to 80 bits. You do some operations on it in 80 bits, then store it back into memory, converting it to 64 bits. You can get a different result than if you had done all the operations in just 64 bits, and with an optimizing compiler it is very hard to predict how many conversions a value will go through, so it's hard to verify that you're getting the "correct" answer when doing regression tests.
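The "different result" effect is often called double rounding. The sketch below simulates it with exact rationals rather than running real x87 code. The helper `round_to_bits` is hypothetical; it rounds to a given number of significand bits with round-half-to-even and ignores exponent range limits. The operands are chosen by hand so that the two paths disagree.

```python
from fractions import Fraction

def round_to_bits(x: Fraction, bits: int) -> Fraction:
    """Round x to `bits` significant bits, ties to even (exponent range ignored)."""
    if x == 0:
        return x
    sign = -1 if x < 0 else 1
    x = abs(x)
    # Find e such that 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    # Scale so that the significand becomes an integer of `bits` bits.
    scale = Fraction(2) ** (bits - 1 - e)
    n = round(x * scale)  # round() on a Fraction rounds half to even
    return sign * Fraction(n) / scale

a = Fraction(1.0)
b = Fraction(2.0**-53 + 2.0**-78)  # exactly representable as a double
exact_sum = a + b

# SSE2-style: the exact sum is rounded once, straight to a 53-bit significand.
direct = round_to_bits(exact_sum, 53)

# x87-style: rounded to a 64-bit significand in the register, then rounded
# again to 53 bits when the value is stored to a 64-bit double in memory.
via_extended = round_to_bits(round_to_bits(exact_sum, 64), 53)

print(direct == 1 + Fraction(1, 2**52))  # True
print(via_extended == 1)                 # True
```

The two answers differ by one unit in the last place of a double. Which one a compiled program produces depends on whether the intermediate stays in an x87 register or gets spilled to memory, which is exactly the part that is hard to predict.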
The other problem, which only matters from the point of view of someone writing assembly (or indirectly writing assembly, in the case of someone writing a code generator for a compiler), is that x87 uses a register stack, whereas SSE uses individually accessible registers. With x87 you have a bunch of extra instructions to manipulate the stack, and I imagine Intel and AMD would rather make their processors run SSE code fast than try to make those extra x87 stack-manipulation instructions run fast.
BTW, if you are having problems with inaccuracy, you will want to take a look at the article "What every programmer should know about floating-point arithmetic", and then maybe use an arbitrary-precision math library (e.g. GMP) instead.
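As a rough illustration of the arbitrary-precision route, here is a sketch that uses Python's standard `decimal` module as a stand-in for GMP (an assumption made so the example runs without extra packages; GMP itself has a different API). The helper `sum_then_subtract` is hypothetical. The point is that the working precision becomes a parameter you choose rather than a property of the hardware.

```python
from decimal import Decimal, localcontext

def sum_then_subtract(big: Decimal, small: Decimal, digits: int) -> Decimal:
    """Compute (big + small) - big with `digits` significant decimal digits."""
    with localcontext() as ctx:
        ctx.prec = digits  # precision applies to each arithmetic result
        return (big + small) - big

big = Decimal("1e16")
small = Decimal(1)

# 16 digits is roughly what a 64-bit double carries: the 1 is rounded away.
low = sum_then_subtract(big, small, 16)

# With 40 digits the intermediate sum is exact and the 1 survives.
high = sum_then_subtract(big, small, 40)

print(low == 0)   # True
print(high == 1)  # True
```

Software arithmetic like this is much slower than hardware floats, so it tends to make sense only for the few computations where the error actually matters.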