Question
I know that x87 has higher internal precision, which is probably the biggest difference people see between it and SSE operations. But is there any other benefit to using x87? I have a habit of typing -mfpmath=sse automatically in every project, and I wonder whether I'm missing anything else that the x87 FPU offers.
Recommended answer
For hand-written asm, x87 has some instructions that don't exist in the SSE instruction set.
Off the top of my head, it's mostly trigonometric instructions like fsin, fcos, fptan, and fpatan (the x87 arctangent, which takes two operands like atan2), plus some exponential/logarithm instructions such as f2xm1 and fyl2x.
With gcc -O3 -ffast-math -mfpmath=387, GCC 9 will still inline sin(x) as an fsin instruction, regardless of what the implementation in libm would have used (https://godbolt.org/z/Euc5gp).
MSVC calls __libm_sse2_sin_precise when compiling for 32-bit x86.
If your code spends most of its time doing trigonometry, you may see a slight performance gain or loss from using x87, depending on whether your standard math library's SSE1/SSE2 implementation is faster or slower than the slow microcode for fsin on whatever CPU you're using.
CPU vendors don't put much effort into optimizing the microcode for x87 instructions in the newest generations of CPUs, because x87 is generally considered obsolete and is rarely used. (Look at the uop counts and throughput for complex x87 instructions in Agner Fog's instruction tables: recent generations of CPUs take more cycles than older ones.) The newer the CPU, the more likely it is that a single x87 instruction will be slower than a sequence of many SSE or AVX instructions that computes log, exp, pow, or trig functions.
Even when x87 is available, not all math libraries choose to use complex instructions like fsin to implement functions like sin(). That is especially true for exp/log, where integer tricks that manipulate the FP bit pattern are useful, because the exponent field is already a logarithm of the value.
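As a rough illustration of the kind of integer trick meant here, the sketch below splits a double into its exponent and mantissa by reading the bit pattern directly. It assumes IEEE 754 binary64 doubles and positive, normal, finite inputs; the function names are invented for this example, and a real library would add a polynomial for log2 of the mantissa plus handling for zeros, subnormals, infinities, and NaNs.

```c
#include <stdint.h>
#include <string.h>

/* floor(log2(x)) for a positive, normal, finite double, read straight from
 * the 11-bit biased exponent field (assumes IEEE 754 binary64). */
int floor_log2_bits(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);            /* well-defined type pun */
    int biased = (int)((bits >> 52) & 0x7FF);  /* extract exponent field */
    return biased - 1023;                      /* remove the bias */
}

/* Returns m in [1, 2) such that x == m * 2^floor_log2_bits(x), by keeping
 * the 52 fraction bits and overwriting the exponent with the bias (2^0). */
double mantissa_1_to_2(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits = (bits & 0x000FFFFFFFFFFFFFULL) | 0x3FF0000000000000ULL;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

With these two pieces, log2(x) is floor_log2_bits(x) + log2(mantissa_1_to_2(x)), and the second term only needs to be approximated on [1, 2). All of this is plain integer and SSE-friendly work, with no x87 involved.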
Some DSP algorithms use a lot of trig, but they typically benefit a lot from auto-vectorization with SIMD math libraries.
However, for math code where you spend most of your time doing additions, multiplications, etc., SSE is usually faster.
Also related: Intel Underestimates Error Bounds by 1.3 quintillion. The worst case for fsin (catastrophic cancellation for inputs very near pi) is very bad. Software can do better, but only with slow extended-precision techniques.
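The cancellation is easy to reproduce in miniature. The sketch below is only an analogy, not a model of fsin's actual internals: it uses the identity sin(x) = sin(pi - x), which is approximately pi - x for x near pi, and shows how the answer depends entirely on how precisely pi itself is represented. The function and macro names are made up for this example.

```c
/* pi rounded to double precision (53 significant bits). */
#define PI_ROUNDED_TO_DOUBLE 3.14159265358979323846
/* pi rounded to single precision (24 significant bits), widened to double. */
#define PI_ROUNDED_TO_FLOAT ((double)3.14159265358979323846f)

/* For x very close to pi, sin(x) ~= pi - x. The subtraction cancels almost
 * all leading digits, so the result is only as good as pi_used. */
double sin_near_pi(double x, double pi_used)
{
    return pi_used - x;
}
```

For x = PI_ROUNDED_TO_FLOAT, reducing with PI_ROUNDED_TO_FLOAT gives exactly 0, while reducing with PI_ROUNDED_TO_DOUBLE gives about -8.74e-8, which is close to the true value of sin(x). The same effect applies one level up: getting an accurate sin() for a double near pi needs a pi with many more bits than a double holds, which is where the slow extended-precision techniques come in.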