This post covers the question "How slow is NaN arithmetic on the Intel x64 FPU?" and its recommended answer, which should be a useful reference for anyone running into the same problem.

Problem description

Hints and allegations abound that arithmetic with NaNs can be 'slow' in hardware FPUs. Specifically in the modern x64 FPU, e.g. on a Nehalem i7, is that still true? Do FPU multiplies get churned out at the same speed regardless of the values of the operands?

I have some interpolation code that can wander off the edge of our defined data, and I'm trying to determine whether it's faster to check for NaNs (or some other sentinel value) here, there, and everywhere, or just at convenient points.
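For context, here is a minimal sketch (in C, not the original poster's code) of the two styles being weighed: an explicit range/NaN check at every lookup, versus letting NaN propagate through the arithmetic and testing the result once at a convenient point. The interp1 name and data layout are made up for illustration:

#include <math.h>
#include <stddef.h>

/* Hypothetical 1-D linear interpolation over n samples at integer x positions. */
static double interp1(const double *ys, size_t n, double x)
{
    /* Style A: explicit check at every entry point ("here, there, and everywhere").
       !(x >= 0.0) also catches x itself being NaN.                                  */
    if (n < 2 || !(x >= 0.0) || x >= (double)(n - 1))
        return NAN;
    size_t i = (size_t)x;
    double t = x - (double)i;
    /* Style B alternative: skip the check, let NaN operands propagate through the
       multiply/add, and call isnan() on the final result at one convenient point
       in the caller instead.                                                        */
    return ys[i] + t * (ys[i + 1] - ys[i]);
}

Whether style B pays off is exactly the question: it only helps if multiplying by NaN costs the same as multiplying by an ordinary value.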

Yes, I will benchmark my particular case (it could be dominated by something else entirely, like memory bandwidth), but I was surprised not to see a concise summary somewhere to help with my intuition.

I'll be doing this from the CLR, if it makes a difference as to the flavor of NaNs generated.

Recommended answer

For what it's worth, using the SSE instruction mulsd with NaN is pretty much exactly as fast as with the constant 4.0 (chosen by a fair dice roll, guaranteed to be random).

This code:

for (unsigned i = 0; i < 2000000000; i++)
{
    double j = doubleValue * i;   /* doubleValue holds the constant under test: NAN or 4.0 */
}

generates this machine code (inside the loop) with clang (I assume the .NET virtual machine uses SSE instructions when it can too):

movsd     -16(%rbp), %xmm0    ; gets the constant (NaN or 4.0) into xmm0
movl      -20(%rbp), %eax     ; puts i into a register
cvtsi2sdq %rax, %xmm1         ; converts i to a double and puts it in xmm1
mulsd     %xmm0, %xmm1        ; multiplies xmm0 (the constant) with xmm1 (i)
movsd     %xmm1, -32(%rbp)    ; puts the result somewhere on the stack

And over two billion iterations, the NaN version (NaN as defined by the C macro NAN from <math.h>) took about 0.017 seconds less to execute on my i7. The difference was probably caused by the task scheduler.

So to be fair, they're exactly as fast.
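
If you want to reproduce this kind of measurement, a rough harness along the following lines (my sketch, not the answerer's actual benchmark) should do; the volatile sink keeps the compiler from deleting the multiply when optimizations are enabled, and the iteration count matches the two billion used above:

#include <math.h>
#include <stdio.h>
#include <time.h>

/* Times two billion multiplies by the given constant. */
static double time_mul(double constant)
{
    volatile double sink;                    /* volatile: the store cannot be elided */
    clock_t start = clock();
    for (unsigned i = 0; i < 2000000000u; i++)
        sink = constant * i;
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

int main(void)
{
    printf("4.0: %.3f s\n", time_mul(4.0));  /* ordinary constant */
    printf("NaN: %.3f s\n", time_mul(NAN));  /* quiet NaN from <math.h> */
    return 0;
}

You should see essentially identical times for the two runs, matching the result above.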

That wraps up this article on "How slow is NaN arithmetic on the Intel x64 FPU?". We hope the recommended answer above is helpful.

