This article addresses the question: for a data set that contains invalid values, should I use float NaN, or a float plus a bool? The question and the accepted answer follow.

Problem Description



I have a large amount of data to process with math intensive operations on each data set. Much of it is analogous to image processing. However, since this data is read directly from a physical device, many of the pixel values can be invalid.

This makes NaN's property of representing values that are not a number and spreading on arithmetic operations very compelling. However, it also seems to require turning off some optimizations such as gcc's -ffast-math, plus we need to be cross platform. Our current design uses a simple struct that contains a float value and a bool indicating validity.

While it seems NaN was designed with this use in mind, others think it is more trouble than it is worth. Does anyone have advice based on their more intimate experience with IEEE 754, with performance in mind?

Solution

BRIEF: For strictest portability, don't use NaNs. Use a separate valid bit. E.g. a template like Valid. However, if you know that you will only ever run on IEEE 754-2008 machines, and not IEEE 754-1985 (see below), then you may get away with it.

For performance, it is probably faster not to use NaNs on most of the machines that you have access to. However, I have been involved with hardware design of FP on several machines that are improving NaN handling performance, so there is a trend to make NaNs faster, and, in particular, signalling NaNs should soon be faster than Valid.

DETAIL:

Not all floating point formats have NaNs. Not all systems use IEEE floating point. IBM hex floating point can still be found on some machines - actually systems, since IBM now supports IEEE FP on more recent machines.

Furthermore, IEEE floating point itself had compatibility issues wrt NaNs in IEEE 754-1985. E.g., see Wikipedia http://en.wikipedia.org/wiki/NaN:

Thus, if your code may run on older HP machines, or on current MIPS machines (which are ubiquitous in embedded systems), you should not depend on a fixed encoding of NaN, but should have a machine-dependent #ifdef for your special NaNs.

IEEE 754-2008 standardizes NaN encodings, so this is getting better. It depends on your market.

As for performance: many machines essentially trap, or otherwise take a major hiccup in performance, when performing computations involving both SNaNs (which must trap) and QNaNs (which don't need to trap, i.e. which could be fast - and which are getting faster in some machines as we speak.)

I can say with confidence that on older machines, particularly older Intel machines, you did NOT want to use NaNs if you cared about performance. E.g. http://www.cygnus-software.com/papers/x86andinfinity.html says "The Intel Pentium 4 handles infinities, NANs, and denormals very badly. ... If you write code that adds floating point numbers at the rate of one per clock cycle, and then throw infinities at it as input, the performance drops. A lot. A huge amount. ... NANs are even slower. Addition with NANs takes about 930 cycles. ... Denormals are a bit trickier to measure."

Get the picture? Almost 1000x slower to use a NaN than to do a normal floating point operation? In this case it is almost guaranteed that using a template like Valid will be faster.

However, see the reference to "Pentium 4"? That's a really old web page. For years people like me have been saying "QNaNs should be faster", and it has slowly taken hold.

More recently (2009), Microsoft says http://connect.microsoft.com/VisualStudio/feedback/details/498934/big-performance-penalty-for-checking-for-nans-or-infinity "If you do math on arrays of double that contain large numbers of NaN's or Infinities, there is an order of magnitude performance penalty."

If I feel impelled, I may go and run a microbenchmark on some machines. But you should get the picture.

This should be changing because it is not that hard to make QNaNs fast. But it has always been a chicken and egg problem: hardware guys like those I work with say "Nobody uses NaNs, so we won't make them fast", while software guys don't use NaNs because they are slow. Still, the tide is slowly changing.

Heck, if you are using gcc and want best performance, you turn on optimizations like "-ffinite-math-only ... Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs." Similar is true for most compilers.

By the way, you can google like I did, "NaN performance floating point" and check refs out yourself. And/or run your own microbenchmarks.

Finally, I have been assuming that you are using a template like

template<typename T> class Valid {
    ...
    bool valid;
    T value;
    ...
};

I like templates like this, because they can bring "validity tracking" not just to FP, but also to integers (Valid&lt;int&gt;), etc.

But, they can have a big cost. The operations are probably not much more expensive than NaN handling on old machines, but the data density can be really poor. sizeof(Valid&lt;float&gt;) may sometimes be 2*sizeof(float). This bad density may hurt performance much more than the operations involved.

By the way, you should consider template specialization, so that Valid&lt;float&gt; uses NaNs if they are available and fast, and a valid bit otherwise.

template <> class Valid<float> {
    float value;
public:
    bool is_valid() const {
        // Note: value != my_special_NaN would always be true, since a NaN
        // compares unequal even to itself; compare the raw bits instead.
        std::uint32_t bits;   // needs <cstdint> and <cstring>
        std::memcpy(&bits, &value, sizeof bits);
        return bits != my_special_NaN_bits;
    }
};

etc.

Anyway, you are better off having as few valid bits as possible, and packing them elsewhere, rather than Valid right close to the value. E.g.

struct Point { float x, y, z; };
Valid<Point> pt;

is better (density wise) than

struct Point_with_Valid_Coords { Valid<float> x, y, z; };

unless you are using NaNs - or some other special encoding.

And

struct Point_with_Valid_Coords { float x, y, z; bool valid_x, valid_y, valid_z; };

is in between - but then you have to do all the code yourself.

BTW, I have been assuming you are using C++. If FORTRAN or Java ...

BOTTOM LINE: separate valid bits is probably faster and more portable.

But NaN handling is speeding up, and one day soon it will be good enough.

By the way, my preference: create a Valid template. Then you can use it for all data types. Specialize it for NaNs if it helps. Although my life is making things faster, IMHO it is usually more important to make the code clean.

