_mm_max_ss has different behavior between clang and gcc

Question

I'm trying to cross compile a project using clang and gcc but I'm seeing some odd differences when using _mm_max_ss e.g.

__m128 a = _mm_set_ss(std::numeric_limits<float>::quiet_NaN());
__m128 b = _mm_set_ss(2.0f);
__m128 c = _mm_max_ss(a,b);
__m128 d = _mm_max_ss(b,a);

Now I expected std::max type behavior when NaNs are involved but clang and gcc give different results:

Clang: (what I expected)
c: 2.000000 0.000000 0.000000 0.000000
d: nan 0.000000 0.000000 0.000000

Gcc: (Seems to ignore order)
c: nan 0.000000 0.000000 0.000000
d: nan 0.000000 0.000000 0.000000

_mm_max_ps does the expected thing when I use it. I've tried using -ffast-math, -fno-fast-math but it doesn't seem to have an effect. Any ideas to make the behavior similar across compilers?

Godbolt link here

Answer

My understanding is that IEEE-754 requires: (NaN cmp x) to return false for all cmp operators {==, <, <=, >, >=}, except {!=} which returns true. An implementation of a max() function might be defined in terms of any of the inequality operators.
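
The comparison rule above can be checked with a small sketch. The helper name nan_compares_as_unordered is hypothetical, and the check assumes the code is built without -ffast-math, which allows the compiler to assume NaNs never occur.

```cpp
#include <limits>

// Hypothetical helper: returns true when every ordered comparison of a
// quiet NaN against x is false and only != is true, as IEEE-754 specifies.
// Assumes strict floating-point semantics (no -ffast-math).
bool nan_compares_as_unordered(float x) {
    const float nan = std::numeric_limits<float>::quiet_NaN();
    return !(nan == x) && !(nan < x) && !(nan <= x)
        && !(nan > x) && !(nan >= x) && (nan != x);
}
```

Calling nan_compares_as_unordered(2.0f) should return true on a conforming implementation.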

So, the question is, how is _mm_max_ps implemented? With {<, <=, >, >=}, or a bit comparison?

Interestingly, when disabling optimization in your link, the corresponding maxss instruction is used by both gcc and clang. Both yield:

2.000000 0.000000 0.000000 0.000000
nan 0.000000 0.000000 0.000000

This suggests, given: max(NaN, 2.0f) -> 2.0f, that: max(a, b) = (a op b) ? a : b, where op is one of: {<, <=, >, >=}. With IEEE-754 rules, the result of this comparison is always false, so:

(NaN op val) is always false, returning (val),
(val op NaN) is always false, returning (NaN)

With optimization on, the compiler is free to precompute (c) and (d) at compile time. It appears that clang evaluates the results as the maxss instruction would - correct 'as-if' behaviour. GCC is either falling back on another implementation of max() - it uses the GMP and MPFR libraries for compile-time numerics - or is just being careless with the _mm_max_ss semantics.

GCC is still getting it wrong with 10.2 and trunk versions on godbolt. So I think you've found a bug! I haven't answered the second part, because I can't think of an all-purpose hack that will efficiently work around this.

From Intel's ISA reference:

If only one value is a NaN (SNaN or QNaN) for this instruction, the second source operand, either a NaN or a valid floating-point value, is written to the result. If instead of this behavior, it is required that the NaN from either source operand be returned, the action of MAXSS can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN and OR.
