Problem description
I'm trying to cross-compile a project using clang and gcc, but I'm seeing some odd differences when using _mm_max_ss, e.g.:
__m128 a = _mm_set_ss(std::numeric_limits<float>::quiet_NaN());
__m128 b = _mm_set_ss(2.0f);
__m128 c = _mm_max_ss(a,b);
__m128 d = _mm_max_ss(b,a);
Now, I expected std::max-type behavior when NaNs are involved, but clang and gcc give different results:
Clang: (what I expected)
c: 2.000000 0.000000 0.000000 0.000000
d: nan 0.000000 0.000000 0.000000
Gcc: (Seems to ignore order)
c: nan 0.000000 0.000000 0.000000
d: nan 0.000000 0.000000 0.000000
_mm_max_ps does the expected thing when I use it. I've tried using -ffast-math and -fno-fast-math, but neither seems to have an effect. Any ideas on how to make the behavior consistent across compilers?
Godbolt link here
Recommended answer
My understanding is that IEEE-754 requires (NaN cmp x) to return false for all cmp operators in {==, <, <=, >, >=}, and to return true only for {!=}. An implementation of a max() function might be defined in terms of any of the inequality operators.
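As a quick illustration of that comparison rule, here is a minimal sketch. The struct and function names are made up for this example, and it assumes the code is built without -ffast-math, since that flag lets the compiler assume NaNs never occur:

```cpp
#include <limits>

// Hypothetical helper type: one flag per comparison operator.
struct NanComparisons {
    bool eq, lt, le, gt, ge, ne;
};

// Compares a quiet NaN against x with every comparison operator.
// Under IEEE-754 rules only `ne` should come back true.
NanComparisons compare_with_nan(float x) {
    const float nan = std::numeric_limits<float>::quiet_NaN();
    return { nan == x, nan < x, nan <= x, nan > x, nan >= x, nan != x };
}
```

Calling compare_with_nan(2.0f) should give false for every field except ne.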
So, the question is: how is _mm_max_ps implemented? With one of {<, <=, >, >=}, or with a bit comparison?
Interestingly, when optimization is disabled in your link, both gcc and clang emit the corresponding maxss instruction. Both yield:
2.000000 0.000000 0.000000 0.000000
nan 0.000000 0.000000 0.000000
Given max(NaN, 2.0f) -> 2.0f, this suggests that max(a, b) = (a op b) ? a : b, where op is one of {<, <=, >, >=}. Under IEEE-754 rules, the result of this comparison is always false when either operand is a NaN, so:
(NaN op val) is always false, returning (val)
(val op NaN) is always false, returning (NaN)
With optimization on, the compiler is free to precompute (c) and (d) at compile time. It appears that clang evaluates the results as the maxss instruction would, which is correct 'as-if' behaviour. GCC is either falling back on another implementation of max() (it uses the GMP and MPFR libraries for compile-time numerics) or is just being careless with the _mm_max_ss semantics.
GCC still gets it wrong with the 10.2 and trunk versions on godbolt, so I think you've found a bug! I haven't answered the second part, because I can't think of an all-purpose hack that will efficiently work around this.
From Intel's ISA reference:
If only one value is a NaN (SNaN or QNaN) for this instruction, the second source operand, either a NaN or a valid floating-point value, is written to the result. If instead of this behavior, it is required that the NaN from either source operand be returned, the action of MAXSS can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN and OR.