Problem description
MSVC has supported AVX/AVX2 instructions for years, and according to an MSDN blog post, it can automatically generate fused multiply-add (FMA) instructions.
Yet neither of the following functions compiles to an FMA instruction:
#include <cmath>
float func1(float x, float y, float z)
{
    return x * y + z;
}
float func2(float x, float y, float z)
{
    return std::fma(x, y, z);
}
Even worse, std::fma is not implemented as a single FMA instruction; it performs terribly, much slower than a plain x * y + z (poor performance of std::fma is expected if the implementation doesn't rely on the FMA instruction).
I compile with the /arch:AVX2 /O2 /Qvec flags. I also tried /fp:fast, with no success.
So the question is: how can MSVC be forced to automatically emit FMA instructions?
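For context, what separates a fused multiply-add from a plain x * y + z is that the fused form rounds only once, after the addition. The sketch below is an illustration rather than part of the original question: the helper names rounded_product and product_rounding_error are hypothetical, and it assumes IEEE 754 single-precision float. It uses std::fma to recover the exact rounding error of a float product, which a separately rounded multiply and add cannot do.

```cpp
#include <cassert>  // only needed for the check in the usage note
#include <cmath>

// Product rounded to float, as a standalone multiply would produce it.
// (Hypothetical helper; the volatile store keeps the compiler from
// fusing this multiply into a later operation.)
float rounded_product(float a, float b) {
    volatile float p = a * b;
    return p;
}

// Rounding error of a * b. std::fma evaluates a * b - p exactly and
// rounds once, so the residual lost by the plain multiply is recovered.
float product_rounding_error(float a, float b) {
    return std::fma(a, b, -rounded_product(a, b));
}
```

With a = 1 + 2^-12, the exact square is 1 + 2^-11 + 2^-24, which rounds to 1 + 2^-11 in float, so assert(product_rounding_error(a, a) == std::ldexp(1.0f, -24)) should hold. This is also why a compiler only contracts x * y + z into an FMA when the floating-point model permits it: the fused result can differ from the separately rounded one.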
UPDATE
There is a #pragma fp_contract (on|off) (https://msdn.microsoft.com/en-us/library/4f994tzs.aspx), which appears to do nothing.
Solution
I solved this long-standing problem.
As it turns out, the flags /fp:fast, /arch:AVX2 and /O1 (or above) are not enough for Visual Studio 2015 to emit FMA instructions. You also need "Whole Program Optimization" turned on with the /GL flag.
Then Visual Studio 2015 will generate an FMA instruction, vfmadd213ss, for
float func1(float x, float y, float z) { return x * y + z; }
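Put together, the flag combination described above might be applied as in the following sketch. The file name fma_demo.cpp and the exact command line are assumptions for illustration; the flags themselves are the ones named in the answer.

```cpp
// fma_demo.cpp (hypothetical file name)
//
// Assumed build command, combining the flags from the answer:
//   cl /c /O2 /arch:AVX2 /fp:fast /GL fma_demo.cpp
//
// /fp:fast permits contracting the multiply and add,
// /arch:AVX2 makes FMA instructions available to the code generator,
// /GL turns on Whole Program Optimization, which the answer found
// to be required on Visual Studio 2015.
#include <cassert>  // only needed for the check in the usage note

float func1(float x, float y, float z)
{
    return x * y + z;  // expected to become vfmadd213ss per the answer
}
```

Because /GL defers code generation to link time, the generated instruction is best confirmed by disassembling the final linked binary (for example with dumpbin /disasm) rather than the intermediate object file. A quick sanity check such as assert(func1(2.0f, 3.0f, 4.0f) == 10.0f) holds whether or not the operation is fused, since the result is exactly representable.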
Regarding std::fma, I opened a bug at Microsoft Connect. They confirmed that std::fma doesn't compile to FMA instructions because the compiler doesn't treat it as an intrinsic. According to their response, it will be fixed in a future update to get the best codegen possible.
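Until std::fma is treated as an intrinsic, one possible workaround (not part of the original answer) is to call the FMA intrinsic from <immintrin.h> directly. The sketch below is an assumption-laden illustration: the wrapper name fused_mul_add and the HAVE_FMA_INTRINSIC macro are hypothetical, and the preprocessor guard assumes that FMA support is signalled by __FMA__ on GCC/Clang or by __AVX2__ on MSVC (set by /arch:AVX2). When neither is defined, it falls back to std::fma.

```cpp
#include <cassert>  // only needed for the check in the usage note
#include <cmath>

#if defined(__FMA__) || (defined(_MSC_VER) && defined(__AVX2__))
#include <immintrin.h>
#define HAVE_FMA_INTRINSIC 1
#else
#define HAVE_FMA_INTRINSIC 0
#endif

// Hypothetical wrapper: scalar fused multiply-add, x * y + z with one rounding.
inline float fused_mul_add(float x, float y, float z)
{
#if HAVE_FMA_INTRINSIC
    // _mm_fmadd_ss fuses the lowest lanes of its three operands.
    return _mm_cvtss_f32(
        _mm_fmadd_ss(_mm_set_ss(x), _mm_set_ss(y), _mm_set_ss(z)));
#else
    // Portable fallback; may be slow when not backed by hardware FMA.
    return std::fma(x, y, z);
#endif
}
```

Either path rounds only once, so for instance assert(fused_mul_add(2.0f, 3.0f, 4.0f) == 10.0f) holds on both. The trade-off is that the intrinsic path ties the binary to CPUs with FMA support, which /arch:AVX2 builds already assume in practice.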