Automatically generating FMA instructions in MSVC


Problem description


MSVC has supported AVX/AVX2 instructions for years, and according to this MSDN blog post, it can automatically generate fused multiply-add (FMA) instructions.

Yet neither of the following functions compiles to an FMA instruction:

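#include <cmath>  // std::fma

// Plain multiply followed by add; a candidate for contraction into one FMA.
float func1(float x, float y, float z)
{
    return x * y + z;
}

// Explicit fused multiply-add through the standard library.
float func2(float x, float y, float z)
{
    return std::fma(x, y, z);
}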

Even worse, std::fma is not implemented as a single FMA instruction. It performs terribly, much slower than a plain x * y + z (poor performance from std::fma is expected if the implementation does not rely on the FMA instruction).
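To make the difference between the two forms concrete: a fused multiply-add rounds once, while x * y + z rounds after the multiply and again after the add, so the two can return different values. The sketch below is only an illustration; the function names `mul_add_unfused` and `mul_add_fused` are hypothetical, and the `volatile` store is an assumption used here to keep a compiler from fusing the two steps on its own.

```cpp
#include <cmath>

// Unfused: the product is rounded to float, then the sum is rounded again.
// The volatile store forces the intermediate rounding and blocks contraction.
float mul_add_unfused(float x, float y, float z)
{
    volatile float product = x * y;
    return product + z;
}

// Fused: std::fma rounds only once, after computing x * y + z exactly.
float mul_add_fused(float x, float y, float z)
{
    return std::fma(x, y, z);
}
```

With x = y = 1 + 2^-12 and z = -(1 + 2^-11), the exact product is 1 + 2^-11 + 2^-24. The unfused version rounds the product to 1 + 2^-11 and returns 0, while the fused version returns 2^-24. This change in results is a plausible reason for a compiler to contract x * y + z only under relaxed floating-point settings.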

I compile with the /arch:AVX2 /O2 /Qvec flags. I also tried /fp:fast, with no success.

So the question is: how can MSVC be forced to automatically emit FMA instructions?

UPDATE

There is a #pragma fp_contract (on|off) (https://msdn.microsoft.com/en-us/library/4f994tzs.aspx), which appears to do nothing.
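For reference, this is roughly how the pragma would be written in a source file, placed before the function it is meant to affect. It is a minimal sketch: the function name `mul_add` is hypothetical, the `_MSC_VER` guard is an assumption added so that other compilers do not warn about an unknown pragma, and as reported above the pragma did not change the generated code in this case.

```cpp
#ifdef _MSC_VER
// MSVC-specific: allow the compiler to contract x * y + z into an FMA.
#pragma fp_contract(on)
#endif

// Hypothetical helper; whether this becomes an FMA still depends on the
// compiler version and the command-line flags.
float mul_add(float x, float y, float z)
{
    return x * y + z;
}
```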

Solution

I solved this long-standing problem.

As it turns out, the flags /fp:fast, /arch:AVX2 and /O1 (or a higher optimization level) are not enough for Visual Studio 2015 to emit FMA instructions. You also need "Whole Program Optimization" turned on with the /GL flag.

Visual Studio 2015 will then generate the FMA instruction vfmadd213ss for:

float func1(float x, float y, float z)
{
    return x * y + z;
}


Regarding std::fma, I opened a bug at Microsoft Connect. They confirmed that std::fma does not compile to FMA instructions, because the compiler does not treat it as an intrinsic. According to their response, this will be fixed in a future update to get the best possible codegen.
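Until std::fma is treated as an intrinsic, one possible workaround (not part of the original answer) is to call the FMA intrinsic `_mm_fmadd_ss` from `<immintrin.h>` directly. The sketch below is hedged: the wrapper name `fma_explicit` and the macro `HAVE_FMA_INTRINSIC` are hypothetical, and the preprocessor check assumes that `__AVX2__` (MSVC with /arch:AVX2) or `__FMA__` (GCC/Clang with FMA enabled) indicates the target supports FMA. Otherwise it falls back to std::fma.

```cpp
#include <cmath>

#if defined(__FMA__) || (defined(_MSC_VER) && defined(__AVX2__))
#include <immintrin.h>
#define HAVE_FMA_INTRINSIC 1
#endif

// Hypothetical wrapper: computes x * y + z with a single rounding.
inline float fma_explicit(float x, float y, float z)
{
#ifdef HAVE_FMA_INTRINSIC
    // Scalar FMA on the low lane of each vector, then extract the result.
    return _mm_cvtss_f32(
        _mm_fmadd_ss(_mm_set_ss(x), _mm_set_ss(y), _mm_set_ss(z)));
#else
    // Portable fallback; may be slow if implemented in software.
    return std::fma(x, y, z);
#endif
}
```

Both branches round once, so the wrapper returns the same value either way. Only the cost differs.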


08-29 12:12