Problem description
With the latest version of TensorFlow now available on Windows, I am trying to get everything working as efficiently as possible. However, even when compiling from source, I still can't figure out how to enable the SSE and AVX instructions.
The default process, https://github.com/tensorflow/tensorflow/tree/r0.12/tensorflow/contrib/cmake , makes no mention of how to do this.
The only reference I have found uses Google's Bazel: How to compile Tensorflow with SSE4.2 and AVX instructions?
Does anyone know of an easy way to turn on these advanced instructions using MSBuild? I hear they give at least a 3X speedup.
To help those looking for a similar solution, the warning I am currently getting looks like this: https://github.com/tensorflow/tensorflow/tree/r0.12/tensorflow/contrib/cmake
I am using Windows 10 Professional on a 64-bit platform, Visual Studio 2015 Community Edition, and Anaconda Python 3.6, with cmake version 3.6.3 (later versions don't work for TensorFlow).
Recommended answer
Well, I tried to fix that, but I am not sure whether it really worked.
In CMakeLists.txt you will find the following statements:
if (tensorflow_OPTIMIZE_FOR_NATIVE_ARCH)
  include(CheckCXXCompilerFlag)
  CHECK_CXX_COMPILER_FLAG("-march=native" COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
On the MSVC platform, the test fails because MSVC doesn't support the -march=native flag. I modified the statements as shown below:
if (tensorflow_OPTIMIZE_FOR_NATIVE_ARCH)
  include(CheckCXXCompilerFlag)
  # GCC/Clang style: let the compiler target the host CPU.
  CHECK_CXX_COMPILER_FLAG("-march=native" COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
  if (COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=native")
  else()
    # Fallback for MSVC, which rejects -march=native: try /arch:AVX instead.
    CHECK_CXX_COMPILER_FLAG("/arch:AVX" COMPILER_OPT_ARCH_AVX_SUPPORTED)
    if(COMPILER_OPT_ARCH_AVX_SUPPORTED)
      set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /arch:AVX")
    endif()
  endif()
endif()
With this change, cmake checks whether /arch:AVX is available and uses it. According to MSDN, SSE2 support is enabled by default for x86 compilation, but it is not available as an /arch option for x64 compilation. For x64 compilation you can choose AVX or AVX2. I used AVX above because my CPU only supports AVX; you can try AVX2 if you have a compatible CPU.
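The choice between AVX and AVX2 described above can be sketched as a small helper. This is only an illustration: the function name choose_msvc_arch_flag and the feature names "avx" and "avx2" are hypothetical, and how you obtain the set of features your CPU supports is left to you.

```python
def choose_msvc_arch_flag(cpu_features):
    """Pick the widest MSVC x64 /arch flag for a set of CPU feature names.

    cpu_features is an iterable of strings such as {"avx", "avx2"}.
    The names are an assumption of this sketch, not a standard API.
    """
    features = {name.lower() for name in cpu_features}
    if "avx2" in features:
        return "/arch:AVX2"
    if "avx" in features:
        return "/arch:AVX"
    # Neither is supported: leave the x64 compiler default untouched.
    return None


if __name__ == "__main__":
    # Example: a CPU that supports AVX but not AVX2.
    print(choose_msvc_arch_flag({"sse4_2", "avx"}))
```

The returned string is what would replace /arch:AVX in the modified CMakeLists.txt statements.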
After compiling with the above CMakeLists.txt, the compile procedure was much slower than for the official release, and the warning about AVX/AVX2 disappeared, but the warnings about SSE/SSE2/3/4.1/4.2 still remain. I think these warnings can be ignored because there is no SSE support for x64 MSBuild.
I am testing the new pip package now. It may be faster than before, but I don't want to write a new benchmark ...
If anyone is interested in this, please test whether the new package is really faster.
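For anyone who wants to try, here is a minimal timing sketch, not a real benchmark suite. It assumes you run the same script once with the official package installed and once with the custom build, then compare the numbers. The names benchmark and placeholder_workload are hypothetical, and the placeholder is pure Python, so you would swap in the TensorFlow computation you care about.

```python
import statistics
import time


def benchmark(workload, warmup=2, repeats=10):
    """Return the median wall-clock time in seconds of one workload() call."""
    # Warm-up calls are discarded so one-time setup cost does not skew the result.
    for _ in range(warmup):
        workload()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def placeholder_workload():
    # Hypothetical stand-in: replace with the TensorFlow computation to compare.
    return sum(i * i for i in range(100000))


if __name__ == "__main__":
    print("median seconds per run:", benchmark(placeholder_workload))
```

The median is used rather than the mean so that a single slow run, for example one interrupted by another process, has less influence on the comparison.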
I did all of this on the latest git master branch as of 2017-3-12. The pip package name shows that it was tensorflow 1.0.1.