Problem description
I am trying to install numpy with OpenBLAS, but I am at a loss as to how the site.cfg file needs to be written.
When I followed the installation procedure, the installation completed without errors. However, performance degrades when I increase the number of threads used by OpenBLAS above 1 (controlled by the environment variable OMP_NUM_THREADS).
I am not sure whether the OpenBLAS integration is working correctly. Could anyone provide a site.cfg file that achieves this?
P.S.: On the same machine, OpenBLAS integration in other Python-based toolkits such as Theano gives a substantial performance boost as the number of threads increases.
Recommended answer
I just compiled numpy inside a virtualenv with OpenBLAS integration, and it seems to be working OK.
This was my process:
Compile OpenBLAS:
$ git clone https://github.com/xianyi/OpenBLAS
$ cd OpenBLAS && make FC=gfortran
$ sudo make PREFIX=/opt/OpenBLAS install
If you don't have admin rights, you could set PREFIX= to a directory where you have write privileges (just modify the corresponding steps below accordingly).
Make sure that the directory containing libopenblas.so is in your shared library search path.
To do this locally, you could edit your ~/.bashrc file to contain the line
export LD_LIBRARY_PATH=/opt/OpenBLAS/lib:$LD_LIBRARY_PATH
The LD_LIBRARY_PATH environment variable will be updated when you start a new terminal session (use $ source ~/.bashrc to force an update within the same session).
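To confirm that the export took effect in the current session, a small check along these lines may help. It is only a sketch: the helper name dir_in_search_path is made up for illustration, and /opt/OpenBLAS/lib is the install prefix assumed throughout this answer.

```python
import os


def dir_in_search_path(directory, env_value=None):
    """Return True if `directory` is one of the entries in LD_LIBRARY_PATH.

    `env_value` lets the caller pass the variable's value explicitly;
    by default it is read from the current environment.
    """
    if env_value is None:
        env_value = os.environ.get("LD_LIBRARY_PATH", "")
    target = os.path.normpath(directory)
    # Skip empty entries such as the one left by a trailing separator.
    entries = [entry for entry in env_value.split(os.pathsep) if entry]
    return any(os.path.normpath(entry) == target for entry in entries)


# Example: print(dir_in_search_path("/opt/OpenBLAS/lib"))
```

If this prints False in a fresh terminal, the export line was probably not picked up from ~/.bashrc.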
Another option that will work for multiple users is to create a .conf file in /etc/ld.so.conf.d/ containing the line /opt/OpenBLAS/lib, e.g.:
$ sudo sh -c "echo '/opt/OpenBLAS/lib' > /etc/ld.so.conf.d/openblas.conf"
Once you are done with either option, run
$ sudo ldconfig
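As a quick sanity check that the loader can now locate the library, something like the following could be run. This is a sketch rather than part of the original procedure: it relies on the standard-library ctypes.util.find_library, and the wrapper name find_openblas is hypothetical.

```python
import ctypes.util


def find_openblas():
    """Return the name of the OpenBLAS shared library as resolved by
    the system's library lookup, or None if it cannot be found."""
    return ctypes.util.find_library("openblas")


# Example: print(find_openblas())
# A result of None suggests the search path setup above did not take effect.
```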
Grab the numpy source code:
$ git clone https://github.com/numpy/numpy
$ cd numpy
Copy site.cfg.example to site.cfg and edit the copy:
$ cp site.cfg.example site.cfg
$ nano site.cfg
Uncomment these lines:
....
[openblas]
libraries = openblas
library_dirs = /opt/OpenBLAS/lib
include_dirs = /opt/OpenBLAS/include
....
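If you installed OpenBLAS under a different PREFIX, the same section can be generated rather than edited by hand. The snippet below is a minimal sketch using the standard-library configparser. The function name render_site_cfg is hypothetical, and it produces only the [openblas] section shown above, not the rest of site.cfg.

```python
import configparser
import io


def render_site_cfg(prefix="/opt/OpenBLAS"):
    """Return the text of an [openblas] section for site.cfg,
    pointing at the lib/ and include/ directories under `prefix`."""
    parser = configparser.ConfigParser()
    parser["openblas"] = {
        "libraries": "openblas",
        "library_dirs": prefix + "/lib",
        "include_dirs": prefix + "/include",
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Example: print(render_site_cfg("/home/me/OpenBLAS"), end="")
```

The printed text can be pasted into site.cfg in place of the commented-out example section.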
Check the configuration, then build and install (optionally inside a virtualenv):
$ python setup.py config
The output should look something like this:
...
openblas_info:
FOUND:
libraries = ['openblas', 'openblas']
library_dirs = ['/opt/OpenBLAS/lib']
language = c
define_macros = [('HAVE_CBLAS', None)]
FOUND:
libraries = ['openblas', 'openblas']
library_dirs = ['/opt/OpenBLAS/lib']
language = c
define_macros = [('HAVE_CBLAS', None)]
...
Installing with pip is preferable to using python setup.py install, since pip will keep track of the package metadata and allow you to easily uninstall or upgrade numpy in the future.
$ pip install .
Optional: you can use this script to test performance for different thread counts.
$ OMP_NUM_THREADS=1 python build/test_numpy.py
version: 1.10.0.dev0+8e026a2
maxint: 9223372036854775807
BLAS info:
* libraries ['openblas', 'openblas']
* library_dirs ['/opt/OpenBLAS/lib']
* define_macros [('HAVE_CBLAS', None)]
* language c
dot: 0.099796795845 sec
$ OMP_NUM_THREADS=8 python build/test_numpy.py
version: 1.10.0.dev0+8e026a2
maxint: 9223372036854775807
BLAS info:
* libraries ['openblas', 'openblas']
* library_dirs ['/opt/OpenBLAS/lib']
* define_macros [('HAVE_CBLAS', None)]
* language c
dot: 0.0439578056335 sec
There seems to be a noticeable improvement in performance for higher thread counts. However, I haven't tested this very systematically, and it's likely that for smaller matrices the additional overhead would outweigh the performance benefit from a higher thread count.
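The test script itself is not reproduced here. The following is a minimal sketch of a comparable benchmark, not the original script. The matrix size, repeat count, function names and the file name bench_dot.py are all assumptions.

```python
import time


def best_time(fn, repeats=3):
    """Return the fastest wall-clock time (in seconds) of `repeats` calls to fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def bench_dot(n=1000, repeats=3):
    """Time an n-by-n matrix product with numpy."""
    # numpy is imported here, not at module level, so that OMP_NUM_THREADS
    # is already set in the environment when OpenBLAS is loaded.
    import numpy as np

    np.random.seed(0)
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    return best_time(lambda: a.dot(b), repeats)


# Example, assuming the file is saved as bench_dot.py (hypothetical name):
#   OMP_NUM_THREADS=8 python -c "import bench_dot; print(bench_dot.bench_dot())"
```

Running it with several values of OMP_NUM_THREADS and a few matrix sizes should show where the threading overhead starts to outweigh the speedup.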