Setting the maximum number of threads for OpenBLAS

Problem description

I am using numpy, and my model involves intensive matrix-matrix multiplication. To speed it up, I use the multi-threaded OpenBLAS library to parallelize the numpy.dot function.

My setup is as follows:

  • OS: CentOS 6.2 server, #CPUs = 12, #MEM = 96GB
  • Python version: Python 2.7.6
  • numpy: numpy 1.8.0
  • OpenBLAS + IntelMKL

$ OMP_NUM_THREADS=8 python test_mul.py

I took the code from https://gist.github.com/osdf/

test_mul.py:

import numpy
import sys
import timeit

try:
    import numpy.core._dotblas
    print 'FAST BLAS'
except ImportError:
    print 'slow blas'

print "version:", numpy.__version__
print "maxint:", sys.maxint
print

x = numpy.random.random((1000,1000))

setup = "import numpy; x = numpy.random.random((1000,1000))"
count = 5

t = timeit.Timer("numpy.dot(x, x.T)", setup=setup)
print "dot:", t.timeit(count)/count, "sec"

When I run OMP_NUM_THREADS=1 python test_mul.py, the result is

dot: 0.200172233582 sec

OMP_NUM_THREADS=2

dot: 0.103047609329 sec

OMP_NUM_THREADS=4

dot: 0.0533880233765 sec

Everything goes well.

However, when I set OMP_NUM_THREADS=8, the code only "occasionally works".

Sometimes it works; sometimes it does not even run and gives me core dumps.

When OMP_NUM_THREADS > 10, the code seems to break all the time. I am wondering what is happening here. Is there a maximum number of threads that each process can use? Can I raise that limit, given that I have 12 CPUs in my machine?

Thanks

Recommended answer

Firstly, I don't really understand what you mean by 'OpenBLAS + IntelMKL'. Both of those are BLAS libraries, and numpy should only link to one of them at runtime. You should probably check which of the two numpy is actually using. You can do this by calling:

$ ldd <path-to-site-packages>/numpy/core/_dotblas.so

Update: numpy/core/_dotblas.so was removed in numpy v1.10, but you can check the linkage of numpy/core/multiarray.so instead.

For example, I link against OpenBLAS:

...
libopenblas.so.0 => /opt/OpenBLAS/lib/libopenblas.so.0 (0x00007f788c934000)
...
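If you would rather do this check from a script, the following is a minimal sketch that scans ldd output for library names that look like BLAS implementations. The function name find_blas_libraries, the BLAS_HINTS list, and the libm line in the sample are illustrative assumptions; only the libopenblas line comes from the output above. Calling numpy.show_config() is another way to see what numpy was built against.

```python
import re

# Substrings that commonly appear in BLAS shared-library names.
# This list is an illustrative assumption, not an exhaustive registry.
BLAS_HINTS = ("openblas", "mkl", "atlas", "blas")


def find_blas_libraries(ldd_output):
    """Return (library name, resolved path) pairs that look like BLAS libraries."""
    matches = []
    for line in ldd_output.splitlines():
        # A typical ldd line: "libfoo.so.0 => /path/libfoo.so.0 (0x...)"
        m = re.match(r"\s*(\S+)\s+=>\s+(\S+)", line)
        if m and any(hint in m.group(1).lower() for hint in BLAS_HINTS):
            matches.append((m.group(1), m.group(2)))
    return matches


# Sample input: the first line is from the answer; the libm line is hypothetical.
sample = """\
libopenblas.so.0 => /opt/OpenBLAS/lib/libopenblas.so.0 (0x00007f788c934000)
libm.so.6 => /lib64/libm.so.6 (0x00007f788c6b0000)
"""

print(find_blas_libraries(sample))
```

If more than one BLAS implementation shows up in the result, that is worth investigating before tuning thread counts.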

If you are indeed linking against OpenBLAS, did you build it from source? If you did, you should see that Makefile.rule contains a commented-out option:

...
# You can define maximum number of threads. Basically it should be
# less than actual number of cores. If you don't specify one, it's
# automatically detected by the the script.
# NUM_THREADS = 24
...

By default, OpenBLAS tries to set the maximum number of threads automatically, but if it is not detecting this correctly you could try uncommenting and editing this line yourself (and then rebuilding).
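Separately from the build-time setting, the thread count requested at run time can be kept within the number of cores. The following is a rough sketch, assuming the environment variables are set before numpy is imported; the helper names clamp_threads and set_blas_threads are hypothetical. As far as I know, this cannot raise the limit compiled in via NUM_THREADS; it only avoids asking for more threads than the machine has.

```python
import os


def clamp_threads(requested, available=None):
    """Clamp a requested thread count to the range [1, available cores]."""
    if available is None:
        available = os.cpu_count() or 1
    return max(1, min(int(requested), available))


def set_blas_threads(requested, available=None):
    """Set the thread-count environment variables; call before importing numpy."""
    n = clamp_threads(requested, available)
    # OpenBLAS reads OPENBLAS_NUM_THREADS; OpenMP-based builds also honour
    # OMP_NUM_THREADS, which is the variable used in the question.
    os.environ["OPENBLAS_NUM_THREADS"] = str(n)
    os.environ["OMP_NUM_THREADS"] = str(n)
    return n


# On the 12-core machine from the question, a request for 24 is clamped to 12.
print(set_blas_threads(24, available=12))

# import numpy  # import only after the variables have been set
```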

Also, bear in mind that you will probably see diminishing returns in terms of performance from using more threads. Unless your arrays are very large, it is unlikely that using more than 6 threads will give much of a performance boost, because of the increased overhead involved in thread creation and management.
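To put numbers on this, here is a small sketch that computes speedup and parallel efficiency from the three timings reported in the question. The function name scaling_table is hypothetical, and three data points are of course too few to predict behaviour at 8 or 12 threads.

```python
# Timings (seconds per numpy.dot call) reported in the question.
timings = {1: 0.200172233582, 2: 0.103047609329, 4: 0.0533880233765}


def scaling_table(timings):
    """Return (threads, speedup, efficiency) rows relative to the 1-thread run."""
    base = timings[1]
    rows = []
    for threads in sorted(timings):
        speedup = base / timings[threads]
        efficiency = speedup / threads  # 1.0 would be perfect linear scaling
        rows.append((threads, speedup, efficiency))
    return rows


for threads, speedup, efficiency in scaling_table(timings):
    print("%2d threads: speedup %.2fx, efficiency %.0f%%"
          % (threads, speedup, 100 * efficiency))
```

With these figures the speedup is about 1.94x on 2 threads and about 3.75x on 4, so efficiency has already slipped from roughly 97% to roughly 94%.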
