Importing scipy breaks multiprocessing support in Python

Problem description

I am running into a bizarre problem that I can't explain. I'm hoping someone out there can help!

I'm running Python 2.7.3 and Scipy v0.14.0 and am trying to implement some very simple multiprocessor algorithms to speed up my code using the multiprocessing module. I've managed to make a basic example work:

import multiprocessing
import numpy as np
import time
# import scipy.special


def compute_something(t):
    a = 0.
    for i in range(100000):
        a = np.sqrt(t)
    return a

if __name__ == '__main__':

    pool_size = multiprocessing.cpu_count()
    print "Pool size:", pool_size
    pool = multiprocessing.Pool(processes=pool_size)

    inputs = range(10)

    tic = time.time()
    builtin_outputs = map(compute_something, inputs)
    print 'Built-in:', time.time() - tic

    tic = time.time()
    pool_outputs = pool.map(compute_something, inputs)
    print 'Pool    :', time.time() - tic

This runs fine and returns:

Pool size: 8
Built-in: 1.56904006004
Pool    : 0.447728157043

But if I uncomment the line import scipy.special, I get:

Pool size: 8
Built-in: 1.58968091011
Pool    : 1.59387993813

and I can see that only one core is doing the work on my system. In fact, importing any module from the scipy package seems to have this effect (I've tried several).

Any ideas? I've never seen a case like this before, where an apparently innocuous import has such a strange and unexpected effect.

Thanks!

Update (1)

Moving the scipy import line into the function compute_something partially mitigates the problem:

Pool size: 8
Built-in: 1.66807389259
Pool    : 0.596321105957

Update (2)

Thanks to @larsmans for testing on a different system. The problem was not reproduced using Scipy v0.12.0. I'm moving this query to the scipy mailing list and will post any answers.

Recommended answer

After much digging around and posting an issue on the Scipy GitHub site, I've found a solution.

Before I start, this is documented very well here - I'll just give an overview.

This problem is not related to the version of Scipy or Numpy that I was using. It originates in the system BLAS libraries that Numpy and Scipy use for various linear algebra routines. You can tell which libraries Numpy is linked to by running

python -c 'import numpy; numpy.show_config()'

If you are using OpenBLAS on Linux, you may find that the CPU affinity mask is set to 1, meaning that once these libraries are loaded in Python (via Numpy/Scipy), the process can run on at most one core of the CPU. To test this, within a Python terminal run

import os
os.system('taskset -p %s' %os.getpid())

If the CPU affinity is returned as f or ff, you can access multiple cores. In my case it would start like that, but upon importing numpy or scipy.any_module, it would switch to 1, hence my problem.
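The same check can be sketched without shelling out to taskset. This is a minimal sketch, assuming Python 3.3+ on Linux, where os.sched_getaffinity is available (the original post uses Python 2.7, which lacks it). The helper names affinity_mask_hex and current_affinity_mask are hypothetical, introduced only for illustration.

```python
import os


def affinity_mask_hex(cpus):
    """Format CPU indices as the hex bitmask that `taskset -p` prints."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def current_affinity_mask():
    """Return this process's affinity mask as hex, or None if unsupported."""
    if not hasattr(os, "sched_getaffinity"):
        return None  # os.sched_getaffinity is not available on this platform
    return affinity_mask_hex(os.sched_getaffinity(0))


if __name__ == "__main__":
    print("before import:", current_affinity_mask())
    try:
        import numpy  # noqa: F401  (the import is what may change the mask)
    except ImportError:
        print("numpy is not installed; nothing to compare")
    else:
        print("after import :", current_affinity_mask())
```

If the two printed masks differ and the second one is 1, the import has pinned the process to a single core, as described above.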

I've found two solutions:

Change the CPU affinity

You can manually set the CPU affinity of the master process at the top of the main function so that the code looks like this:

import multiprocessing
import numpy as np
import math
import time
import os

def compute_something(t):
    a = 0.
    for i in range(10000000):
        a = math.sqrt(t)
    return a

if __name__ == '__main__':

    pool_size = multiprocessing.cpu_count()
    os.system('taskset -cp 0-%d %s' % (pool_size, os.getpid()))

    print "Pool size:", pool_size
    pool = multiprocessing.Pool(processes=pool_size)

    inputs = range(10)

    tic = time.time()
    builtin_outputs = map(compute_something, inputs)
    print 'Built-in:', time.time() - tic

    tic = time.time()
    pool_outputs = pool.map(compute_something, inputs)
    print 'Pool    :', time.time() - tic

Note that giving taskset a value higher than the number of cores doesn't seem to matter - it just uses the maximum possible number.
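On newer Python versions the affinity can probably be reset without calling taskset at all. The following is a minimal sketch, assuming Python 3.3+ on Linux, where os.sched_setaffinity exists. It is not part of the original answer, and the helper name reset_affinity is hypothetical.

```python
import multiprocessing
import os


def reset_affinity():
    """Allow this process to run on every CPU.

    Returns the resulting set of CPU indices, or None where the
    os.sched_* functions are unavailable.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    try:
        # pid 0 means "the calling process"
        os.sched_setaffinity(0, range(multiprocessing.cpu_count()))
    except OSError:
        pass  # e.g. a restricted environment; keep the current affinity
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Call this after importing numpy/scipy and before creating the Pool:
    # worker processes inherit the parent's affinity when they are forked.
    print("Allowed CPUs:", reset_affinity())
```

The call has to come after the numpy/scipy imports, because it is the import that narrows the mask in the first place.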

Switch the BLAS library

The solution is documented at the site linked above. Basically: install libatlas and run update-alternatives to point numpy to ATLAS rather than OpenBLAS.


10-23 07:02