Problem description
Probably a commonplace question, but how can I parallelize this loop in Python?
for i in range(0, Nx.shape[2]):
    for j in range(0, Nx.shape[2]):
        NI = Nx[:,:,i]; NJ = Nx[:,:,j]
        Ku[i,j] = (NI[mask!=True]*NJ[mask!=True]).sum()
So my question: what's the easiest way to parallelize this code?
---------- EDIT LATER------------------
Example data
import random
import numpy as np
import numpy.ma as ma
from numpy import unravel_index
# my input
Nx = np.random.rand(5,5,5)
# mask creation: True wherever the first slice is below 0.4
# (list() is needed on Python 3, where zip returns an iterator)
mask_positions = list(zip(*np.where((Nx[:,:,0] < 0.4))))
mask_array_positions = np.asarray(mask_positions)
i, j = mask_array_positions.T
mask = np.zeros(Nx[:,:,0].shape, bool)
mask[i,j] = True
I want to compute Ku in parallel. My aim is to use the Ku array to solve a linear problem, so I have to leave out the masked values (which make up nearly half of my array).
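A note on the example data: the mask appears to be the same array as the comparison itself, so the detour through index positions may not be needed. A minimal sketch, assuming a seeded generator (rng, not in the original) in place of np.random.rand so the check is repeatable:

```python
import numpy as np

rng = np.random.default_rng(0)           # assumption: fixed seed for repeatability
Nx = rng.random((5, 5, 5))

# Roundabout construction, as in the example data above
rows, cols = np.where(Nx[:, :, 0] < 0.4)
mask_from_positions = np.zeros(Nx[:, :, 0].shape, bool)
mask_from_positions[rows, cols] = True

# Direct construction: the comparison already yields a boolean array
mask = Nx[:, :, 0] < 0.4

print(np.array_equal(mask, mask_from_positions))
```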
Recommended answer
I think you want to 'vectorize', to use numpy terminology, not parallelize in the multiprocess way.
Your calculation is essentially a dot (matrix) product. Apply the mask once to the whole array to get a 2d array, NIJ. Its shape will be (N,5), where N is the number of True values in ~mask. Then it's just a (5,N) array 'dotted' with a (N,5) array, i.e. a sum over the N dimension, leaving you with a (5,5) array.
NIJ = Nx[~mask,:]
Ku = np.dot(NIJ.T,NIJ)
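To check the equivalence yourself, something like the following sketch can be used. The function names ku_loop and ku_dot and the seeded generator are illustrative assumptions, not part of the question; the loop body is the one from the question with mask!=True written as ~mask.

```python
import numpy as np

rng = np.random.default_rng(0)           # assumption: fixed seed for repeatability
Nx = rng.random((5, 5, 5))
mask = Nx[:, :, 0] < 0.4                 # True marks the values to leave out


def ku_loop(Nx, mask):
    """Reference version: the double loop from the question."""
    n = Nx.shape[2]
    Ku = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            NI = Nx[:, :, i]
            NJ = Nx[:, :, j]
            Ku[i, j] = (NI[~mask] * NJ[~mask]).sum()
    return Ku


def ku_dot(Nx, mask):
    """Vectorized version: apply the mask once, then one matrix product."""
    NIJ = Nx[~mask, :]                   # shape (N, n), N = number of unmasked cells
    return np.dot(NIJ.T, NIJ)            # shape (n, n)


Ku_loop = ku_loop(Nx, mask)
Ku_vec = ku_dot(Nx, mask)
print(np.allclose(Ku_loop, Ku_vec))
```

Note that Ku is symmetric by construction, so even the loop version only needs to compute half of the entries.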
In quick tests it matches the Ku produced by your double loop. Depending on the underlying library used for np.dot there might be some multicore calculation, but that's usually not a priority issue for numpy users.
Applying the large boolean mask is the most time-consuming part of these calculations, in both the vectorized and iterative versions.
For a mask with 400,000 True values, compare these 2 indexing times:
In [195]: timeit (NI[:400,:1000],NJ[:400,:1000])
100000 loops, best of 3: 4.87 us per loop
In [196]: timeit (NI[mask],NJ[mask])
10 loops, best of 3: 98.8 ms per loop
Selecting the same number of items with basic (slice) indexing is several orders of magnitude faster than advanced indexing with the mask.
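Much of the gap comes from the fact that a slice is a view onto the existing data, while a boolean mask builds a new array by copying every selected value. A rough sketch for reproducing the comparison outside IPython; the array size and mask layout are assumptions chosen to give 400,000 selected items, not the data behind the timings above:

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
NI = rng.random((1000, 1000))            # hypothetical size
mask = np.zeros(NI.shape, dtype=bool)
mask[:400, :] = True                     # 400,000 True values

sliced = NI[:400, :1000]                 # basic indexing: a view, nothing is copied
picked = NI[mask]                        # advanced indexing: a new 1-D copy

print(np.shares_memory(NI, sliced))      # True
print(np.shares_memory(NI, picked))      # False

# Average time per call; the actual numbers depend on the machine
t_slice = timeit.timeit(lambda: NI[:400, :1000], number=1000) / 1000
t_mask = timeit.timeit(lambda: NI[mask], number=20) / 20
print(f"slice: {t_slice:.2e} s per call, mask: {t_mask:.2e} s per call")
```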
Substituting np.dot(NI[mask],NJ[mask]) for (NI[mask]*NJ[mask]).sum() only saves a few ms.
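The two expressions compute the same number: once the mask has been applied, both operands are 1-D arrays, so np.dot reduces to an inner product. A small sketch with made-up data (array size and mask density are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
NI = rng.random((1000, 1000))            # hypothetical size
NJ = rng.random((1000, 1000))
mask = rng.random(NI.shape) < 0.4        # roughly 40% True

a = NI[mask]                             # the expensive step: copies the selected values
b = NJ[mask]

s1 = (a * b).sum()                       # elementwise product into a temporary, then sum
s2 = np.dot(a, b)                        # inner product of two 1-D arrays
print(np.isclose(s1, s2))
```

Since the masking is paid for in both variants, the larger saving comes from applying the mask once to the whole array rather than once per (i, j) pair.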