问题描述
在帮助您发现和理解以Python方式优化嵌套for循环中的以下数组操作的方法方面,我将不胜感激.
I'd appreciate some help in finding and understanding a pythonic way to optimize the following array manipulations in nested for loops:
def _func(a, b, radius):
"Return 0 if a>b, otherwise return 1"
if distance.euclidean(a, b) < radius:
return 1
else:
return 0
def _make_mask(volume, roi, radius):
mask = numpy.zeros(volume.shape)
for x in range(volume.shape[0]):
for y in range(volume.shape[1]):
for z in range(volume.shape[2]):
mask[x, y, z] = _func((x, y, z), roi, radius)
return mask
其中volume.shape
(182、218、200)和roi.shape
(3,)都是ndarray
类型;而radius
是int
Where volume.shape
(182, 218, 200) and roi.shape
(3,) are both ndarray
types; and radius
is an int
推荐答案
方法1
这是一种矢量化方法-
m,n,r = volume.shape
x,y,z = np.mgrid[0:m,0:n,0:r]
X = x - roi[0]
Y = y - roi[1]
Z = z - roi[2]
mask = X**2 + Y**2 + Z**2 < radius**2
可能的改进:我们可以使用numexpr
模块-
Possible improvement : We can probably speedup the last step with numexpr
module -
import numexpr as ne
mask = ne.evaluate('X**2 + Y**2 + Z**2 < radius**2')
方法2
我们还可以逐步构建与形状参数相对应的三个范围,并即时对roi
的三个元素进行减法运算,而无需像先前使用np.mgrid
那样实际创建网格.出于效率目的,使用 broadcasting
将会受益.实现看起来像这样-
We can also gradually build the three ranges corresponding to the shape parameters and perform the subtraction against the three elements of roi
on the fly without actually creating the meshes as done earlier with np.mgrid
. This would be benefited by the use of broadcasting
for efficiency purposes. The implementation would look like this -
m,n,r = volume.shape
vals = ((np.arange(m)-roi[0])**2)[:,None,None] + \
((np.arange(n)-roi[1])**2)[:,None] + ((np.arange(r)-roi[2])**2)
mask = vals < radius**2
简化版:感谢@Bi Rico在这里提出改进建议,因为我们可以使用np.ogrid
以更简洁的方式执行这些操作,就像这样-
Simplified version : Thanks to @Bi Rico for suggesting an improvement here as we can use np.ogrid
to perform those operations in a bit more concise manner, like so -
m,n,r = volume.shape
x,y,z = np.ogrid[0:m,0:n,0:r]-roi
mask = (x**2+y**2+z**2) < radius**2
运行时测试
函数定义-
def vectorized_app1(volume, roi, radius):
m,n,r = volume.shape
x,y,z = np.mgrid[0:m,0:n,0:r]
X = x - roi[0]
Y = y - roi[1]
Z = z - roi[2]
return X**2 + Y**2 + Z**2 < radius**2
def vectorized_app1_improved(volume, roi, radius):
m,n,r = volume.shape
x,y,z = np.mgrid[0:m,0:n,0:r]
X = x - roi[0]
Y = y - roi[1]
Z = z - roi[2]
return ne.evaluate('X**2 + Y**2 + Z**2 < radius**2')
def vectorized_app2(volume, roi, radius):
m,n,r = volume.shape
vals = ((np.arange(m)-roi[0])**2)[:,None,None] + \
((np.arange(n)-roi[1])**2)[:,None] + ((np.arange(r)-roi[2])**2)
return vals < radius**2
def vectorized_app2_simplified(volume, roi, radius):
m,n,r = volume.shape
x,y,z = np.ogrid[0:m,0:n,0:r]-roi
return (x**2+y**2+z**2) < radius**2
时间-
In [106]: # Setup input arrays
...: volume = np.random.rand(90,110,100) # Half of original input sizes
...: roi = np.random.rand(3)
...: radius = 3.4
...:
In [107]: %timeit _make_mask(volume, roi, radius)
1 loops, best of 3: 41.4 s per loop
In [108]: %timeit vectorized_app1(volume, roi, radius)
10 loops, best of 3: 62.3 ms per loop
In [109]: %timeit vectorized_app1_improved(volume, roi, radius)
10 loops, best of 3: 47 ms per loop
In [110]: %timeit vectorized_app2(volume, roi, radius)
100 loops, best of 3: 4.26 ms per loop
In [139]: %timeit vectorized_app2_simplified(volume, roi, radius)
100 loops, best of 3: 4.36 ms per loop
因此,broadcasting
一如既往地展示了其疯狂的魔力,使 10,000x
速度比原始代码快,并且比 10x
更好,比通过使用即时广播的操作!
So, as always broadcasting
showing its magic for a crazy almost 10,000x
speedup over the original code and more than 10x
better than creating meshes by using on-the-fly broadcasted operations!
这篇关于Python向量化嵌套循环的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!