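I have a 3xN array of 3D coordinates, and I would like to compute the distance matrix between all entries efficiently.
Is there an efficient looping strategy I can apply instead of a nested loop?
Current pseudocode implementation: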
import numpy
# dist is assumed to be a preallocated N x N array of zeros
for i, coord in enumerate(coords):
    for j, coord2 in enumerate(coords):
        if i != j:
            dist[i, j] = numpy.linalg.norm(coord - coord2)
Best answer
To reproduce your result exactly, do the following:
>>> import scipy.spatial as sp
>>> import numpy as np
>>> a=np.random.rand(5,3) #Note this is the transpose of your array.
>>> a
array([[ 0.83921304, 0.72659404, 0.50434178], #0
[ 0.99883826, 0.91739731, 0.9435401 ], #1
[ 0.94327962, 0.57665875, 0.85853404], #2
[ 0.30053567, 0.44458829, 0.35677649], #3
[ 0.01345765, 0.49247883, 0.11496977]]) #4
>>> sp.distance.cdist(a,a)
array([[ 0. , 0.50475862, 0.39845025, 0.62568048, 0.94249268],
[ 0.50475862, 0. , 0.35554966, 1.02735895, 1.35575051],
[ 0.39845025, 0.35554966, 0. , 0.82602847, 1.1935422 ],
[ 0.62568048, 1.02735895, 0.82602847, 0. , 0.3783884 ],
[ 0.94249268, 1.35575051, 1.1935422 , 0.3783884 , 0. ]])
To do this more efficiently, computing only the unique pairs instead of repeating calculations:
>>> sp.distance.pdist(a)
array([ 0.50475862, 0.39845025, 0.62568048, 0.94249268, 0.35554966,
1.02735895, 1.35575051, 0.82602847, 1.1935422 , 0.3783884 ])
#pairs: [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3),
# (2, 4), (3, 4)]
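The pair ordering listed above can be checked directly. The following is a minimal sketch, assuming a freshly generated 5x3 array (the seed and the names `condensed` and `pairs` are illustrative, not from the original answer); it confirms that entry k of the `pdist` output corresponds to the k-th pair in row-major upper-triangle order.

```python
from itertools import combinations

import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)   # seed chosen arbitrarily for this sketch
a = rng.random((5, 3))           # 5 points in 3-D, one point per row

condensed = pdist(a)             # length n*(n-1)/2 = 10
pairs = list(combinations(range(len(a)), 2))  # (0, 1), (0, 2), ..., (3, 4)

# Entry k of the condensed vector is the distance between points i and j.
for k, (i, j) in enumerate(pairs):
    assert np.isclose(condensed[k], np.linalg.norm(a[i] - a[j]))

# The same index pairs, as two index arrays, come from np.triu_indices.
rows, cols = np.triu_indices(len(a), k=1)
assert pairs == list(zip(rows.tolist(), cols.tolist()))
```

This is the reason `np.triu_indices(n, 1)` can be used to scatter the condensed distances back into a square matrix.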
Note the relationship between the two arrays. The cdist array can be reproduced as follows:
>>> out=np.zeros((a.shape[0],a.shape[0]))
>>> dists=sp.distance.pdist(a)
>>> out[np.triu_indices(a.shape[0],1)]=dists
>>> out+=out.T
>>> out
array([[ 0. , 0.50475862, 0.39845025, 0.62568048, 0.94249268],
[ 0.50475862, 0. , 0.35554966, 1.02735895, 1.35575051],
[ 0.39845025, 0.35554966, 0. , 0.82602847, 1.1935422 ],
[ 0.62568048, 1.02735895, 0.82602847, 0. , 0.3783884 ],
[ 0.94249268, 1.35575051, 1.1935422 , 0.3783884 , 0. ]])
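SciPy also ships a helper for this conversion, `scipy.spatial.distance.squareform`, which was not used in the original answer. A minimal sketch, assuming a randomly generated 5x3 input (the seed is arbitrary), showing that it yields the same square matrix as `cdist`:

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform

rng = np.random.default_rng(0)   # seed chosen arbitrarily for this sketch
a = rng.random((5, 3))           # 5 points in 3-D, one point per row

condensed = pdist(a)             # unique pairs only, length 10
square = squareform(condensed)   # (5, 5), symmetric, zero diagonal

# Same result as the full pairwise computation.
assert np.allclose(square, cdist(a, a))

# squareform also converts a square distance matrix back to condensed form.
assert np.allclose(squareform(square), condensed)
```

Whether this is faster than the manual `np.triu_indices` approach was not measured here.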
Some surprising timings.
Setup:
def pdist_toarray(a):
    # Fill the upper triangle from pdist, then mirror it
    out=np.zeros((a.shape[0],a.shape[0]))
    dists=sp.distance.pdist(a)
    out[np.triu_indices(a.shape[0],1)]=dists
    return out+out.T

def looping(a):
    out=np.zeros((a.shape[0],a.shape[0]))
    for i in range(a.shape[0]):  # xrange on Python 2
        for j in range(a.shape[0]):
            out[i,j]=np.linalg.norm(a[i]-a[j])
    return out
Timings:
arr=np.random.rand(1000,3)
%timeit sp.distance.pdist(arr)
100 loops, best of 3: 4.26 ms per loop
%timeit sp.distance.cdist(arr,arr)
100 loops, best of 3: 9.31 ms per loop
%timeit pdist_toarray(arr)
10 loops, best of 3: 66.2 ms per loop
%timeit looping(arr)
1 loops, best of 3: 16.7 s per loop
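So if you want the square array returned, use cdist; if you only want the unique pairs, use pdist. Looping is far slower: for the 1000-element array above it takes 16.7 s, against 9.31 ms for cdist (roughly 1,800 times slower) and 4.26 ms for pdist (roughly 4,000 times slower). For an array with 10 elements, looping is about 70 times slower than cdist.
Source: Stack Overflow, https://stackoverflow.com/questions/18537878/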
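If SciPy is not available, the nested loop can also be replaced with NumPy broadcasting. This is a minimal sketch and not part of the original answer or its timings; the function name `pairwise_broadcast` is hypothetical. It builds an (N, N, D) intermediate array, so memory use grows quadratically with the number of points.

```python
import numpy as np

def pairwise_broadcast(a):
    """Euclidean distance matrix for the rows of a, shape (N, D)."""
    # a[:, None, :] has shape (N, 1, D); a[None, :, :] has shape (1, N, D).
    # Broadcasting the subtraction gives every pairwise difference, (N, N, D).
    diff = a[:, None, :] - a[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(0)   # seed chosen arbitrarily for this sketch
a = rng.random((5, 3))           # 5 points in 3-D, one point per row
dist = pairwise_broadcast(a)     # (5, 5) symmetric matrix, zero diagonal
```

For large N, `pdist` or `cdist` is likely the safer choice because neither needs the (N, N, D) intermediate.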