Problem description
First, thanks for reading and taking the time to respond.
Question:
I have a PxN matrix X where P is on the order of 10^6 and N is on the order of 10^3. So X is relatively large and not sparse. Let's say each row of X is an N-dimensional sample. I want to construct the PxP matrix of pairwise distances between these P samples. Let's also say I am interested in Hellinger distances.
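For reference, the Hellinger distance between two rows x and y of X, as computed by the loop in the question, is the Euclidean distance between their element-wise square roots scaled by 1/√2 (the entries are assumed non-negative, e.g. rows that are probability distributions):

```latex
H(x, y) = \frac{1}{\sqrt{2}} \sqrt{\sum_{k=1}^{N} \left( \sqrt{x_k} - \sqrt{y_k} \right)^2}
        = \frac{1}{\sqrt{2}} \left\lVert \sqrt{x} - \sqrt{y} \right\rVert_2
```

Because of this identity, any routine that computes pairwise Euclidean distances can be applied to np.sqrt(X), and dividing its output by √2 gives Hellinger distances.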
So far I am relying on sparse dok matrices:
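```python
import math

import numpy as np
import scipy as sp
import scipy.sparse  # makes sp.sparse available


def hellinger_distance(X):
    P = X.shape[0]
    # Only the upper triangle (including the diagonal) is filled in the loop.
    H1 = sp.sparse.dok_matrix((P, P))
    for i in range(P):
        if i % 100 == 0:
            print(i)  # progress indicator
        x1 = X[i]
        X2 = X[i:P]
        # Hellinger distances between row i and rows i..P-1
        h = np.sqrt(((np.sqrt(x1) - np.sqrt(X2)) ** 2).sum(1)) / math.sqrt(2)
        H1[i, i:P] = h
    # Mirror the upper triangle; the diagonal is zero, so it is not doubled.
    H = H1 + H1.T
    return H
```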
This is super slow. Is there a more efficient way of doing this? Any help is much appreciated.
Recommended answer
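Since the Hellinger distance is the Euclidean distance between the element-wise square roots of two rows, divided by √2, you can use pdist and squareform from scipy.spatial.distance: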
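```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Euclidean distances between square-rooted rows, scaled to Hellinger distances
out = squareform(pdist(np.sqrt(X))) / np.sqrt(2)
```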
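As a quick sanity check, here is a small sketch comparing the pdist/squareform result with the direct definition from the question's loop on random data. The array size, the seed, and the helper name hellinger_naive are illustrative assumptions; the rows are normalized to probability distributions so that every Hellinger distance lies in [0, 1].

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def hellinger_naive(X):
    """Hypothetical reference implementation: explicit double loop over rows."""
    P = X.shape[0]
    H = np.zeros((P, P))
    for i in range(P):
        for j in range(P):
            H[i, j] = np.sqrt(((np.sqrt(X[i]) - np.sqrt(X[j])) ** 2).sum()) / np.sqrt(2)
    return H


rng = np.random.default_rng(0)
X = rng.random((50, 20))
X /= X.sum(axis=1, keepdims=True)  # make each row a probability distribution

out = squareform(pdist(np.sqrt(X))) / np.sqrt(2)
print(np.allclose(out, hellinger_naive(X)))  # True
```

The double loop is only meant for checking small inputs; it performs P² Python-level row operations.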
Or use cdist from the same module:
```python
import numpy as np
from scipy.spatial.distance import cdist

sX = np.sqrt(X)  # element-wise square roots, computed once
out = cdist(sX, sX) / np.sqrt(2)
```
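With P on the order of 10^6, neither call can return the full result in practice: a dense P×P float64 array has 10^12 entries (about 8 TB), and even the condensed vector returned by pdist has P(P−1)/2 ≈ 5×10^11 entries (about 4 TB). A common workaround is to compute the distances in row blocks with cdist and consume each block (for example, reduce it or write it to disk) before moving on. The sketch below assumes that approach; the generator name hellinger_blocks and the block_size default are hypothetical choices, not part of SciPy.

```python
import numpy as np
from scipy.spatial.distance import cdist


def hellinger_blocks(X, block_size=128):
    """Hypothetical helper: yield (start, block), where block holds the Hellinger
    distances between rows start..start+len(block)-1 of X and all rows of X."""
    sX = np.sqrt(X)  # square roots computed once for all rows
    P = sX.shape[0]
    for start in range(0, P, block_size):
        stop = min(start + block_size, P)
        # Euclidean distances between square-rooted rows, scaled by 1/sqrt(2)
        yield start, cdist(sX[start:stop], sX) / np.sqrt(2)
```

Each yielded block occupies block_size × P × 8 bytes (about 1 GB for block_size=128 and P = 10^6), so block_size trades peak memory against the number of cdist calls.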