Problem description
Let X be a Bxn numpy matrix, i.e.,
import numpy as np
B = 10
n = 2
X = np.random.random((B, n))
Now, I'm interested in computing the so-called kernel (or similarity) matrix K, which has shape BxB and whose {i,j}-th element is given by:
K(i,j) = fun(x_i, x_j)
where x_t denotes the t-th row of matrix X and fun is some function of x_i and x_j. For instance, this function could be the so-called RBF function, i.e.,
K(i,j) = exp(-|x_i - x_j|^2).
A naive way to do this would be the following:
K = np.zeros((B, B))
for i in range(X.shape[0]):
    x_i = X[i, :]
    for j in range(X.shape[0]):
        x_j = X[j, :]
        K[i, j] = np.exp(-np.linalg.norm(x_i - x_j, 2) ** 2)
What I want is to do the above operation in a vectorized way, for the sake of efficiency. Could you help?
Recommended answer
I'm not sure that you can do this using only numpy. I would use the cdist function from the scipy library, something like this:
import numpy as np
from scipy.spatial.distance import cdist
B = 5
X = np.random.rand(B * B).reshape((B, B))
# Pairwise Euclidean distances between the rows of X
dist = cdist(X, X, metric='euclidean')
# RBF kernel from the question: K(i,j) = exp(-|x_i - x_j|^2)
K = np.exp(-dist ** 2)
dist
array([[ 0.        ,  1.2659804 ,  0.98231231,  0.80089176,  1.19326493],
       [ 1.2659804 ,  0.        ,  0.72658078,  0.80618767,  0.3776364 ],
       [ 0.98231231,  0.72658078,  0.        ,  0.70205336,  0.81352455],
       [ 0.80089176,  0.80618767,  0.70205336,  0.        ,  0.60025858],
       [ 1.19326493,  0.3776364 ,  0.81352455,  0.60025858,  0.        ]])
K is then a symmetric 5x5 matrix with ones on its diagonal (each row has zero distance to itself) and values between 0 and 1 elsewhere.
Hope this helps.
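For reference, the same RBF kernel can also be sketched in plain numpy using broadcasting. The function name rbf_kernel_broadcast and the seeded random generator below are illustrative assumptions, not part of the original answer. The intermediate array has shape (B, B, n), so this is only reasonable when B*B*n fits comfortably in memory.

```python
import numpy as np

def rbf_kernel_broadcast(X):
    """Return K with K[i, j] = exp(-|x_i - x_j|^2) for the rows of X."""
    # diff[i, j, :] = x_i - x_j, shape (B, B, n), built by broadcasting
    diff = X[:, None, :] - X[None, :, :]
    # Squared Euclidean distance between every pair of rows, shape (B, B)
    sq_dist = (diff ** 2).sum(axis=-1)
    return np.exp(-sq_dist)

# Illustrative data with the same shape as in the question (B=10, n=2)
rng = np.random.default_rng(0)
X = rng.random((10, 2))
K = rbf_kernel_broadcast(X)
```

For a different fun, the last two lines of the function are the only part that would change, provided fun can be written in terms of array operations on diff or on the broadcast rows.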
EDIT: You can also compute the squared distances using only numpy arrays, as you would in a Theano implementation:
dist = (X ** 2).sum(1).reshape((X.shape[0], 1)) + (X ** 2).sum(1).reshape((1, X.shape[0])) - 2 * X.dot(X.T)
That should work!
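The one-liner above relies on the identity |x_i - x_j|^2 = |x_i|^2 + |x_j|^2 - 2 x_i.x_j, so it yields squared distances and the RBF kernel follows as exp(-dist). A minimal sketch that wraps this up and checks it against the naive double loop from the question might look as follows; the function names are hypothetical, and the clipping step is an added safeguard because floating-point cancellation can leave tiny negative values.

```python
import numpy as np

def rbf_kernel_gram(X):
    """RBF kernel via |x_i|^2 + |x_j|^2 - 2 x_i.x_j (no (B, B, n) temporary)."""
    sq_norms = (X ** 2).sum(axis=1)  # |x_i|^2 for every row, shape (B,)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Round-off can produce slightly negative entries; clip them to zero
    np.maximum(sq_dist, 0.0, out=sq_dist)
    return np.exp(-sq_dist)

def rbf_kernel_naive(X):
    """Reference implementation: the double loop from the question."""
    B = X.shape[0]
    K = np.zeros((B, B))
    for i in range(B):
        for j in range(B):
            K[i, j] = np.exp(-np.linalg.norm(X[i] - X[j], 2) ** 2)
    return K

rng = np.random.default_rng(1)
X = rng.random((10, 2))
K_fast = rbf_kernel_gram(X)
K_slow = rbf_kernel_naive(X)
print(np.allclose(K_fast, K_slow))
```

Compared with explicit pairwise differences, this version only allocates BxB arrays and delegates most of the work to a single matrix product, which tends to matter once B grows large.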