Problem Description
I would like to compute an RBF or "Gaussian" kernel for a data matrix X with n rows and d columns. The resulting square kernel matrix is given by:
K[i,j] = var * exp(-gamma * ||X[i] - X[j]||^2)
var and gamma are scalars.
What is the fastest way to do this in python?
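For reference, a straightforward (unoptimized) baseline for this formula could be written with scipy.spatial.distance.cdist; this sketch (the name rbf_kernel_naive is just illustrative) is handy for checking the faster variants below against:

import numpy as np
from scipy.spatial.distance import cdist

def rbf_kernel_naive(X, gamma, var):
    # Pairwise squared Euclidean distances between all rows of X,
    # followed by the element-wise RBF formula.
    sq_dists = cdist(X, X, metric='sqeuclidean')
    return var * np.exp(-gamma * sq_dists)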
Recommended Answer
Well, you are already doing a lot of optimizations in your answer post. I would like to add a few more (mostly tweaks). I will build upon the winner from that answer post, which seems to be the numexpr based one.
First off, np.sum(X ** 2, axis = -1) could be optimized with np.einsum. Though this part isn't the biggest overhead, optimization of any sort won't hurt. So, that summation could be expressed as -
X_norm = np.einsum('ij,ij->i',X,X)
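As a quick sanity check (my own snippet, not part of the original answer), the einsum call computes exactly the row-wise squared norms that np.sum(X ** 2, axis=-1) does:

import numpy as np

X = np.random.randn(1000, 64)
# 'ij,ij->i' multiplies X element-wise with itself and sums over the columns,
# i.e. the squared L2 norm of each row.
assert np.allclose(np.einsum('ij,ij->i', X, X), np.sum(X ** 2, axis=-1))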
Tweak #2
Secondly, we could leverage the BLAS functions exposed through SciPy and, if allowed, use a single-precision dtype for a noticeable performance improvement over the double-precision one. Hence, np.dot(X, X.T) could be computed with SciPy's sgemm like so -
sgemm(alpha=1.0, a=X, b=X, trans_b=True)
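As another hedged sanity check (not from the original answer), sgemm with trans_b=True should match np.dot(X, X.T) up to single-precision rounding when X is float32:

import numpy as np
from scipy.linalg.blas import sgemm

X = np.random.randn(500, 64).astype(np.float32)
G = sgemm(alpha=1.0, a=X, b=X, trans_b=True)
# Loose tolerances because sgemm accumulates in single precision.
assert np.allclose(G, np.dot(X, X.T), rtol=1e-4, atol=1e-5)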
A few more tweaks on rearranging the negative sign with gamma let us feed more of the work into sgemm. Also, we can push gamma into the alpha term.
Thus, with these two optimizations, we would have two more variants (if I could put it that way) of the numexpr method, listed below -
import numpy as np
import numexpr as ne
from scipy.linalg.blas import sgemm

def app1(X, gamma, var):
    # Negated squared row norms, so the sign is folded into the norms.
    X_norm = -np.einsum('ij,ij->i', X, X)
    return ne.evaluate('v * exp(g * (A + B + 2 * C))', {
        'A': X_norm[:, None],
        'B': X_norm[None, :],
        'C': np.dot(X, X.T),
        'g': gamma,
        'v': var
    })

def app2(X, gamma, var):
    # gamma is pushed into the norms here and into sgemm's alpha below.
    X_norm = -gamma * np.einsum('ij,ij->i', X, X)
    return ne.evaluate('v * exp(A + B + C)', {
        'A': X_norm[:, None],
        'B': X_norm[None, :],
        'C': sgemm(alpha=2.0*gamma, a=X, b=X, trans_b=True),
        'g': gamma,
        'v': var
    })
Runtime test
The numexpr based one from your answer post -
def app0(X, gamma, var):
    X_norm = np.sum(X ** 2, axis=-1)
    return ne.evaluate('v * exp(-g * (A + B - 2 * C))', {
        'A': X_norm[:, None],
        'B': X_norm[None, :],
        'C': np.dot(X, X.T),
        'g': gamma,
        'v': var
    })
Timings and verification -
In [165]: # Setup
     ...: X = np.random.randn(10000, 512)
     ...: gamma = 0.01
     ...: var = 5.0

In [166]: %timeit app0(X, gamma, var)
     ...: %timeit app1(X, gamma, var)
     ...: %timeit app2(X, gamma, var)
1 loop, best of 3: 1.25 s per loop
1 loop, best of 3: 1.24 s per loop
1 loop, best of 3: 973 ms per loop

In [167]: np.allclose(app0(X, gamma, var), app1(X, gamma, var))
Out[167]: True

In [168]: np.allclose(app0(X, gamma, var), app2(X, gamma, var))
Out[168]: True
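One caveat worth adding (my own note, not part of the original answer): app2 goes through single-precision sgemm while app0 stays in double precision, so for larger gamma values or higher-dimensional data the agreement may need a slightly looser tolerance, e.g.:

np.allclose(app0(X, gamma, var), app2(X, gamma, var), rtol=1e-4, atol=1e-6)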