pdist for theano tensors

Question

I have a theano symbolic matrix

x = T.fmatrix('input')

x will later be populated with n vectors of dimension d (at train time).

I would like to have the theano equivalent of pdist (scipy.spatial.distance.pdist), something like

D = theano.pdist( x )

How can I achieve this?

Calling scipy.spatial.distance.pdist on x directly does not work as x at this stage is only symbolic...

Update: I would very much like to mimic pdist's "compact" behavior: that is, computing only about half of the n x n entries of the distance matrix.

Recommended answer

pdist from scipy is a collection of different distance functions; there is no single Theano equivalent for all of them at once. However, each specific distance, being a closed-form mathematical expression, can be written down in Theano as such and then compiled.

Take as an example the Minkowski p-norm distance (copy-and-pasteable):

import theano
import theano.tensor as T
X = T.fmatrix('X')
Y = T.fmatrix('Y')
P = T.scalar('P')
translation_vectors = X.reshape((X.shape[0], 1, -1)) - Y.reshape((1, Y.shape[0], -1))
minkowski_distances = (abs(translation_vectors) ** P).sum(2) ** (1. / P)
f_minkowski = theano.function([X, Y, P], minkowski_distances)

Note that the built-in abs calls the __abs__ method of its argument, which Theano tensors overload, so abs(...) here produces a symbolic Theano expression. We can now compare this to pdist:

import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.RandomState(42)
d = 20 # dimension
nX = 10
nY = 30
x = rng.randn(nX, d).astype(np.float32)
y = rng.randn(nY, d).astype(np.float32)

ps = [1., 3., 2.]

for p in ps:
    d_theano = f_minkowski(x, x, p)[np.triu_indices(nX, 1)]
    d_scipy = pdist(x, p=p, metric='minkowski')
    print("Testing p=%1.2f, discrepancy %1.3e" % (p, np.sqrt(((d_theano - d_scipy) ** 2).sum())))

This yields

Testing p=1.00, discrepancy 1.322e-06
Testing p=3.00, discrepancy 4.277e-07
Testing p=2.00, discrepancy 4.789e-07

As you can see, the correspondence is there, but the function f_minkowski is slightly more general, since it compares the rows of two possibly different arrays. If the same array is passed twice as input, f_minkowski returns the full square matrix, whereas pdist returns a condensed vector without redundancy. If this behaviour is desired, it can also be implemented fully dynamically, but I will stick to the general case here.
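For readers who do want the compact output, one way to get it is to index the row pairs (i, j) with i < j directly, so that only n * (n - 1) / 2 distances are computed. The following is a minimal NumPy sketch of that idea, not part of the original answer; the helper name minkowski_pdist is hypothetical. The same indexing pattern should carry over to Theano by indexing the symbolic matrix with integer index vectors, but that is an assumption and has not been verified here.

```python
import numpy as np

def minkowski_pdist(x, p):
    """Condensed Minkowski distances between the rows of x (hypothetical helper)."""
    n = x.shape[0]
    # Index pairs (i, j) with i < j, in the same row-major order pdist uses.
    i, j = np.triu_indices(n, 1)
    diffs = np.abs(x[i] - x[j])  # one row per pair: shape (n * (n - 1) / 2, d)
    return (diffs ** p).sum(axis=1) ** (1.0 / p)

rng = np.random.RandomState(0)
x = rng.randn(5, 3)
condensed = minkowski_pdist(x, 3.0)

# Reference: full n x n matrix via broadcasting, then keep the upper triangle.
full = (np.abs(x[:, None, :] - x[None, :, :]) ** 3.0).sum(axis=2) ** (1.0 / 3.0)
```

Note that this still materialises an intermediate array with one row per pair and d columns, so it roughly halves the work and memory of the square version rather than removing the dependence on d.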

One possibility of specialization should be noted though: in the case of p=2, the calculation becomes simpler through the binomial formula, and this can be used to save precious memory. The general Minkowski distance, as implemented above, creates a 3D array of shape (nX, nY, d) in order to avoid for-loops and cumulative summing, which can be prohibitive depending on the dimension d (and nX, nY). For p=2 we can instead write

squared_euclidean_distances = (X ** 2).sum(1).reshape((X.shape[0], 1)) + (Y ** 2).sum(1).reshape((1, Y.shape[0])) - 2 * X.dot(Y.T)
f_euclidean = theano.function([X, Y], T.sqrt(squared_euclidean_distances))

which only uses O(nX * nY) space instead of O(nX * nY * d). We check for correspondence, this time on the general problem:

d_eucl = f_euclidean(x, y)
d_minkowski2 = f_minkowski(x, y, 2.)
print("Comparing f_minkowski, p=2 and f_euclidean: l2-discrepancy %1.3e" % ((d_eucl - d_minkowski2) ** 2).sum())

This yields

Comparing f_minkowski, p=2 and f_euclidean: l2-discrepancy 1.464e-11
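The p=2 shortcut rests on the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, applied to every pair of rows at once. Below is a small NumPy sketch of the same expression, with the helper name euclidean_cdist being hypothetical. It adds one safeguard that is not in the original answer: round-off can leave tiny negative squared distances (typically when a row is compared with itself), so the values are clamped at zero before the square root. In Theano, T.maximum should play the same role, though that variant is untested here.

```python
import numpy as np

def euclidean_cdist(x, y):
    """Pairwise Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b (hypothetical helper)."""
    sq = ((x ** 2).sum(axis=1)[:, None]
          + (y ** 2).sum(axis=1)[None, :]
          - 2.0 * x.dot(y.T))
    # Round-off can leave tiny negative entries; clamp before the square root.
    return np.sqrt(np.maximum(sq, 0.0))

rng = np.random.RandomState(0)
x = rng.randn(4, 6)
y = rng.randn(7, 6)

d_fast = euclidean_cdist(x, y)  # only (nX, nY) intermediates
# Reference using the explicit (nX, nY, d) difference array.
d_ref = np.sqrt(((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2))
# Comparing an array with itself: the diagonal should be (close to) zero, never NaN.
self_dist = euclidean_cdist(x, x)
```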
