How does the condensed distance matrix work? (pdist)

Question

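scipy.spatial.distance.pdist returns a condensed distance matrix. From the documentation:

Returns a condensed distance matrix Y. For each i and j (where i < j), the metric dist(u=X[i], v=X[j]) is computed and stored in entry ij.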

I thought ij meant i*j. But I think I might be wrong. Consider

from numpy import array
from scipy.spatial.distance import pdist
X = array([[1,2], [1,2], [3,4]])
dist_matrix = pdist(X)

Then the documentation says that dist(X[0], X[2]) should be dist_matrix[0*2]. However, dist_matrix[0*2] is 0, not 2.8 as it should be.

What formula should I use to access the distance between two vectors, given i and j?

Solution

You can look at it this way: suppose x is m by n. The possible pairs of the m rows, chosen two at a time, are given by itertools.combinations(range(m), 2); e.g., for m=3:

>>> from itertools import combinations
>>> list(combinations(range(3), 2))
[(0, 1), (0, 2), (1, 2)]

So if d = pdist(x), the kth tuple in combinations(range(m), 2) gives the indices of the rows of x associated with d[k].
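
The pairing can be checked directly. This is a minimal sketch, assuming NumPy and SciPy are installed and pdist's default Euclidean metric; it walks the index pairs from combinations alongside the condensed array and recomputes each distance by hand:

```python
from itertools import combinations

import numpy as np
from scipy.spatial.distance import pdist

x = np.array([[0, 10], [10, 10], [20, 20]])
m = x.shape[0]  # number of rows (observations)
d = pdist(x)    # condensed distances, Euclidean by default

# Pair each condensed entry d[k] with the k-th pair of row indices.
for k, (i, j) in enumerate(combinations(range(m), 2)):
    # Recompute the Euclidean distance directly to confirm the pairing.
    direct = np.linalg.norm(x[i] - x[j])
    print(k, (i, j), d[k], direct)
```

Each printed line shows the same value twice, once from d[k] and once from the direct computation.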

Example:

>>> from numpy import array
>>> from scipy.spatial.distance import pdist, squareform
>>> x = array([[0,10],[10,10],[20,20]])
>>> pdist(x)
array([ 10.        ,  22.36067977,  14.14213562])

The first element is dist(x[0], x[1]), the second is dist(x[0], x[2]) and the third is dist(x[1], x[2]).
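
To get a closed-form answer to "given i and j, which k?", count how many pairs precede (i, j) in that order. The sketch below assumes NumPy and SciPy are installed; the helper name condensed_index is hypothetical (it is not part of SciPy). The expression it returns, m*i + j - ((i+2)*(i+1))//2, is the same one stated in the pdist documentation of recent SciPy versions.

```python
import numpy as np
from scipy.spatial.distance import pdist


def condensed_index(i: int, j: int, m: int) -> int:
    """Hypothetical helper: position of dist(X[i], X[j]) in pdist's output.

    m is the number of rows of X. Pairs are stored in the order
    (0, 1), (0, 2), ..., (0, m-1), (1, 2), ..., (m-2, m-1).
    """
    if not (0 <= i < m and 0 <= j < m):
        raise IndexError("i and j must be in range(m)")
    if i == j:
        # The zero diagonal is not stored in the condensed form.
        raise ValueError("i == j has no entry in the condensed matrix")
    if i > j:
        i, j = j, i  # the metric is symmetric, so only i < j is stored
    # Rows 0..i-1 contribute (m-1) + (m-2) + ... + (m-i) entries, and
    # j - i - 1 entries of row i come before (i, j); this simplifies to:
    return m * i + j - ((i + 2) * (i + 1)) // 2


# The example from the question: the distance between X[0] and X[2].
X = np.array([[1, 2], [1, 2], [3, 4]])
dist_matrix = pdist(X)
k = condensed_index(0, 2, len(X))
print(k, dist_matrix[k])  # 1 and about 2.828, i.e. sqrt(8)
```

This is also why dist_matrix[0*2] returned 0: index 0 holds dist(X[0], X[1]), and those two rows are identical. If memory is not a concern, squareform(dist_matrix)[i, j] gives the same value without any index arithmetic, at the cost of building the full m x m matrix.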

Or you can view it as the elements in the upper triangular part of the square distance matrix (excluding the diagonal), read row by row and strung together into a 1D array.

E.g.

>>> squareform(pdist(x))
array([[  0.   ,  10.   ,  22.361],
       [ 10.   ,   0.   ,  14.142],
       [ 22.361,  14.142,   0.   ]])

>>> y = array([[0,10],[10,10],[20,20],[10,0]])
>>> squareform(pdist(y))
array([[  0.   ,  10.   ,  22.361,  14.142],
       [ 10.   ,   0.   ,  14.142,  10.   ],
       [ 22.361,  14.142,   0.   ,  22.361],
       [ 14.142,  10.   ,  22.361,   0.   ]])
>>> pdist(y)
array([ 10.   ,  22.361,  14.142,  14.142,  10.   ,  22.361])
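
The upper-triangle view can also be checked programmatically with NumPy's triu_indices, which, with k=1, lists the strictly upper-triangular positions row by row. A minimal sketch, assuming NumPy and SciPy and redefining y as a standalone array:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

y = np.array([[0, 10], [10, 10], [20, 20], [10, 0]])
m = y.shape[0]

d = pdist(y)       # condensed form, length m*(m-1)//2
D = squareform(d)  # full symmetric m x m matrix with a zero diagonal

# Row/column indices of the strictly upper triangle (k=1 skips the diagonal),
# in row-major order: (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).
rows, cols = np.triu_indices(m, k=1)

# Reading the upper triangle row by row reproduces the condensed array.
upper = D[rows, cols]
print(np.allclose(upper, d))  # True

# squareform also converts a square distance matrix back to condensed form.
print(np.allclose(squareform(D), d))  # True
```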
