Problem description
I have an n*n distance matrix M, where M_ij is the distance between object_i and object_j. As expected, it takes the following form:
/ 0    M_01 M_02 ... M_0n \
| M_10 0    M_12 ... M_1n |
| M_20 M_21 0    ... M_2n |
| ...                     |
\ M_n0 M_n1 M_n2 ... 0    /
Now I wish to cluster these n objects with hierarchical clustering. Python has an implementation of this: scipy.cluster.hierarchy.linkage(y, method='single', metric='euclidean').
Its documentation says:
y must be a {n choose 2} sized vector where n is the number of original observations paired in the distance matrix.
y : ndarray
A condensed or redundant distance matrix. A condensed distance matrix is a flat array containing the upper triangular of the distance matrix. This is the form that pdist returns. Alternatively, a collection of m observation vectors in n dimensions may be passed as an m by n array.
I am confused by this description of y. Can I directly feed my M in as the input y?
Update
@hongbo-zhu-cn has raised this issue on GitHub. This is exactly what I am concerned about. However, as a newcomer to GitHub, I don't know how it works and therefore have no idea how this issue is being dealt with.
Recommended answer
It seems that we indeed cannot pass the redundant square matrix in directly, although the documentation claims we can.
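As far as I can tell, the reason is the last sentence of the quoted documentation: a 2-D input is read as a collection of observation vectors, so each row of M is treated as a point and the distances are recomputed from those rows. The sketch below is only an illustration of that reading, using a made-up 3-object matrix (the values and variable names are my own assumptions, not from the question). It compares the merge heights obtained from the square matrix with those obtained from the condensed form.

```python
import warnings

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform

# Made-up symmetric distance matrix for 3 objects (illustrative values only).
M = np.array([[0.0, 1.0, 10.0],
              [1.0, 0.0, 10.0],
              [10.0, 10.0, 0.0]])

# Passing the square matrix: SciPy interprets the rows as observation vectors.
# Recent SciPy versions may warn about this, so warnings are silenced here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    Z_square = linkage(M, method="single")

# Same thing done explicitly: Euclidean distances between the rows of M.
Z_rows = linkage(pdist(M), method="single")

# Intended usage: condense M first so its entries are used as the distances.
Z_condensed = linkage(squareform(M), method="single")

# Column 2 of a linkage matrix holds the merge distances.
print("square input merge heights:   ", Z_square[:, 2])
print("condensed input merge heights:", Z_condensed[:, 2])
```

With the condensed input the merge heights are the entries of M themselves (1 and 10 here), while the square input gives heights derived from distances between rows, which is not what was asked for.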
To benefit anyone who faces the same problem in the future, I am writing my solution here as an additional answer, so that those who just want to copy and paste can proceed with the clustering.
Use the following snippet to condense the matrix and happily proceed.
import scipy.spatial.distance as ssd
# distMatrix is the redundant n*n square distance matrix (symmetric, zero diagonal).
# Convert it into a condensed array of length {n choose 2}.
# For i < j, distArray[{n choose 2} - {n-i choose 2} + (j-i-1)] is the distance between points i and j.
distArray = ssd.squareform(distMatrix)
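For completeness, here is a minimal end-to-end sketch of how the condensed array might then be fed to linkage. The 4-object matrix, the helper condensed_index, the threshold t=5.0 and the use of fcluster to cut the tree are all my own illustrative assumptions; the helper simply spells out the index formula from the comment above so it can be checked against the square matrix.

```python
from math import comb

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Made-up distance matrix: objects 0,1 are close, objects 2,3 are close.
dist_matrix = np.array([[0.0, 1.0, 9.0, 9.0],
                        [1.0, 0.0, 9.0, 9.0],
                        [9.0, 9.0, 0.0, 2.0],
                        [9.0, 9.0, 2.0, 0.0]])
n = dist_matrix.shape[0]

# Condensed form: upper triangle read row by row, length n*(n-1)/2.
dist_array = squareform(dist_matrix)


def condensed_index(i, j, n):
    """Hypothetical helper: position of the (i, j) distance, for i < j."""
    return comb(n, 2) - comb(n - i, 2) + (j - i - 1)


# Check the index formula against the square matrix for every pair i < j.
for i in range(n):
    for j in range(i + 1, n):
        assert dist_array[condensed_index(i, j, n)] == dist_matrix[i, j]

# Hierarchical clustering on the condensed distances.
Z = linkage(dist_array, method="single")

# Cut the tree at distance 5.0 (arbitrary threshold for this toy example).
labels = fcluster(Z, t=5.0, criterion="distance")
print(labels)
```

In this toy example objects 0 and 1 end up in one cluster and objects 2 and 3 in another; the threshold and the linkage method would of course need to be chosen for the real data.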
Please correct me if I am wrong.