Using the sklearn.cluster.SpectralClustering class with the parameter affinity='precomputed'

Question

I'm having trouble understanding a specific use case of the sklearn.cluster.SpectralClustering class as outlined in the official documentation here. Say I want to use my own affinity matrix to perform clustering. I first instantiate an object of class SpectralClustering as follows:

from sklearn.cluster import SpectralClustering

cl = SpectralClustering(n_clusters=5, affinity='precomputed')

The documentation for the affinity parameter above is as follows:

If a string, this may be one of ‘nearest_neighbors’, ‘precomputed’, ‘rbf’ or one of the kernels supported by sklearn.metrics.pairwise_kernels. Only kernels that produce similarity scores (non-negative values that increase with similarity) should be used. This property is not checked by the clustering algorithm.

Now the object cl has a method fit for which the documentation about its sole parameter X is as follows:

OR, if affinity==precomputed, a precomputed affinity matrix of shape (n_samples, n_samples)

This is where it gets confusing. I am using my own affinity matrix, where a measure of 0 means two points are identical, with a higher number meaning two points are more dissimilar. However, the other choices for the parameter affinity actually take a data set and produce a similarity matrix, for which higher values are indicative of more similarity, and lower values indicate dissimilarity (such as the radial basis kernel).

So when using the fit method on my instance of SpectralClustering, do I actually need to transform my affinity matrix into a similarity matrix before passing it to the fit method call as the parameter X? The same documentation page makes a note on transforming distance to well-behaved similarities, but does not explicitly indicate where this step should be carried out, and via which method call.

Recommended answer

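Straight from the documentation, which suggests applying the Gaussian (RBF, heat) kernel to a distance matrix X to turn it into a similarity matrix:
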
np.exp(- X ** 2 / (2. * delta ** 2))

This goes in your own code, and the result of this can be passed to fit. For the purpose of this algorithm, affinity means similarity, not distance.
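
As a rough end-to-end sketch of that advice, the snippet below builds a distance matrix, converts it with the Gaussian kernel, and passes the result to the estimator. The toy data, the use of pairwise_distances as a stand-in for "my own" matrix, and the median-based choice of delta are all assumptions made for illustration; none of them come from the question or the documentation.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import pairwise_distances

# Toy data (hypothetical): two well-separated blobs of 20 points each in 2-D.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(20, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(20, 2)),
])

# Stand-in for a user-supplied matrix: 0 means identical,
# larger values mean more dissimilar.
dist_matrix = pairwise_distances(points, metric="euclidean")

# Kernel width. This is an assumption: the median of the non-zero
# distances is one common heuristic, not something the docs prescribe.
delta = np.median(dist_matrix[dist_matrix > 0])

# Gaussian (RBF, heat) kernel: distance 0 maps to similarity 1,
# and large distances map to values close to 0.
similarity = np.exp(-dist_matrix ** 2 / (2.0 * delta ** 2))

# With affinity='precomputed', the similarity matrix is what gets passed as X.
cl = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
labels = cl.fit_predict(similarity)
```

The value of delta controls how quickly similarity decays with distance, so it usually needs tuning for real data.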
