Using the class sklearn.cluster.SpectralClustering with the parameter affinity='precomputed'


Problem description

I'm having trouble understanding a specific use case of the sklearn.cluster.SpectralClustering class as outlined in the official documentation. Say I want to use my own affinity matrix to perform clustering. I first instantiate an object of class SpectralClustering as follows:

from sklearn.cluster import SpectralClustering

cl = SpectralClustering(n_clusters=5, affinity='precomputed')

The documentation for the affinity parameter above is as follows:

If a string, this may be one of 'nearest_neighbors', 'precomputed', 'rbf' or one of the kernels supported by sklearn.metrics.pairwise_kernels. Only kernels that produce similarity scores (non-negative values that increase with similarity) should be used. This property is not checked by the clustering algorithm.

Now the object cl has a method fit, and the documentation for its sole parameter X reads as follows:

OR, if affinity==precomputed, a precomputed affinity matrix of shape (n_samples, n_samples)

This is where it gets confusing. I am using my own affinity matrix, where a value of 0 means two points are identical and larger values mean the points are more dissimilar. However, the other choices for the affinity parameter take a data set and produce a similarity matrix, in which higher values indicate more similarity and lower values indicate dissimilarity (the radial basis function kernel, for example).

So when calling the fit method on my instance of SpectralClustering, do I actually need to transform my affinity matrix into a similarity matrix before passing it to fit as the parameter X? The same documentation page includes a note on transforming distances into well-behaved similarities, but it does not say explicitly where this step should be carried out or via which method call.



Recommended answer
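The answer text itself is not preserved here. As a hedged illustration only, the sketch below shows one common way to handle the situation described in the question: the distance-like matrix is converted to similarities by hand, before fit is called, using a Gaussian (RBF) kernel, and the result is passed as X. The toy data, the use of Euclidean distances, and the width parameter delta are all assumptions made for this example; a sensible delta depends on the scale of your own distances.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances

# Toy data (assumption): three well-separated 2-D blobs of 30 points each.
X, y_true = make_blobs(
    n_samples=90,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)

# A distance matrix: 0 means identical, larger means more dissimilar.
dist_matrix = pairwise_distances(X, metric="euclidean")

# Gaussian (RBF) kernel: maps distances to similarities in (0, 1],
# so that identical points get 1 and distant points approach 0.
# delta is a free width parameter; 3.0 is an arbitrary choice for this data.
delta = 3.0
similarity = np.exp(-dist_matrix ** 2 / (2.0 * delta ** 2))

# With affinity='precomputed', fit/fit_predict expects the
# (n_samples, n_samples) similarity matrix itself, not the raw data.
cl = SpectralClustering(n_clusters=3, affinity="precomputed", random_state=0)
labels = cl.fit_predict(similarity)

print(np.bincount(labels))  # number of points assigned to each cluster
```

The conversion is done by the caller, not by SpectralClustering: the quoted documentation states that the estimator does not check whether the supplied values behave like similarities.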