Question
The documentation for sklearn.cluster.AgglomerativeClustering mentions that,
This seems to imply that it is possible to first compute the full tree, and then quickly update the number of desired clusters as necessary, without recomputing the tree (with caching).
However, this procedure for changing the number of clusters does not seem to be documented. I would like to do this but am unsure how to proceed.
Update: To clarify, the fit method does not take the number of clusters as an input: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html#sklearn.cluster.AgglomerativeClustering.fit
Recommended answer
You set a caching directory with the parameter memory='mycachedir'. Then, if you also set compute_full_tree=True, rerunning fit with different values of n_clusters will use the cached tree rather than recomputing it each time. To give you an example of how to do this with sklearn's grid search API:
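from sklearn.cluster import AgglomerativeClustering
from sklearn.model_selection import GridSearchCV  # formerly sklearn.grid_search

# X (feature matrix) and y (ground-truth labels) are assumed to be defined already.
# Cache the merge tree on disk and always build the full tree.
ac = AgglomerativeClustering(memory='mycachedir',
                             compute_full_tree=True)
# Note: the 'adjusted_rand_score' scorer calls predict(), which
# AgglomerativeClustering does not implement, so a custom scorer may be needed.
classifier = GridSearchCV(ac,
                          {'n_clusters': range(2, 6)},  # parameter names must be strings
                          scoring='adjusted_rand_score',
                          n_jobs=-1, verbose=2)
classifier.fit(X, y)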
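Since AgglomerativeClustering exposes fit_predict rather than predict, a plain loop over n_clusters may be the simplest way to see the caching at work. The following is a minimal sketch, not part of the original answer: the toy data, the temporary cache directory (standing in for 'mycachedir'), and the names labels_by_k and cache_dir are all assumptions made for illustration.

```python
import tempfile

import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical toy data: three well-separated 2-D blobs of 20 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(20, 2)) for c in (0.0, 5.0, 10.0)])

labels_by_k = {}
# A temporary directory stands in for 'mycachedir' and is removed afterwards.
with tempfile.TemporaryDirectory() as cache_dir:
    for k in range(2, 6):
        # With compute_full_tree=True the tree is built independently of
        # n_clusters, so later iterations can reuse the tree cached on disk.
        model = AgglomerativeClustering(n_clusters=k,
                                        memory=cache_dir,
                                        compute_full_tree=True)
        labels_by_k[k] = model.fit_predict(X)
        print(k, np.bincount(labels_by_k[k]))
```

Only the first iteration should pay the cost of building the tree; the later ones only cut the cached tree at a different level. The cache is keyed on the input data, so a different X triggers a fresh computation.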