K-means clustering on a term-term co-occurrence matrix

Problem description

I derive a term-term co-occurrence matrix, K, from a Document-Term Matrix (DTM) in R. I am interested in carrying out a k-means clustering analysis on the keyword-by-keyword matrix K, whose dimensions are 8962 terms x 8962 terms.

I pass K to the kmeans function as follows:

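# Assumption: cost_df starts out empty (its initialization was not shown in the original)
cost_df <- data.frame()
for (i in 1:25) {
    # Run kmeans for each value of i, allowing up to 100 iterations for convergence
    # (the result is named fit so it does not mask the kmeans function)
    fit <- kmeans(x = K, centers = i, iter.max = 100)

    # Combine cluster number and cost together, append to cost_df
    cost_df <- rbind(cost_df, cbind(i, fit$tot.withinss))
}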

My original Document-Term Matrix was 590 documents x 8962 terms, and running the above code on the DTM does not hang. However, the code does hang on the keyword-by-keyword matrix because of its size. Any suggestions as to how to overcome this would be helpful.

Recommended answer

Your matrices are large but VERY sparse. Try using a sparse matrix.
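
As a rough illustration of that suggestion, here is a minimal sketch that assumes the Matrix package is available and uses a tiny made-up DTM (the names dtm and density and all values are hypothetical, not taken from the question). A dense 8962 x 8962 matrix of doubles needs about 8962 * 8962 * 8 bytes, roughly 640 MB, whereas a sparse representation stores only the non-zero cells.

```r
library(Matrix)

# Hypothetical toy DTM: 4 documents x 5 terms (the real one is 590 x 8962).
# i = document index, j = term index, x = term count in that document.
dtm <- sparseMatrix(
  i = c(1, 1, 2, 3, 3, 4),
  j = c(1, 2, 2, 3, 4, 5),
  x = c(1, 2, 1, 1, 3, 1),
  dims = c(4, 5)
)

# Term-term co-occurrence matrix K = t(dtm) %*% dtm, computed and stored sparsely
K <- crossprod(dtm)

# Fraction of cells in K that are non-zero; a small value means K is very sparse
density <- nnzero(K) / prod(dim(K))

# Compare the memory footprint of the sparse and dense representations
print(object.size(K))
print(object.size(as.matrix(K)))
```

One caveat, as far as I know: stats::kmeans coerces its input with as.matrix, so handing it a sparse K would still build the dense matrix internally. The sparse form helps when constructing and storing K, but the clustering step itself would need a k-means implementation that accepts sparse input to benefit.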
