Problem description
I derive a term-term co-occurrence matrix, K, from a document-term matrix (DTM) in R. I want to run a k-means clustering analysis on this keyword-by-keyword matrix. K has dimensions 8962 terms x 8962 terms.
I pass K to the kmeans function as follows:
cost_df <- data.frame()
for (i in 1:25) {
  # Run kmeans for each value of i, allowing up to 100 iterations for convergence
  fit <- kmeans(x = K, centers = i, iter.max = 100)
  # Record the cluster count and total within-cluster sum of squares
  cost_df <- rbind(cost_df, data.frame(k = i, cost = fit$tot.withinss))
}
My original document-term matrix was 590 documents x 8962 terms, and running the above code on the DTM does not hang. With the keyword-by-keyword matrix, however, the code does hang, presumably because of its size. Any suggestions on how to overcome this would be helpful.
Recommended answer
Your matrices are large but VERY sparse. Try using a sparse matrix.
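As a rough illustration of the sparse-matrix suggestion, here is a minimal sketch using the Matrix package. It assumes the co-occurrence matrix is computed as t(DTM) %*% DTM, which the question does not actually state, and the toy matrix dtm and the name K_sparse are made up for the example.

```r
library(Matrix)

# Hypothetical toy document-term matrix (4 documents x 5 terms), stored sparsely.
# In the question this would be the 590 x 8962 DTM.
dtm <- sparseMatrix(
  i = c(1, 1, 2, 3, 3, 4),   # document index of each nonzero entry
  j = c(1, 2, 2, 3, 4, 5),   # term index of each nonzero entry
  x = c(1, 1, 1, 1, 1, 1),   # term counts
  dims = c(4, 5)
)

# Assumed construction of the term-term co-occurrence matrix: t(dtm) %*% dtm.
# crossprod() on a sparse matrix returns a sparse result, so the full dense
# matrix is never materialised here.
K_sparse <- crossprod(dtm)

# Share of nonzero cells: a quick check of how sparse K really is
density <- nnzero(K_sparse) / prod(dim(K_sparse))
print(density)

# Memory used by the sparse object versus its dense equivalent
print(object.size(K_sparse))
print(object.size(as.matrix(K_sparse)))
```

One caveat: as far as I know, stats::kmeans() calls as.matrix() on its input, so handing it a sparse matrix converts it back to a dense one. The sparse representation helps when building and storing K, but for the clustering step itself you would likely need a routine that accepts sparse input.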