This post covers how to print each cluster ID and the elements assigned to it when using the Spark KMeans algorithm.
Question
I have a program that prints the WSSSE (Within Set Sum of Squared Errors) of the KMeans algorithm on Apache Spark. It generates 20 clusters. I am trying to print each cluster ID together with the elements assigned to that cluster. How do I loop over the cluster IDs to print their elements?
Thanks!
import org.apache.spark.SparkContext
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

val sc = new SparkContext("local", "KMeansExample","/usr/local/spark/", List("target/scala-2.10/kmeans_2.10-1.0.jar"))
// Load and parse the data: one comma-separated vector per line
val data = sc.textFile("kmeans.csv")
val parsedData = data.map( s => Vectors.dense(s.split(',').map(_.toDouble)))
// Cluster the data into 20 clusters using KMeans
val numIterations = 20
val numClusters = 20
val clusters = KMeans.train(parsedData, numClusters, numIterations)
// Format each center explicitly; concatenating an Array to a String prints only its reference
val clusterCenters = clusters.clusterCenters map (_.toArray)
println("The Cluster Centers are = " + clusterCenters.map(_.mkString("[", ", ", "]")).mkString(", "))
// Evaluate clustering by computing Within Set Sum of Squared Errors
val WSSSE = clusters.computeCost(parsedData)
println("Within Set Sum of Squared Errors = " + WSSSE)
Recommended answer
As far as I know, you should run predict on each element; it returns the ID of the cluster that element belongs to.
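// parsedData is assumed to be a JavaRDD<Vector> here
KMeansModel clusters = KMeans.train(parsedData.rdd(), numClusters, numIterations);
// collect() brings every vector to the driver, so this suits data that fits in driver memory
List<Vector> vectors = parsedData.collect();
for (Vector vector : vectors) {
    System.out.println("cluster " + clusters.predict(vector) + " " + vector.toString());
}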
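The loop above prints one line per element, so elements of the same cluster are not listed together. The following is a minimal sketch of the grouping step, written in plain Java without Spark so it can run on its own. The class name `ClusterGrouping`, its `predict` method, and the sample centers and points are all hypothetical; `predict` here is only a stand-in that returns the index of the nearest center by squared Euclidean distance, which is what the model's predict is understood to do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ClusterGrouping {
    // Hypothetical stand-in for the model's predict: returns the index of the
    // center closest to the point, measured by squared Euclidean distance.
    static int predict(double[][] centers, double[] point) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centers.length; i++) {
            double dist = 0.0;
            for (int d = 0; d < point.length; d++) {
                double diff = point[d] - centers[i][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }

    // Collects the points under their predicted cluster ID; the TreeMap keeps
    // the IDs in ascending order for printing.
    static Map<Integer, List<double[]>> groupByCluster(double[][] centers, List<double[]> points) {
        Map<Integer, List<double[]>> groups = new TreeMap<>();
        for (double[] p : points) {
            groups.computeIfAbsent(predict(centers, p), k -> new ArrayList<>()).add(p);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only
        double[][] centers = { {0.0, 0.0}, {10.0, 10.0} };
        List<double[]> points = Arrays.asList(
            new double[] {1.0, 1.0},
            new double[] {9.0, 8.0},
            new double[] {-1.0, 0.5});
        // Loop over the cluster IDs and print the elements of each one
        for (Map.Entry<Integer, List<double[]>> entry : groupByCluster(centers, points).entrySet()) {
            System.out.println("cluster " + entry.getKey());
            for (double[] p : entry.getValue()) {
                System.out.println("  " + Arrays.toString(p));
            }
        }
    }
}
```

With Spark, the same idea would presumably be to pair each vector with clusters.predict(vector) and group on that key, for example with groupByKey on the resulting pair RDD, rather than collecting everything to the driver first. That variant is not shown or tested here.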