本文介绍了使用 Spark KMeans 算法打印 ClusterID 及其元素.的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有这个程序可以在 apache-spark 上打印 Kmeans 算法的 MSSE.生成了 20 个簇.我正在尝试打印 clusterID 和分配给相应 clusterID 的元素.我如何遍历 clusterID 以打印元素.

I have this program which prints the MSSE of Kmeans algorithm on apache-spark. There are 20 clusters generated. I am trying to print the clusterID and the elements that got assigned to respective clusterID. How do i loop over the clusterID to print the elements.

谢谢你们!!

           val sc = new SparkContext("local", "KMeansExample","/usr/local/spark/", List("target/scala-2.10/kmeans_2.10-1.0.jar"))
            // Load and parse the data
            val data = sc.textFile("kmeans.csv")
         val parsedData = data.map( s => Vectors.dense(s.split(',').map(_.toDouble)))

        // Cluster the data into two classes using KMeans
        val numIterations = 20
        val numClusters = 20
        val clusters = KMeans.train(parsedData, numClusters, numIterations)
        val clusterCenters = clusters.clusterCenters map (_.toArray)
        println("The Cluster Centers are = " + clusterCenters)
        // Evaluate clustering by computing Within Set Sum of Squared Errors
        val WSSSE = clusters.computeCost(parsedData)
        println("Within Set Sum of Squared Errors = " + WSSSE)

推荐答案

我知道你应该为每个元素运行 predict .

as I know you should run predict for each elements.

    KMeansModel clusters = KMeans.train(parsedData.rdd(), numClusters, numIterations);

    List<Vector> vectors = parsedData.collect();
    for(Vector vector: vectors){
        System.out.println("cluster "+clusters.predict(vector) +" "+vector.toString());
    }

这篇关于使用 Spark KMeans 算法打印 ClusterID 及其元素.的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

09-15 03:50