Can anyone help me? No matter how I write it, I can't get my code to analyze Twitter data based on a "key".

import java.io.File
import com.google.gson.Gson
import org.apache.spark.streaming.twitter.TwitterUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}

/**
 * Collect at least the specified number of tweets into json text files.
 */
object Collect {
  private var numTweetsCollected = 0L
  private var partNum = 0
  private var gson = new Gson()

  def main(args: Array[String]) {
    // Process program arguments and set properties
    if (args.length < 4) {
      System.err.println("Usage: " + this.getClass.getSimpleName +
        " <outputDirectory> <numTweetsToCollect> <intervalInSeconds> <partitionsEachInterval>")
      System.exit(1)
    }
    val Array(outputDirectory, Utils.IntParam(numTweetsToCollect),
      Utils.IntParam(intervalSecs), Utils.IntParam(partitionsEachInterval)) =
      Utils.parseCommandLineWithTwitterCredentials(args)
    val outputDir = new File(outputDirectory.toString)
    if (outputDir.exists()) {
      System.err.println("ERROR - %s already exists: delete or specify another directory".format(
        outputDirectory))
      System.exit(1)
    }
    outputDir.mkdirs()

    println("Initializing Streaming Spark Context...")
    val conf = new SparkConf().setAppName(this.getClass.getSimpleName)
    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(intervalSecs))

    // Serialize each incoming twitter4j.Status to a JSON string with Gson
    val tweetStream = TwitterUtils.createStream(ssc, Utils.getAuth)
      .map(gson.toJson(_))

    tweetStream.foreachRDD((rdd, time) => {
      val count = rdd.count()
      if (count > 0) {
        val outputRDD = rdd.repartition(partitionsEachInterval)
        outputRDD.saveAsTextFile(outputDirectory + "/tweets_" + time.milliseconds.toString)
        numTweetsCollected += count
        if (numTweetsCollected > numTweetsToCollect) {
          System.exit(0)
        }
      }
    })

    ssc.start()
    ssc.awaitTermination()
  }
}

The error is:

object gson is not a member of package com.google

If you know of any links about this or how to solve it, please share them with me, because I want to analyze Twitter data with Spark.
Thanks. :)

Best Answer

As Peter pointed out, you are missing the gson dependency, so you need to add the following dependency to your build.sbt:

libraryDependencies += "com.google.code.gson" % "gson" % "2.4"

You can also define all of your dependencies in a single sequence:

libraryDependencies ++= Seq(
    "com.google.code.gson" % "gson" % "2.4",
    "org.apache.spark" %% "spark-core" % "1.2.0",
    "org.apache.spark" %% "spark-streaming" % "1.2.0",
    "org.apache.spark" %% "spark-streaming-twitter" % "1.2.0"
)
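
For context, a complete minimal build.sbt for this project might look like the sketch below. The project name and Scala version are assumptions (Spark 1.2.0 artifacts were published for Scala 2.10, so a 2.10.x version is used here); adjust them to match your setup:

```scala
// build.sbt -- a minimal sketch; `name` and `scalaVersion` are assumptions
name := "spark-twitter-collect"

version := "0.1.0"

// Spark 1.2.0 was built against Scala 2.10, so the %% operator below
// resolves artifacts like spark-core_2.10
scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "com.google.code.gson" % "gson" % "2.4",
  "org.apache.spark" %% "spark-core" % "1.2.0",
  "org.apache.spark" %% "spark-streaming" % "1.2.0",
  "org.apache.spark" %% "spark-streaming-twitter" % "1.2.0"
)
```

After saving build.sbt, run `sbt update` (or simply `sbt compile`) so sbt resolves and downloads the new gson jar before you rebuild.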

Bonus: if you are missing other dependencies, you can try searching http://mvnrepository.com/ for them, and if you need to find the jar/dependency that contains a given class, you can also use the findjar website.

08-25 07:04