Can anyone help me? I am writing code to analyze Twitter data based on "keys".
import java.io.File

import com.google.gson.Gson
import org.apache.spark.streaming.twitter.TwitterUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}

/**
 * Collect at least the specified number of tweets into json text files.
 */
object Collect {
  private var numTweetsCollected = 0L
  private var partNum = 0
  // Never reassigned, so a val is enough
  private val gson = new Gson()

  def main(args: Array[String]): Unit = {
    // Process program arguments and set properties.
    // Four positional arguments are destructured below, so require at least four.
    if (args.length < 4) {
      System.err.println("Usage: " + this.getClass.getSimpleName +
        " <outputDirectory> <numTweetsToCollect> <intervalInSeconds> <partitionsEachInterval>")
      System.exit(1)
    }
    val Array(outputDirectory, Utils.IntParam(numTweetsToCollect), Utils.IntParam(intervalSecs), Utils.IntParam(partitionsEachInterval)) =
      Utils.parseCommandLineWithTwitterCredentials(args)

    // Refuse to overwrite an existing output directory
    val outputDir = new File(outputDirectory.toString)
    if (outputDir.exists()) {
      System.err.println("ERROR - %s already exists: delete or specify another directory".format(
        outputDirectory))
      System.exit(1)
    }
    outputDir.mkdirs()

    println("Initializing Streaming Spark Context...")
    val conf = new SparkConf().setAppName(this.getClass.getSimpleName)
    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(intervalSecs))

    // Serialize each incoming status to a JSON string
    val tweetStream = TwitterUtils.createStream(ssc, Utils.getAuth)
      .map(gson.toJson(_))

    // Write each non-empty batch to its own directory and stop once enough tweets are collected
    tweetStream.foreachRDD((rdd, time) => {
      val count = rdd.count()
      if (count > 0) {
        val outputRDD = rdd.repartition(partitionsEachInterval)
        outputRDD.saveAsTextFile(outputDirectory + "/tweets_" + time.milliseconds.toString)
        numTweetsCollected += count
        if (numTweetsCollected > numTweetsToCollect) {
          System.exit(0)
        }
      }
    })

    ssc.start()
    ssc.awaitTermination()
  }
}
The error is:
object gson is not a member of package com.google
If you know of any links about this, or how to solve the problem, please share them with me, because I want to analyze Twitter data with Spark.
Thanks. :)
Best answer
As Peter pointed out, you are missing the gson dependency. You need to add the following dependency to your build.sbt:
libraryDependencies += "com.google.code.gson" % "gson" % "2.4"
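Once the dependency is in place, a quick way to confirm that `com.google.gson` now resolves is to compile and run a tiny check like the one below. This is only a sketch: `SampleTweet` and `GsonCheck` are hypothetical names made up for this check and are not part of the question's code.

```scala
import com.google.gson.Gson

// Hypothetical class used only for this check; not part of the original code.
class SampleTweet(val user: String, val text: String)

object GsonCheck {
  // Serialize a sample object with Gson, as the collector does for each status.
  def sampleJson(user: String, text: String): String =
    new Gson().toJson(new SampleTweet(user, text))

  def main(args: Array[String]): Unit =
    // Prints the sample object as a JSON string if gson is on the classpath.
    println(sampleJson("alice", "hello"))
}
```

If this compiles, the "object gson is not a member of package com.google" error should no longer appear in `Collect` either, since both rely on the same import.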
You can also define all of your dependencies together as a sequence:
libraryDependencies ++= Seq(
"com.google.code.gson" % "gson" % "2.4",
"org.apache.spark" %% "spark-core" % "1.2.0",
"org.apache.spark" %% "spark-streaming" % "1.2.0",
"org.apache.spark" %% "spark-streaming-twitter" % "1.2.0"
)
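For context, here is roughly what a complete build.sbt around that sequence might look like. Treat it as a sketch under assumptions: the project name, version, and `scalaVersion` are illustrative values I picked, not something taken from the question. The part that matters is the difference between `%` and `%%`.

```scala
// build.sbt: a sketch, not the asker's actual build file.
// name, version, and scalaVersion are assumed values for illustration.
name := "twitter-collect"

version := "0.1.0"

// Assumption: Scala 2.10.x, to match the _2.10 artifacts published for Spark 1.2.0.
scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  // Single %: plain Java artifact, no Scala version suffix is added.
  "com.google.code.gson" % "gson" % "2.4",
  // Double %%: sbt appends the Scala binary version (here _2.10) to the artifact name.
  "org.apache.spark" %% "spark-core" % "1.2.0",
  "org.apache.spark" %% "spark-streaming" % "1.2.0",
  "org.apache.spark" %% "spark-streaming-twitter" % "1.2.0"
)
```

Gson is a Java library, so it takes a single `%`. The Spark modules are published per Scala version, so they take `%%`, and the `scalaVersion` setting has to be one that those Spark artifacts were actually built for.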
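Bonus: if you are missing other dependencies, you can try searching for them on http://mvnrepository.com/, and if you need to find the jar/dependency associated with a given class, you can also use the findjar website.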