Problem Description
I am running this code on a local machine:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    // input file path is hard-coded here
    val logFile = "/Users/username/Spark/README.md"
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    // read the file as an RDD with 2 partitions and cache it
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
I'd like to run the program on different files - it currently only runs on README.md. How do I pass the file path of another file when running Spark (or any other argument, for that matter)? For example, I'd like to change contains("a") to another letter.
I got the program to run with:
$ YOUR_SPARK_HOME/bin/spark-submit \
  --class "SimpleApp" \
  --master local[4] \
  target/scala-2.10/simple-project_2.10-1.0.jar
Thanks!
Recommended Answer
With

def main(args: Array[String]) {

you are preparing your main method to accept anything after the .jar on the spark-submit line as arguments. Spark makes an array named args out of them for you, and you then access them as usual with args(n) (in Scala, array elements are accessed with parentheses, not square brackets).
It's a good idea to check your arguments for type and/or format, especially if anyone other than you might run this.
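For example, a minimal guard before touching args (the usage message and exit code are illustrative assumptions):

if (args.length < 1) {
  // bail out with a usage hint if no file path was supplied
  System.err.println("Usage: SimpleApp <file> [letter]")
  sys.exit(1)
}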
So instead of setting
val logFile = "String here"
set
val logFile = args(0)
and then pass the file as the first argument. Check the spark-submit docs for more on that; basically, you just add your arguments after the .jar on the submit line.
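Putting it together, here is a minimal sketch of the parameterized version; the argument order (file path first, letter second) and the default letter "a" are assumptions for illustration, not part of the original program:

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    // args(0): input file path; args(1) (optional): letter to count, defaulting to "a"
    val logFile = args(0)
    val letter = if (args.length > 1) args(1) else "a"
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val count = logData.filter(line => line.contains(letter)).count()
    println("Lines with %s: %s".format(letter, count))
    sc.stop()
  }
}

You would then submit it with the extra arguments appended after the jar:

$ YOUR_SPARK_HOME/bin/spark-submit \
  --class "SimpleApp" \
  --master local[4] \
  target/scala-2.10/simple-project_2.10-1.0.jar \
  /Users/username/Spark/README.md b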