Problem Description
I want to load a properties file when I submit a Spark job, so that I can load the appropriate configuration for different environments (for example, a test environment versus a production environment). But I don't know where to put the properties file. Here is the code that loads it:
import java.io.FileInputStream
import java.util.Properties

import scala.util.{Failure, Success, Try}

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object HbaseRDD {

  val QUORUM_DEFAULT = "172.16.1.10,172.16.1.11,172.16.1.12"
  val TIMEOUT_DEFAULT = "120000"

  // Try to load hbase.properties from the current working directory.
  val config = Try {
    val prop = new Properties()
    prop.load(new FileInputStream("hbase.properties"))
    (
      prop.getProperty("hbase.zookeeper.quorum", QUORUM_DEFAULT),
      prop.getProperty("timeout", TIMEOUT_DEFAULT)
    )
  }

  def getHbaseRDD(tableName: String, appName: String = "test", master: String = "spark://node0:7077") = {
    val sparkConf = new SparkConf().setAppName(appName).setMaster(master)
    val sc = new SparkContext(sparkConf)
    val conf = HBaseConfiguration.create()
    config match {
      case Success((quorum, timeout)) =>
        conf.set("hbase.zookeeper.quorum", quorum)
        conf.set("timeout", timeout)
      case Failure(ex) =>
        // Fall back to the defaults if the properties file could not be read.
        ex.printStackTrace()
        conf.set("hbase.zookeeper.quorum", QUORUM_DEFAULT)
        conf.set("timeout", TIMEOUT_DEFAULT)
    }
    conf.set(TableInputFormat.INPUT_TABLE, tableName)
    val hbaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])
    hbaseRDD
  }
}
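For completeness, a minimal usage sketch of the object above (the caller object and the table name "test_table" are placeholders, not part of the original code):

object HbaseRDDDemo {
  def main(args: Array[String]): Unit = {
    // "test_table" is a hypothetical HBase table name.
    val rdd = HbaseRDD.getHbaseRDD("test_table")
    // Each record is an (ImmutableBytesWritable, Result) pair from TableInputFormat.
    println(s"rows: ${rdd.count()}")
  }
}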
The question is: where should I put the hbase.properties file so that Spark can find and load it? Or how do I specify it via spark-submit?
Recommended Answer
Please follow this example (Spark 1.5) configuration:
- You can place the file in the working directory from which you submit the Spark job (this is what we used).
- Alternatively, you can keep the file on HDFS (see the sketch after this list).
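If the file lives on HDFS, a minimal sketch of reading it through the Hadoop FileSystem API could look like the following; the path hdfs:///config/hbase.properties is only an assumed example:

import java.util.Properties
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsPropertiesLoader {
  // Reads a .properties file from HDFS; the default path is a placeholder.
  def load(path: String = "hdfs:///config/hbase.properties"): Properties = {
    val hdfsPath = new Path(path)
    val fs = hdfsPath.getFileSystem(new Configuration())
    val in = fs.open(hdfsPath)
    val props = new Properties()
    try props.load(in) finally in.close()
    props
  }
}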
Check the Run-time Environment configuration options. These options change from one Spark version to another, so check the runtime configuration documentation for the version you use.
spark-submit --verbose --class <your driver class> \
--master yarn-client \
--num-executors 12 \
--driver-memory 1G \
--executor-memory 2G \
--executor-cores 4 \
--conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+UseSerialGC -XX:+UseCompressedOops -XX:+UseCompressedStrings -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:PermSize=256M -XX:MaxPermSize=512M" \
--conf "spark.driver.extraJavaOptions=-XX:PermSize=256M -XX:MaxPermSize=512M" \
--conf "spark.shuffle.memoryFraction=0.5" \
--conf "spark.worker.cleanup.enabled=true" \
--conf "spark.worker.cleanup.interval=3600" \
--conf "spark.shuffle.io.numConnectionsPerPeer=5" \
--conf "spark.eventlog.enabled=true" \
--conf "spark.driver.extraLibrayPath=$HADOOP_HOME/*:$HBASE_HOME/*:$HADOOP_HOME/lib/*:$HBASE_HOME/lib/htrace-core-3.1.0-incubating.jar:$HDFS_PATH/*:$SOLR_HOME/*:$SOLR_HOME/lib/*" \
--conf "spark.executor.extraLibraryPath=$HADOOP_HOME/*:$folder/*:$HADOOP_HOME/lib/*:$HBASE_HOME/lib/htrace-core-3.1.0-incubating.jar:$HDFS_PATH/*:$SOLR_HOME/*:$SOLR_HOME/lib/*" \
-conf"spark.executor.extraClassPath = $ OTHER_JARS:hbase.Properties" \
--conf "spark.yarn.executor.memoryOverhead=2048" \
--conf "spark.yarn.driver.memoryOverhead=1024" \
--conf "spark.eventLog.overwrite=true" \
--conf "spark.shuffle.consolidateFiles=true" \
--conf "spark.akka.frameSize=1024" \
--properties-file yourconfig.conf \
--files hbase.properties \
--jars $your_JARS
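Once the file has been shipped with --files hbase.properties, one possible way (a sketch, not necessarily the original author's approach) to read it from inside the job is via SparkFiles:

import java.io.FileInputStream
import java.util.Properties
import org.apache.spark.SparkFiles

// After "--files hbase.properties", Spark copies the file to each node;
// SparkFiles.get resolves the local path of a file added via --files or
// sc.addFile (call it after the SparkContext has been created).
val prop = new Properties()
val in = new FileInputStream(SparkFiles.get("hbase.properties"))
try prop.load(in) finally in.close()
val quorum = prop.getProperty("hbase.zookeeper.quorum", HbaseRDD.QUORUM_DEFAULT)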
Also, have a look at:
- http://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management
- How to load a Java properties file and use it in Spark?
- How do I pass -D parameters or environment variables to a Spark job?
- spark-configuration-mess-solved