Problem Description
I want to load a properties file when I submit a Spark job, so that I can load the appropriate configuration for different environments (for example, a test environment versus a production environment). But I don't know where to put the properties file. Here is the code that loads it:
import java.io.FileInputStream
import java.util.Properties

import scala.util.{Failure, Success, Try}

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object HbaseRDD {

  val QUORUM_DEFAULT = "172.16.1.10,172.16.1.11,172.16.1.12"
  val TIMEOUT_DEFAULT = "120000"

  // Try to load hbase.properties from the current working directory.
  val config = Try {
    val prop = new Properties()
    prop.load(new FileInputStream("hbase.properties"))
    (
      prop.getProperty("hbase.zookeeper.quorum", QUORUM_DEFAULT),
      prop.getProperty("timeout", TIMEOUT_DEFAULT)
    )
  }

  def getHbaseRDD(tableName: String, appName: String = "test", master: String = "spark://node0:7077") = {
    val sparkConf = new SparkConf().setAppName(appName).setMaster(master)
    val sc = new SparkContext(sparkConf)
    val conf = HBaseConfiguration.create()
    config match {
      case Success((quorum, timeout)) =>
        conf.set("hbase.zookeeper.quorum", quorum)
        conf.set("timeout", timeout)
      case Failure(ex) =>
        // Fall back to the defaults if the properties file could not be read.
        ex.printStackTrace()
        conf.set("hbase.zookeeper.quorum", QUORUM_DEFAULT)
        conf.set("timeout", TIMEOUT_DEFAULT)
    }
    conf.set(TableInputFormat.INPUT_TABLE, tableName)
    val hbaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])
    hbaseRDD
  }
}
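For completeness, a minimal usage sketch of the object above (the caller object and the table name "test_table" are placeholders, not part of the original code):

object HbaseRDDDemo {
  def main(args: Array[String]): Unit = {
    // "test_table" is a hypothetical HBase table name.
    val rdd = HbaseRDD.getHbaseRDD("test_table")
    // Each record is an (ImmutableBytesWritable, Result) pair from TableInputFormat.
    println(s"rows: ${rdd.count()}")
  }
}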
The question is: where should I put the hbase.properties file so that Spark can find and load it? Or how do I specify it via spark-submit?
Recommended Answer
Please follow this example (Spark 1.5) configuration:
- You can place the file in the working directory from which you submit the Spark job (this is what we used).
- Alternatively, you can keep the file on HDFS (see the sketch after this list).
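If the file lives on HDFS, a minimal sketch of reading it through the Hadoop FileSystem API could look like the following; the path hdfs:///config/hbase.properties is only an assumed example:

import java.util.Properties
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsPropertiesLoader {
  // Reads a .properties file from HDFS; the default path is a placeholder.
  def load(path: String = "hdfs:///config/hbase.properties"): Properties = {
    val hdfsPath = new Path(path)
    val fs = hdfsPath.getFileSystem(new Configuration())
    val in = fs.open(hdfsPath)
    val props = new Properties()
    try props.load(in) finally in.close()
    props
  }
}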
Check the Run-time Environment configuration options. These options change from one Spark version to another, so check the runtime configuration documentation for the version you use.
spark-submit --verbose --class <your driver class> \
--master yarn-client \
--num-executors 12 \
--driver-memory 1G \
--executor-memory 2G \
--executor-cores 4 \
--conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+UseSerialGC -XX:+UseCompressedOops -XX:+UseCompressedStrings -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:PermSize=256M -XX:MaxPermSize=512M" \
--conf "spark.driver.extraJavaOptions=-XX:PermSize=256M -XX:MaxPermSize=512M" \
--conf "spark.shuffle.memoryFraction=0.5" \
--conf "spark.worker.cleanup.enabled=true" \
--conf "spark.worker.cleanup.interval=3600" \
--conf "spark.shuffle.io.numConnectionsPerPeer=5" \
--conf "spark.eventlog.enabled=true" \
--conf "spark.driver.extraLibrayPath=$HADOOP_HOME/*:$HBASE_HOME/*:$HADOOP_HOME/lib/*:$HBASE_HOME/lib/htrace-core-3.1.0-incubating.jar:$HDFS_PATH/*:$SOLR_HOME/*:$SOLR_HOME/lib/*" \
--conf "spark.executor.extraLibraryPath=$HADOOP_HOME/*:$folder/*:$HADOOP_HOME/lib/*:$HBASE_HOME/lib/htrace-core-3.1.0-incubating.jar:$HDFS_PATH/*:$SOLR_HOME/*:$SOLR_HOME/lib/*" \
-conf"spark.executor.extraClassPath = $ OTHER_JARS:hbase.Properties" \
--conf "spark.yarn.executor.memoryOverhead=2048" \
--conf "spark.yarn.driver.memoryOverhead=1024" \
--conf "spark.eventLog.overwrite=true" \
--conf "spark.shuffle.consolidateFiles=true" \
--conf "spark.akka.frameSize=1024" \
--properties-file yourconfig.conf \
--files hbase.properties \
--jars $your_JARS
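Once the file has been shipped with --files hbase.properties, one possible way (a sketch, not necessarily the original author's approach) to read it from inside the job is via SparkFiles:

import java.io.FileInputStream
import java.util.Properties
import org.apache.spark.SparkFiles

// After "--files hbase.properties", Spark copies the file to each node;
// SparkFiles.get resolves the local path of a file added via --files or
// sc.addFile (call it after the SparkContext has been created).
val prop = new Properties()
val in = new FileInputStream(SparkFiles.get("hbase.properties"))
try prop.load(in) finally in.close()
val quorum = prop.getProperty("hbase.zookeeper.quorum", HbaseRDD.QUORUM_DEFAULT)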
Also, have a look at:
- http://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management
- How to load a Java properties file and use it in Spark?
- How do I pass -D parameters or environment variables to a Spark job?
- spark-configuration-mess-solved