Problem description
I'm launching my Spark-based HiveServer2 on Amazon EMR, and it has an extra classpath dependency. Due to this bug in Amazon EMR:
https://petz2000.wordpress.com/2015/08/18/get-blas-working-with-spark-on-amazon-emr/
my classpath cannot be submitted through the "--driver-class-path" option.
So I have to modify /etc/spark/conf/spark-env.conf to add the extra classpath:
# Add Hadoop libraries to Spark classpath
SPARK_CLASSPATH="${SPARK_CLASSPATH}:${HADOOP_HOME}/*:${HADOOP_HOME}/../hadoop-hdfs/*:${HADOOP_HOME}/../hadoop-mapreduce/*:${HADOOP_HOME}/../hadoop-yarn/*:/home/hadoop/git/datapassport/*"
where "/home/hadoop/git/datapassport/*" is my classpath.
However, after launching the server successfully, the Spark environment parameters show that my change had no effect:
spark.driver.extraClassPath :/usr/lib/hadoop/*:/usr/lib/hadoop/../hadoop-hdfs/*:/usr/lib/hadoop/../hadoop-mapreduce/*:/usr/lib/hadoop/../hadoop-yarn/*:/etc/hive/conf:/usr/lib/hadoop/../hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*
Is this configuration file obsolete? Where is the new file, and how do I fix this problem?
Accepted answer
You can use --driver-class-path.
Start a spark-shell on the master node of a fresh EMR cluster:
spark-shell --master yarn-client
scala> sc.getConf.get("spark.driver.extraClassPath")
res0: String = /etc/hadoop/conf:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*
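If you'd rather not open an interactive shell, the same value can usually be read from the generated defaults file (a sketch, assuming the provisioned value lives in /etc/spark/conf/spark-defaults.conf, as it does on the EMR 4.x releases I've seen; verify the path on your release):

# Read the stock driver classpath EMR generated at provisioning time
grep '^spark.driver.extraClassPath' /etc/spark/conf/spark-defaults.conf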
Add your JAR files to the EMR cluster using a --bootstrap-action.
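For example (a sketch with hypothetical names; the S3 bucket, script, and JAR paths below are placeholders, not part of the original answer), a bootstrap action is just a script in S3 that every node runs at provisioning time:

#!/bin/bash
# copy-jar.sh -- hypothetical bootstrap script stored in S3
set -e
# Pull the custom JAR onto each node so the driver can load it from a local path
aws s3 cp s3://my-bucket/jars/my-custom-jar.jar /home/hadoop/my-custom-jar.jar

Register it when creating the cluster:

aws emr create-cluster ... --bootstrap-actions Path=s3://my-bucket/bootstrap/copy-jar.sh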
When you call spark-submit, prepend (or append) your JAR files to the value of extraClassPath you got from spark-shell:
spark-submit --master yarn-cluster \
  --driver-class-path /home/hadoop/my-custom-jar.jar:/etc/hadoop/conf:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*
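To avoid hard-coding a classpath that may change between EMR releases, you could also build the command from the stock value (a sketch, assuming the generated value is in /etc/spark/conf/spark-defaults.conf as noted above; my-app.jar is a placeholder application JAR):

# Prepend our JAR to whatever classpath this EMR release generated
BASE_CP=$(grep '^spark.driver.extraClassPath' /etc/spark/conf/spark-defaults.conf | awk '{print $2}')
spark-submit --master yarn-cluster \
  --driver-class-path "/home/hadoop/my-custom-jar.jar:${BASE_CP}" \
  my-app.jar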
This worked for me on EMR release builds 4.1 and 4.2.
The process for building spark.driver.extraClassPath may change between releases, which may be why SPARK_CLASSPATH no longer works.