Problem Description
If I have a Spark job (2.2.0) compiled with setMaster("local"), what will happen if I send that job with spark-submit --master yarn --deploy-mode cluster?
I tried this and it looked like the job did get packaged up and executed on the YARN cluster rather than locally.
What I'm not clear on:
- Why does this work? According to the docs, things that you set in SparkConf explicitly have precedence over things passed in from the command line or via spark-submit (see: https://spark.apache.org/docs/latest/configuration.html). Is this different because I'm using SparkSession.getBuilder?
- Is there any less obvious impact of leaving setMaster("local") in the code vs. removing it? I'm wondering if what I'm seeing is something like the job running in local mode, within the cluster, rather than properly using cluster resources. A minimal sketch of the kind of setup in question is shown below.
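For reference, a minimal sketch of the scenario being described, assuming a Scala application built with SparkSession.builder; the class name, app name, and input path are hypothetical, not from the original post:

    import org.apache.spark.sql.SparkSession

    object WordCountApp {
      def main(args: Array[String]): Unit = {
        // Master hard-coded in the application itself -- the setting in question.
        val spark = SparkSession.builder()
          .appName("WordCountApp")
          .master("local")
          .getOrCreate()

        val counts = spark.sparkContext
          .textFile("hdfs:///tmp/input.txt")   // hypothetical input path
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        counts.take(10).foreach(println)
        spark.stop()
      }
    }

submitted with something like:

    spark-submit --master yarn --deploy-mode cluster --class WordCountApp word-count.jar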
Recommended Answer
It's because submitting your application to Yarn happens before SparkConf.setMaster is called.
When you use --master yarn --deploy-mode cluster, Spark runs its own main method (the YARN submission client) on your local machine and uploads the jar to run on Yarn. Yarn then allocates a container as the application master to run the Spark driver, i.e., your code. SparkConf.setMaster("local") therefore executes inside a Yarn container; it creates a SparkContext running in local mode and does not use the Yarn cluster resources.
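One way to confirm which mode the driver actually ended up in is to log the effective master from inside the application and then look for it in the YARN container logs. A sketch, assuming you can modify and resubmit the job (the class name is hypothetical):

    import org.apache.spark.sql.SparkSession

    object ModeCheck {
      def main(args: Array[String]): Unit = {
        // No explicit master here; getOrCreate() picks up whatever the
        // surrounding application (or spark-submit) configured.
        val spark = SparkSession.builder().appName("ModeCheck").getOrCreate()
        val sc = spark.sparkContext

        println(s"spark.master       = ${sc.master}")             // "local" vs. "yarn"
        println(s"defaultParallelism = ${sc.defaultParallelism}")  // one container's cores vs. cluster executors
        println(s"applicationId      = ${sc.applicationId}")       // "local-..." vs. "application_..."

        spark.stop()
      }
    }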
I recommend not setting the master in your code. Just use the --master command-line option or the MASTER environment variable to specify the Spark master.
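A sketch of that recommended shape: the master is left out of the code entirely and supplied at submit time (class name, app name, and jar name are illustrative):

    import org.apache.spark.sql.SparkSession

    object MyApp {
      def main(args: Array[String]): Unit = {
        // No .master(...) call: spark-submit (or the MASTER env var) decides.
        val spark = SparkSession.builder()
          .appName("MyApp")
          .getOrCreate()

        // ... job logic ...

        spark.stop()
      }
    }

The same, unmodified jar can then run on the cluster or locally depending only on how it is submitted:

    spark-submit --master yarn --deploy-mode cluster --class MyApp my-app.jar
    spark-submit --master "local[*]" --class MyApp my-app.jar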