


If I have a Spark (2.2.0) job compiled with setMaster("local"), what happens if I submit it with spark-submit --master yarn --deploy-mode cluster?


I tried this and it looked like the job did get packaged up and executed on the YARN cluster rather than locally.


  • Why does this work? According to the docs, properties set explicitly on SparkConf take precedence over those passed on the command line or via spark-submit (see: https://spark.apache.org/docs/latest/configuration.html). Is this different because I'm using SparkSession.builder?


  • Is there any less obvious impact of leaving setMaster("local") in the code vs. removing it? I'm wondering if what I'm seeing is something like the job running in local mode within the cluster, rather than properly using cluster resources.



That's because submitting your application to YARN happens before SparkConf.setMaster is called.


When you use --master yarn --deploy-mode cluster, spark-submit runs its own main method on your local machine and uploads the jar to run on YARN. YARN allocates a container for the application master, which runs the Spark driver, i.e. your code. SparkConf.setMaster("local") therefore runs inside a YARN container, where it creates a SparkContext in local mode that doesn't use the YARN cluster's resources.
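One way to check which mode the driver actually ended up in is to log what the SparkContext reports about itself. The following is a minimal sketch, not the asker's code: the object name MasterCheck and the app name are made up for illustration, and the hard-coded master is kept only so its effect can be observed.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical diagnostic job; names are illustrative only.
object MasterCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("master-check")
      .master("local") // hard-coded master, as in the question
      .getOrCreate()

    val sc = spark.sparkContext

    // Master URL the SparkContext was actually created with.
    println(s"effective master: ${sc.master}")

    // Value of spark.submit.deployMode as seen by the driver.
    println(s"deploy mode: ${sc.deployMode}")

    // Block managers known to the driver (the driver itself is included).
    println(s"block managers: ${sc.getExecutorMemoryStatus.keys.mkString(", ")}")

    spark.stop()
  }
}
```

In cluster deploy mode the driver's stdout ends up in the YARN container logs rather than in your terminal, so the printed lines have to be read from there (for example with yarn logs -applicationId followed by the application ID).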


I recommend not setting the master in your code. Just use the --master command-line option or the MASTER environment variable to specify the Spark master.
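If you still want the job to start from an IDE without spark-submit, one common pattern is to supply a local master only as a fallback. This is a sketch under assumptions: the object name PortableJob and the app name are hypothetical, and it relies on new SparkConf() picking up the spark.* properties that spark-submit makes available to the driver JVM, so that setIfMissing leaves a submitted master untouched.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Hypothetical job; names are illustrative only.
object PortableJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("portable-job")
      // Applies only when no master was provided from outside,
      // e.g. when the job is started directly from an IDE.
      .setIfMissing("spark.master", "local[*]")

    val spark = SparkSession.builder().config(conf).getOrCreate()

    // Trivial action so the job does some work on whichever master is in effect.
    println(spark.range(10).count())

    spark.stop()
  }
}
```

Submitted with spark-submit --master yarn --deploy-mode cluster, this version should use the master given on the command line; run with no master configured, it falls back to local[*].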


08-20 15:21