Spark 2.0: Redefining SparkSession params through GetOrCreate and not seeing changes in WebUI

Problem description

I'm using Spark 2.0 with PySpark.

I am redefining SparkSession parameters through the GetOrCreate method introduced in 2.0:

In case an existing SparkSession is returned, the config options specified in this builder will be applied to the existing SparkSession.

https://spark.apache.org/docs/2.0.1/api/python/pyspark.sql.html#pyspark.sql.SparkSession.Builder.getOrCreate

So far so good:

from pyspark import SparkConf

SparkConf().toDebugString()
# 'spark.app.name=pyspark-shell\nspark.master=local[2]\nspark.submit.deployMode=client'

spark.conf.get("spark.app.name")
# 'pyspark-shell'

Then I redefine the SparkSession config, with the promise to see the changes in the WebUI:

https://spark.apache.org/docs/2.0.1/api/python/pyspark.sql.html#pyspark.sql.SparkSession.Builder.appName

c = (SparkConf()
     .setAppName("MyApp")
     .setMaster("local")
     .set("spark.driver.memory", "1g"))

from pyspark.sql import SparkSession

(SparkSession
 .builder
 .enableHiveSupport()  # metastore, serdes, Hive UDFs
 .config(conf=c)
 .getOrCreate())

spark.conf.get("spark.app.name")
# 'MyApp'

Now, when I go to localhost:4040, I would expect to see MyApp as an app name.

However, I still see the pyspark-shell application UI.

Where am I going wrong?

Thanks in advance!

Accepted answer

I believe the documentation is a bit misleading here; when you work with Scala you actually see a warning like this:

... WARN SparkSession$Builder: Use an existing SparkSession, some configuration may not take effect.

It was more obvious prior to Spark 2.0 with clear separation between contexts:

  • SparkContext configuration cannot be modified on runtime. You have to stop the existing context first.
  • SQLContext configuration can be modified on runtime, as the PySpark sketch below illustrates.
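
The same split survives in Spark 2.0 behind the SparkSession facade. Here is a minimal PySpark sketch of the distinction (assuming a plain pyspark shell in local mode, where spark is predefined):

spark.conf.set("spark.sql.shuffle.partitions", "2001")  # SQL option: takes effect at runtime
spark.conf.get("spark.sql.shuffle.partitions")
# '2001'

spark.sparkContext.appName  # context-level option: frozen when the context was created
# 'pyspark-shell'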

spark.app.name, like many other options, is bound to SparkContext, and cannot be modified without stopping the context.
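
For the original PySpark question this means the only way to get MyApp into the WebUI is to stop the context and build a new one. A minimal sketch of that fix, reusing the names from the question (and assuming nothing else is still using the running context):

from pyspark import SparkConf
from pyspark.sql import SparkSession

spark.stop()  # stops the session and its underlying SparkContext

c = SparkConf().setAppName("MyApp").setMaster("local")
spark = (SparkSession
         .builder
         .config(conf=c)
         .getOrCreate())  # no live context is left, so a fresh one is created

spark.sparkContext.appName  # 'MyApp' -- and the WebUI title follows suit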

Reusing existing SparkContext / SparkSession

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

spark.conf.get("spark.sql.shuffle.partitions")
// String = 200

val conf = new SparkConf()
  .setAppName("foo")
  .set("spark.sql.shuffle.partitions", "2001")

val spark = SparkSession.builder.config(conf).getOrCreate()
// ... WARN SparkSession$Builder: Use an existing SparkSession ...
// spark: org.apache.spark.sql.SparkSession = ...

spark.conf.get("spark.sql.shuffle.partitions")
// String = 2001

The spark.app.name configuration is updated:

spark.conf.get("spark.app.name")
// String = foo

It doesn't affect the SparkContext, though:

spark.sparkContext.appName
// String = Spark shell

Stopping existing SparkContext / SparkSession

Now let's stop the session and repeat the process:

spark.stop
val spark = SparkSession.builder.config(conf).getOrCreate()
// ... WARN SparkContext: Use an existing SparkContext ...
// spark: org.apache.spark.sql.SparkSession = ...

spark.sparkContext.appName
// String = foo

Interestingly, when we stop the session we still get a warning about using an existing SparkContext, but you can check that it is actually stopped.
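
The same check works from PySpark. One way (a sketch, continuing from the fix above, with c and spark carried over) is to compare application IDs, which change whenever a new SparkContext is created:

from pyspark.sql import SparkSession

old_id = spark.sparkContext.applicationId
spark.stop()

spark = SparkSession.builder.config(conf=c).getOrCreate()
spark.sparkContext.applicationId != old_id
# True -- the old context really was stopped and replaced

Either way, SparkContext-bound options such as spark.app.name only show up in the WebUI after such a restart.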
