Spark 2.0: Redefining SparkSession params through GetOrCreate and not seeing changes in WebUI

Problem description

I'm using Spark 2.0 with PySpark.

I am redefining SparkSession parameters through the GetOrCreate method introduced in 2.0:

In case an existing SparkSession is returned, the config options specified in this builder will be applied to the existing SparkSession.

https://spark.apache.org/docs/2.0.1/api/python/pyspark.sql.html#pyspark.sql.SparkSession.Builder.getOrCreate

So far so good:

from pyspark import SparkConf

SparkConf().toDebugString()
# 'spark.app.name=pyspark-shell\nspark.master=local[2]\nspark.submit.deployMode=client'

spark.conf.get("spark.app.name")
# 'pyspark-shell'

Then I redefine the SparkSession config, with the promise to see the changes in the WebUI:

https://spark.apache.org/docs/2.0.1/api/python/pyspark.sql.html#pyspark.sql.SparkSession.Builder.appName

c = (SparkConf()
     .setAppName("MyApp")
     .setMaster("local")
     .set("spark.driver.memory", "1g"))

from pyspark.sql import SparkSession

(SparkSession
 .builder
 .enableHiveSupport()  # metastore, serdes, Hive UDFs
 .config(conf=c)
 .getOrCreate())

spark.conf.get("spark.app.name")
# 'MyApp'

Now, when I go to localhost:4040, I would expect to see MyApp as an app name.

However, I still see the pyspark-shell application UI.

Where am I going wrong?

Thanks in advance!

Accepted answer

I believe the documentation is a bit misleading here; when you work with Scala you actually see a warning like this:

... WARN SparkSession$Builder: Use an existing SparkSession, some configuration may not take effect.

It was more obvious prior to Spark 2.0 with clear separation between contexts:

  • SparkContext configuration cannot be modified on runtime. You have to stop the existing context first.
  • SQLContext configuration can be modified on runtime, as the PySpark sketch below illustrates.
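
The same split survives in Spark 2.0 behind the SparkSession facade. Here is a minimal PySpark sketch of the distinction (assuming a plain pyspark shell in local mode, where spark is predefined):

spark.conf.set("spark.sql.shuffle.partitions", "2001")  # SQL option: takes effect at runtime
spark.conf.get("spark.sql.shuffle.partitions")
# '2001'

spark.sparkContext.appName  # context-level option: frozen when the context was created
# 'pyspark-shell'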

spark.app.name, like many other options, is bound to SparkContext, and cannot be modified without stopping the context.
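
For the original PySpark question this means the only way to get MyApp into the WebUI is to stop the context and build a new one. A minimal sketch of that fix, reusing the names from the question (and assuming nothing else is still using the running context):

from pyspark import SparkConf
from pyspark.sql import SparkSession

spark.stop()  # stops the session and its underlying SparkContext

c = SparkConf().setAppName("MyApp").setMaster("local")
spark = (SparkSession
         .builder
         .config(conf=c)
         .getOrCreate())  # no live context is left, so a fresh one is created

spark.sparkContext.appName  # 'MyApp' -- and the WebUI title follows suit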

Reusing existing SparkContext / SparkSession

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

spark.conf.get("spark.sql.shuffle.partitions")
// String = 200

val conf = new SparkConf()
  .setAppName("foo")
  .set("spark.sql.shuffle.partitions", "2001")

val spark = SparkSession.builder.config(conf).getOrCreate()
// ... WARN SparkSession$Builder: Use an existing SparkSession ...
// spark: org.apache.spark.sql.SparkSession = ...

spark.conf.get("spark.sql.shuffle.partitions")
// String = 2001

The spark.app.name configuration is updated:

spark.conf.get("spark.app.name")
// String = foo

It doesn't affect the SparkContext, though:

spark.sparkContext.appName
// String = Spark shell

Stopping existing SparkContext / SparkSession

Now let's stop the session and repeat the process:

spark.stop
val spark = SparkSession.builder.config(conf).getOrCreate()
// ... WARN SparkContext: Use an existing SparkContext ...
// spark: org.apache.spark.sql.SparkSession = ...

spark.sparkContext.appName
// String = foo

Interestingly, when we stop the session we still get a warning about using an existing SparkContext, but you can check that it is actually stopped.
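
The same check works from PySpark. One way (a sketch, continuing from the fix above, with c and spark carried over) is to compare application IDs, which change whenever a new SparkContext is created:

from pyspark.sql import SparkSession

old_id = spark.sparkContext.applicationId
spark.stop()

spark = SparkSession.builder.config(conf=c).getOrCreate()
spark.sparkContext.applicationId != old_id
# True -- the old context really was stopped and replaced

Either way, SparkContext-bound options such as spark.app.name only show up in the WebUI after such a restart.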
