This article covers Spark 2.0: redefining SparkSession parameters through GetOrCreate and not seeing the changes in the WebUI, and walks through how to resolve the problem.

Problem description

I'm using Spark 2.0 with PySpark.

I am redefining SparkSession parameters through the GetOrCreate method that was introduced in 2.0:

This method first checks whether there is a valid global default SparkSession, and if yes, returns that one. If no valid global default SparkSession exists, the method creates a new SparkSession and assigns the newly created SparkSession as the global default.

In case an existing SparkSession is returned, the config options specified in this builder will be applied to the existing SparkSession.

https://spark.apache.org/docs/2.0.1/api/python/pyspark.sql.html#pyspark.sql.SparkSession.Builder.getOrCreate
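To illustrate the reuse behaviour described in the quote, here is a small, hypothetical PySpark snippet (the names s1 and s2 and the app names are only for illustration, not from the question):

from pyspark.sql import SparkSession

s1 = SparkSession.builder.appName("first").getOrCreate()   # no session yet: creates the global default
s2 = SparkSession.builder.appName("second").getOrCreate()  # a session exists: it is reused

s1 is s2                       # True -- the very same SparkSession object
s2.conf.get("spark.app.name")  # 'second' -- the builder options were applied to the existing session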

So far so good:

from pyspark import SparkConf

SparkConf().toDebugString()
'spark.app.name=pyspark-shell\nspark.master=local[2]\nspark.submit.deployMode=client'

spark.conf.get("spark.app.name")
'pyspark-shell'

Then I redefine the SparkSession config with the promise to see the changes in the WebUI:

appName(name)
Sets a name for the application, which will be shown in the Spark web UI.

https://spark.apache.org/docs/2.0.1/api/python/pyspark.sql.html#pyspark.sql.SparkSession.Builder.appName

c = SparkConf()
(c
 .setAppName("MyApp")
 .setMaster("local")
 .set("spark.driver.memory","1g")
 )

from pyspark.sql import SparkSession
(SparkSession
.builder
.enableHiveSupport() # metastore, serdes, Hive udf
.config(conf=c)
.getOrCreate())

spark.conf.get("spark.app.name")
'MyApp'

Now, when I go to localhost:4040, I would expect to see MyApp as the app name.

However, I still see pyspark-shell in the application UI.

Where am I going wrong?

Thanks in advance!

Solution

I believe that the documentation is a bit misleading here, and when you work with Scala you actually see a warning like this:

... WARN SparkSession$Builder: Use an existing SparkSession, some configuration may not take effect.

It was more obvious prior to Spark 2.0, with a clear separation between contexts:

  • SparkContext configuration cannot be modified at runtime. You have to stop the existing context first.
  • SQLContext configuration can be modified at runtime (a quick sketch follows this list).
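As a rough sketch of that pre-2.0 situation, using the Spark 1.x API with hypothetical values:

# Spark 1.x style (pre-2.0), shown for contrast -- a hypothetical sketch
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(conf=SparkConf().setAppName("fixed-at-startup"))
sqlContext = SQLContext(sc)

# SQLContext settings can be changed while the application runs:
sqlContext.setConf("spark.sql.shuffle.partitions", "2001")

# SparkContext settings (app name, master, ...) cannot; you would have to
# call sc.stop() and create a brand-new context from a new SparkConf.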

spark.app.name, like many other options, is bound to SparkContext, and cannot be modified without stopping the context.
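The same split is visible from PySpark in the question's own session. A sketch, assuming the builder call from the question has already been executed:

spark.conf.get("spark.app.name")  # 'MyApp'         -- the session-level config was updated
spark.sparkContext.appName        # 'pyspark-shell' -- the running SparkContext, and hence the Web UI, keep the old name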

Reusing an existing SparkContext / SparkSession

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

spark.conf.get("spark.sql.shuffle.partitions")
String = 200
val conf = new SparkConf()
  .setAppName("foo")
  .set("spark.sql.shuffle.partitions", "2001")

val spark = SparkSession.builder.config(conf).getOrCreate()
... WARN SparkSession$Builder: Use an existing SparkSession ...
spark: org.apache.spark.sql.SparkSession =  ...
spark.conf.get("spark.sql.shuffle.partitions")
String = 2001

While the spark.app.name config has been updated:

spark.conf.get("spark.app.name")
String = foo

it doesn't affect the SparkContext:

spark.sparkContext.appName
String = Spark shell

Stopping the existing SparkContext / SparkSession

Now let's stop the session and repeat the process:

spark.stop
val spark = SparkSession.builder.config(conf).getOrCreate()
...  WARN SparkContext: Use an existing SparkContext ...
spark: org.apache.spark.sql.SparkSession = ...
spark.sparkContext.appName
String = foo

Interestingly, when we stop the session we still get a warning about using an existing SparkContext, but you can check that it is actually stopped.
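For completeness, here is a PySpark sketch of the same fix, reusing the SparkConf c built in the question (a sketch, not a verbatim transcript):

from pyspark.sql import SparkSession

spark.stop()                  # stops the SparkSession and its underlying SparkContext

spark = (SparkSession
         .builder
         .config(conf=c)      # the conf built in the question
         .getOrCreate())      # no live context anymore, so a new one is created from c

spark.sparkContext.appName    # 'MyApp' -- and localhost:4040 now shows MyApp as well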

