Problem description
I'm using Spark 2.0 with PySpark.
I am redefining the SparkSession parameters through the GetOrCreate method that was introduced in 2.0:
This method first checks whether there is a valid global default SparkSession and, if yes, returns that one. If no valid global default SparkSession exists, the method creates a new SparkSession and assigns the newly created SparkSession as the global default. In case an existing SparkSession is returned, the config options specified in this builder will be applied to the existing SparkSession.
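To make that concrete, here is a minimal PySpark sketch (the variable names s1 and s2 are mine, not from the question) showing that a second getOrCreate() call hands back the already-active session rather than building a new one:
from pyspark.sql import SparkSession
s1 = SparkSession.builder.getOrCreate()
s2 = (SparkSession.builder
      .config("spark.sql.shuffle.partitions", "10")   # applied to the existing session
      .getOrCreate())
s1 is s2
True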
So far so good:
from pyspark import SparkConf
SparkConf().toDebugString()
'spark.app.name=pyspark-shell\nspark.master=local[2]\nspark.submit.deployMode=client'
spark.conf.get("spark.app.name")
'pyspark-shell'
Then I redefine the SparkSession config, with the documented promise that the change will be visible in the WebUI:
appName(name)
Sets a name for the application, which will be shown in the Spark web UI.
c = SparkConf()
(c
.setAppName("MyApp")
.setMaster("local")
.set("spark.driver.memory","1g")
)
from pyspark.sql import SparkSession
(SparkSession
.builder
.enableHiveSupport() # metastore, serdes, Hive udf
.config(conf=c)
.getOrCreate())
spark.conf.get("spark.app.name")
'MyApp'
Now, when I go to localhost:4040, I would expect to see MyApp as the app name.
However, I still see the pyspark-shell application UI.
Where am I going wrong?
Thanks in advance!
Recommended answer
I believe that the documentation is a bit misleading here, and when you work with Scala you actually see a warning like this:
... WARN SparkSession$Builder: Use an existing SparkSession, some configuration may not take effect.
It was more obvious prior to Spark 2.0, with a clear separation between the contexts:
SparkContext configuration cannot be modified at runtime. You have to stop the existing context first.
SQLContext configuration can be modified at runtime.
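As a rough illustration of that pre-2.0 separation, here is a PySpark 1.x-style sketch (the names and values are assumptions, not taken from the question):
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

sc = SparkContext(conf=SparkConf().setAppName("fixed-at-startup"))  # frozen once the context is up
sqlContext = SQLContext(sc)
sqlContext.setConf("spark.sql.shuffle.partitions", "2001")          # fine at runtime

sc.stop()                                                           # SparkContext settings need a restart
sc = SparkContext(conf=SparkConf().setAppName("another-name"))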
spark.app.name, like many other options, is bound to the SparkContext and cannot be modified without stopping the context.
Reusing the existing SparkContext / SparkSession
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
spark.conf.get("spark.sql.shuffle.partitions")
String = 200
val conf = new SparkConf()
.setAppName("foo")
.set("spark.sql.shuffle.partitions", "2001")
val spark = SparkSession.builder.config(conf).getOrCreate()
... WARN SparkSession$Builder: Use an existing SparkSession ...
spark: org.apache.spark.sql.SparkSession = ...
spark.conf.get("spark.sql.shuffle.partitions")
String = 2001
While the spark.app.name config is updated:
spark.conf.get("spark.app.name")
String = foo
it doesn't affect the SparkContext:
spark.sparkContext.appName
String = Spark shell
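The same check can be reproduced from the PySpark shell used in the question; a minimal sketch, assuming the session built with setAppName("MyApp") above:
spark.conf.get("spark.app.name")
'MyApp'
spark.sparkContext.appName      # this is what the Web UI at localhost:4040 shows
'pyspark-shell'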
Stopping the existing SparkContext / SparkSession
Now let's stop the session and repeat the process:
spark.stop
val spark = SparkSession.builder.config(conf).getOrCreate()
... WARN SparkContext: Use an existing SparkContext ...
spark: org.apache.spark.sql.SparkSession = ...
spark.sparkContext.appName
String = foo
Interestingly, when we stop the session we still get a warning about using an existing SparkContext, but you can check that it is actually stopped.
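Translated back to PySpark, a minimal sketch of the fix, assuming the SparkConf c built in the question:
from pyspark.sql import SparkSession

spark.stop()                    # stops the session and its SparkContext
spark = (SparkSession
         .builder
         .config(conf=c)        # c carries setAppName("MyApp")
         .getOrCreate())        # a fresh SparkContext is created this time
spark.sparkContext.appName
'MyApp'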