Problem description
I'm using Spark 2.0 with PySpark.
I am redefining the SparkSession parameters through the GetOrCreate method that was introduced in 2.0:
In case an existing SparkSession is returned, the config options specified in this builder will be applied to the existing SparkSession.
So far so good:
from pyspark import SparkConf
SparkConf().toDebugString()
'spark.app.name=pyspark-shell\nspark.master=local[2]\nspark.submit.deployMode=client'
spark.conf.get("spark.app.name")
'pyspark-shell'
Then I redefine the SparkSession config, expecting to see the changes in the WebUI:
c = SparkConf()
(c
.setAppName("MyApp")
.setMaster("local")
.set("spark.driver.memory","1g")
)
from pyspark.sql import SparkSession
(SparkSession
.builder
.enableHiveSupport() # metastore, serdes, Hive udf
.config(conf=c)
.getOrCreate())
spark.conf.get("spark.app.name")
'MyApp'
Now, when I go to localhost:4040, I would expect to see MyApp as the app name.
However, I still see the pyspark-shell application UI.
Where am I wrong?
Thanks in advance!
Recommended answer
I believe the documentation is a bit misleading here. When you work with Scala you actually see a warning like this:
... WARN SparkSession$Builder: Use an existing SparkSession, some configuration may not take effect.
It was more obvious prior to Spark 2.0, with a clear separation between contexts:
- SparkContext configuration cannot be modified at runtime. You have to stop the existing context first.
- SQLContext configuration can be modified at runtime (see the short sketch after this list).
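To make that distinction concrete, here is a minimal pre-2.0-style PySpark sketch (run as a standalone script; the app names and the shuffle-partitions value are just placeholders for illustration):
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(conf=SparkConf().setMaster("local[2]").setAppName("before"))
sqlContext = SQLContext(sc)

# SQLContext settings can be changed while the context is running
sqlContext.setConf("spark.sql.shuffle.partitions", "10")
sqlContext.getConf("spark.sql.shuffle.partitions")   # '10'

# SparkContext settings cannot: the context has to be stopped and rebuilt first
sc.stop()
sc = SparkContext(conf=SparkConf().setMaster("local[2]").setAppName("after"))
sc.appName                                           # 'after'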
spark.app.name, like many other options, is bound to the SparkContext, and cannot be modified without stopping the context.
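In PySpark terms this is exactly what the question runs into. A rough sketch of the same check (assuming a fresh session started here just for the example; the names are placeholders):
from pyspark import SparkConf
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").appName("pyspark-shell").getOrCreate()

c = SparkConf().setAppName("MyApp").setMaster("local")
spark = SparkSession.builder.config(conf=c).getOrCreate()   # reuses the existing session

spark.conf.get("spark.app.name")   # 'MyApp'         -> the session-level config was updated
spark.sparkContext.appName         # 'pyspark-shell' -> the context was not
The WebUI on localhost:4040 takes its title from the SparkContext, which is why it keeps showing pyspark-shell.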
Reusing an existing SparkContext / SparkSession
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
spark.conf.get("spark.sql.shuffle.partitions")
String = 200
val conf = new SparkConf()
.setAppName("foo")
.set("spark.sql.shuffle.partitions", "2001")
val spark = SparkSession.builder.config(conf).getOrCreate()
... WARN SparkSession$Builder: Use an existing SparkSession ...
spark: org.apache.spark.sql.SparkSession = ...
spark.conf.get("spark.sql.shuffle.partitions")
String = 2001
The spark.app.name configuration has been updated:
spark.conf.get("spark.app.name")
String = foo
It does not affect the SparkContext, though:
spark.sparkContext.appName
String = Spark shell
Stopping the existing SparkContext / SparkSession
Now let's stop the session and repeat the process:
spark.stop
val spark = SparkSession.builder.config(conf).getOrCreate()
... WARN SparkContext: Use an existing SparkContext ...
spark: org.apache.spark.sql.SparkSession = ...
spark.sparkContext.appName
String = foo
Interestingly, when we stop the session we still get a warning about using an existing SparkContext, but you can check that it is actually stopped.
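For completeness, a hedged PySpark version of the same stop-and-recreate flow. The isStopped check goes through the py4j gateway (spark.sparkContext._jsc), which is an internal detail rather than a documented PySpark API, so treat this as a rough sketch:
from pyspark import SparkConf
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").appName("pyspark-shell").getOrCreate()
old_jvm_sc = spark.sparkContext._jsc.sc()   # keep a handle on the JVM SparkContext

spark.stop()                                # stops the session and its SparkContext

conf = SparkConf().setAppName("foo").setMaster("local[2]")
spark = SparkSession.builder.config(conf=conf).getOrCreate()

old_jvm_sc.isStopped()                      # True  -> the old context really is stopped
spark.sparkContext.appName                  # 'foo' -> the new context picked up the new name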