How to create multiple SparkContexts in one console

Problem description

I want to create more than one SparkContext in a console. According to a post on the mailing list, I need to call SparkConf.set('spark.driver.allowMultipleContexts', true). This seems reasonable, but it does not work. Does anyone have experience with this? Thanks a lot.

Below is what I did and the error message; I ran this in an IPython Notebook:

from pyspark import SparkConf, SparkContext
conf = SparkConf().setMaster("spark://10.21.208.21:7077").set("spark.driver.allowMultipleContexts", "true")
conf.getAll()
[(u'spark.eventLog.enabled', u'true'),
 (u'spark.driver.allowMultipleContexts', u'true'),
 (u'spark.driver.host', u'10.20.70.80'),
 (u'spark.app.name', u'pyspark-shell'),
 (u'spark.eventLog.dir', u'hdfs://10.21.208.21:8020/sparklog'),
 (u'spark.master', u'spark://10.21.208.21:7077')]

sc1 = SparkContext(conf=conf.setAppName("app 1")) ## this SparkContext is created successfully
sc1
<pyspark.context.SparkContext at 0x1b7cf10>

sc2 = SparkContext(conf=conf.setAppName("app 2")) ## this fails
ValueError                                Traceback (most recent call last)
<ipython-input-23-e6dcca5aec38> in <module>()
----> 1 sc2 = SparkContext(conf=conf.setAppName("app 2"))

/usr/local/spark-1.2.0-bin-cdh4/python/pyspark/context.pyc in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc)
    100         """
    101         self._callsite = first_spark_call() or CallSite(None, None, None)
--> 102         SparkContext._ensure_initialized(self, gateway=gateway)
    103         try:
    104             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,

/usr/local/spark-1.2.0-bin-cdh4/python/pyspark/context.pyc in _ensure_initialized(cls, instance, gateway)
    226                         " created by %s at %s:%s "
    227                         % (currentAppName, currentMaster,
--> 228                             callsite.function, callsite.file, callsite.linenum))
    229                 else:
    230                     SparkContext._active_spark_context = instance

ValueError: Cannot run multiple SparkContexts at once; existing SparkContext(app=app 1, master=spark://10.21.208.21:7077) created by __init__ at <ipython-input-21-fb3adb569241>:1

Solution

This is a PySpark-specific limitation that existed before the spark.driver.allowMultipleContexts configuration was added; that option only concerns multiple SparkContext objects within a single JVM. PySpark disallows multiple active SparkContexts because various parts of its implementation assume that certain components have global shared state.
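
If the goal is simply to run several jobs from one notebook session, a common workaround consistent with this limitation is to stop the active context before creating the next one. A minimal sketch of that pattern (the master URL and app names are carried over from the question for illustration):

from pyspark import SparkConf, SparkContext

# Illustrative master URL taken from the question; adjust for your cluster.
conf = SparkConf().setMaster("spark://10.21.208.21:7077")

# Create the first context and do some work with it.
sc1 = SparkContext(conf=conf.setAppName("app 1"))
print(sc1.parallelize(range(10)).sum())

# Stop the active context before creating another one; PySpark
# permits only one active SparkContext per driver process.
sc1.stop()

# A fresh context can now be created without raising the ValueError.
sc2 = SparkContext(conf=conf.setAppName("app 2"))
print(sc2.parallelize(range(10)).count())
sc2.stop()

If genuinely concurrent contexts are required, they have to live in separate driver processes, for example separate notebook kernels or separate spark-submit jobs.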
