This article describes how to deal with the PySpark error "ImportError: cannot import name accumulators"; it may be a useful reference if you run into the same problem.

Problem description

Goal: I am trying to get apache-spark pyspark to be appropriately interpreted within my pycharm IDE.

Problem: I currently receive the following error:

ImportError: cannot import name accumulators

I was following this blog to help me through the process: http://renien.github.io/blog/accessing-pyspark-pycharm/

Because my code was taking the except path, I removed the try/except just to see what the exact error was.

Prior to this I received the following error:

ImportError: No module named py4j.java_gateway

This was fixed simply by typing '$sudo pip install py4j' in bash.

My code currently looks like the following chunk:

import os
import sys

# Path for spark source folder
os.environ['SPARK_HOME']="[MY_HOME_DIR]/spark-1.2.0"

# Append pyspark to Python Path
sys.path.append("[MY_HOME_DIR]/spark-1.2.0/python/")

try:
    from pyspark import SparkContext
    print ("Successfully imported Spark Modules")

except ImportError as e:
    print ("Can not import Spark Modules", e)
    sys.exit(1)

My Questions:
1. What is the source of this error? What is the cause?
2. How do I remedy the issue so I can run pyspark in my PyCharm editor?

NOTE: The current interpreter I use in pycharm is Python 2.7.8 (~/anaconda/bin/python)

Thanks in advance!

Recommended answer

First, set your environment variables:

export SPARK_HOME=/home/.../Spark/spark-2.0.1-bin-hadoop2.7
export PYTHONPATH=$SPARK_HOME/python/:$SPARK_HOME/python/lib/py4j-0.10.3-src.zip:$PYTHONPATH
PATH="$PATH:$JAVA_HOME/bin:$SPARK_HOME/bin:$PYTHONPATH"

Make sure that you use your own version names (your Spark and py4j versions may differ from the ones above).
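If you are not sure which py4j version your Spark distribution ships with, here is a minimal sketch (assuming SPARK_HOME is already set in your environment) for locating the bundled zip; the printed path is the one that should appear in PYTHONPATH above:

import glob
import os

# List the py4j source zip(s) bundled under $SPARK_HOME/python/lib
print(glob.glob(os.path.join(os.environ["SPARK_HOME"], "python", "lib", "py4j-*-src.zip")))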

Then restart! It is important to validate your settings.
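If you prefer to keep everything inside the script, as in the question's code, the same two entries can be added to sys.path directly. This is only a minimal sketch assuming the spark-2.0.1-bin-hadoop2.7 / py4j-0.10.3-src.zip layout from the exports above; adjust the [MY_HOME_DIR] placeholder and both version names to your installation:

import os
import sys

# Point SPARK_HOME at your Spark installation
os.environ['SPARK_HOME'] = "[MY_HOME_DIR]/spark-2.0.1-bin-hadoop2.7"

# Put both the pyspark package and the bundled py4j zip on sys.path;
# these mirror the two PYTHONPATH entries exported above
sys.path.append(os.path.join(os.environ['SPARK_HOME'], "python"))
sys.path.append(os.path.join(os.environ['SPARK_HOME'], "python", "lib", "py4j-0.10.3-src.zip"))

from pyspark import SparkContext
print("Successfully imported Spark Modules")

This can help when PyCharm is not launched from a shell that has the exported variables, since the run configuration then never sees PYTHONPATH.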

This concludes the article on the PySpark "ImportError: cannot import name accumulators" error; hopefully the recommended answer above is helpful.
