This article explains how to resolve the question "Why can't PySpark find py4j.java_gateway?" The answer below may be a useful reference for anyone hitting the same problem.

Problem description



I installed Spark, ran the sbt assembly, and can open bin/pyspark with no problem. However, I am running into problems loading the pyspark module into IPython. I'm getting the following error:

In [1]: import pyspark
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-c15ae3402d12> in <module>()
----> 1 import pyspark

/usr/local/spark/python/pyspark/__init__.py in <module>()
     61
     62 from pyspark.conf import SparkConf
---> 63 from pyspark.context import SparkContext
     64 from pyspark.sql import SQLContext
     65 from pyspark.rdd import RDD

/usr/local/spark/python/pyspark/context.py in <module>()
     28 from pyspark.conf import SparkConf
     29 from pyspark.files import SparkFiles
---> 30 from pyspark.java_gateway import launch_gateway
     31 from pyspark.serializers import PickleSerializer, BatchedSerializer, UTF8Deserializer, \
     32     PairDeserializer, CompressedSerializer

/usr/local/spark/python/pyspark/java_gateway.py in <module>()
     24 from subprocess import Popen, PIPE
     25 from threading import Thread
---> 26 from py4j.java_gateway import java_import, JavaGateway, GatewayClient
     27
     28

ImportError: No module named py4j.java_gateway
Solution

In my environment (using Docker and the image sequenceiq/spark:1.1.0-ubuntu), I ran into this. If you look at the pyspark shell script, you'll see that a few things need to be added to your PYTHONPATH:

export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH

That worked in IPython for me.
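If editing shell startup files is inconvenient, the same two entries can also be prepended from inside Python before importing pyspark. This is a minimal sketch, not from the original answer: the /usr/local/spark default and the py4j-0.8.2.1-src.zip filename are assumptions matching the paths shown above, and the zip name will differ for other Spark releases.

```python
import os
import sys

# Default matches the /usr/local/spark path seen in the traceback above;
# override by setting SPARK_HOME in the environment. (Assumed default.)
spark_home = os.environ.get("SPARK_HOME", "/usr/local/spark")

# The same two entries the pyspark shell script adds to PYTHONPATH.
# The py4j zip name tracks the Spark release (0.8.2.1 here, per the post).
paths = [
    os.path.join(spark_home, "python"),
    os.path.join(spark_home, "python", "lib", "py4j-0.8.2.1-src.zip"),
]

for path in paths:
    if path not in sys.path:
        # Prepend so the bundled py4j wins over any stale installs.
        sys.path.insert(0, path)

# With these entries in place, `import pyspark` should now be able to
# locate py4j.java_gateway from the bundled zip.
```

Python can import directly from a .zip archive on sys.path, which is why adding the py4j source zip is enough; no extraction is needed.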

