This article covers how to fix the error behind "Why can't PySpark find py4j.java_gateway?".
Problem description
I installed Spark, ran sbt assembly, and can open bin/pyspark with no problem. However, I am running into problems loading the pyspark module into IPython. I get the following error:
In [1]: import pyspark
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-c15ae3402d12> in <module>()
----> 1 import pyspark
/usr/local/spark/python/pyspark/__init__.py in <module>()
61
62 from pyspark.conf import SparkConf
---> 63 from pyspark.context import SparkContext
64 from pyspark.sql import SQLContext
65 from pyspark.rdd import RDD
/usr/local/spark/python/pyspark/context.py in <module>()
28 from pyspark.conf import SparkConf
29 from pyspark.files import SparkFiles
---> 30 from pyspark.java_gateway import launch_gateway
31 from pyspark.serializers import PickleSerializer, BatchedSerializer, UTF8Deserializer, \
32 PairDeserializer, CompressedSerializer
/usr/local/spark/python/pyspark/java_gateway.py in <module>()
24 from subprocess import Popen, PIPE
25 from threading import Thread
---> 26 from py4j.java_gateway import java_import, JavaGateway, GatewayClient
27
28
ImportError: No module named py4j.java_gateway
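The traceback bottoms out inside pyspark's own java_gateway.py, so the problem is not Spark itself: the py4j package bundled with Spark is simply not on Python's import path. A quick way to confirm this from the IPython session, using only the standard library (a sketch, not part of the original answer):

```python
import importlib.util
import sys

# If py4j were importable, find_spec would return a ModuleSpec;
# None means it is missing from every directory on sys.path.
spec = importlib.util.find_spec("py4j")
print("py4j found:", spec is not None)

# Inspect sys.path to see whether Spark's python/ directory and
# the py4j source zip under python/lib are present.
for entry in sys.path:
    print(entry)
```

If "py4j found: False" is printed and no Spark paths appear in the listing, the PYTHONPATH fix below applies.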
Solution
In my environment (using docker and the image sequenceiq/spark:1.1.0-ubuntu), I ran into this. If you look at the pyspark shell script, you'll see that you need a few things added to your PYTHONPATH:
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH
That worked in IPython for me.
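The same two entries can also be added from inside Python before importing pyspark, which avoids editing shell startup files. This is a minimal sketch: /usr/local/spark is only a placeholder fallback for SPARK_HOME, and the py4j zip name changes between Spark versions, hence the glob instead of hard-coding py4j-0.8.2.1-src.zip.

```python
import glob
import os
import sys

# Fall back to a placeholder path if SPARK_HOME is not exported;
# adjust this to your actual Spark installation directory.
spark_home = os.environ.get("SPARK_HOME", "/usr/local/spark")

# First entry the pyspark shell script adds: Spark's python/ directory.
sys.path.insert(0, os.path.join(spark_home, "python"))

# Second entry: the bundled py4j source zip under python/lib.
# Its filename is version-dependent, so match it with a glob.
for zip_path in glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")):
    sys.path.insert(0, zip_path)

# With both entries in place, `import pyspark` should now succeed.
```

Run this at the top of the IPython session (or in a startup file) before the first `import pyspark`.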