Problem Description
I have downloaded the graphframes package (from here) and saved it on my local disk. Now, I would like to use it. So, I use the following command:
IPYTHON_OPTS="notebook --no-browser" pyspark --num-executors=4 --name gorelikboris_notebook_1 --py-files ~/temp/graphframes-0.1.0-spark1.5.jar --jars ~/temp/graphframes-0.1.0-spark1.5.jar --packages graphframes:graphframes:0.1.0-spark1.5
All the pyspark functionality works as expected, except for the new graphframes package: whenever I try to import graphframes, I get an ImportError. When I examine sys.path, I can see the following two paths:
/tmp/spark-1eXXX/userFiles-9XXX/graphframes_graphframes-0.1.0-spark1.5.jar
/tmp/spark-1eXXX/userFiles-9XXX/graphframes-0.1.0-spark1.5.jar
However, these files don't exist. Moreover, the /tmp/spark-1eXXX/userFiles-9XXX/ directory is empty.
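For reference, a minimal check along these lines (run from the notebook) is enough to see which sys.path entries actually exist on disk; it uses nothing beyond the standard os and sys modules:

import os
import sys

# Walk sys.path and flag entries that do not exist on the filesystem.
# In this setup the two jar paths under /tmp/spark-.../userFiles-.../ show
# up on sys.path but are missing from disk.
for entry in sys.path:
    status = "exists" if os.path.exists(entry) else "MISSING"
    print(status, entry)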
What am I missing?
Recommended Answer
This might be an issue in Spark packages with Python in general. Someone else was asking about it too earlier on the Spark user discussion alias.
My workaround is to unpack the jar to find the embedded python code, and then move the python code into a subdirectory called graphframes.
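Since a jar is just a zip archive, that extraction step can also be scripted. The snippet below is only a sketch: the jar path is the one from the question, and it assumes the embedded python files sit under a graphframes/ prefix inside the jar; if they end up elsewhere, simply move them into a directory named graphframes by hand.

import os
import zipfile

# Assumed paths: the jar location from the question, extracted into the
# home directory so that ~/graphframes/ ends up next to where pyspark runs.
jar_path = os.path.expanduser("~/temp/graphframes-0.1.0-spark1.5.jar")
target_dir = os.path.expanduser("~")

with zipfile.ZipFile(jar_path) as jar:
    # Keep only the members of the embedded python package (assumption:
    # they are stored under a graphframes/ prefix inside the jar).
    members = [name for name in jar.namelist() if name.startswith("graphframes/")]
    jar.extractall(path=target_dir, members=members)

print(os.listdir(os.path.join(target_dir, "graphframes")))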
For instance, I run pyspark from my home directory
~$ ls -lart
drwxr-xr-x 2 user user 4096 Feb 24 19:55 graphframes
~$ ls graphframes/
__init__.pyc examples.pyc graphframe.pyc tests.pyc
You would not need the --py-files or --jars parameters, though; something like
IPYTHON_OPTS="notebook --no-browser" pyspark --num-executors=4 --name gorelikboris_notebook_1 --packages graphframes:graphframes:0.1.0-spark1.5
and having the python code in the graphframes directory should work.
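As a quick sanity check (not part of the original answer), the import plus a trivial graph can be tried from the notebook session started above. This sketch assumes the sqlContext that pyspark provides and the column names GraphFrame expects ("id" for vertices, "src"/"dst" for edges):

import graphframes

# Minimal vertex and edge DataFrames; sqlContext is predefined in a
# pyspark session.
vertices = sqlContext.createDataFrame([("a",), ("b",)], ["id"])
edges = sqlContext.createDataFrame([("a", "b")], ["src", "dst"])

# If the graphframes directory is picked up correctly, the import above
# succeeds and this builds a small GraphFrame.
g = graphframes.GraphFrame(vertices, edges)
print(g.vertices.count(), g.edges.count())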