This article describes how to resolve an ImportError when importing the graphframes package in PySpark; the answer below may be a useful reference for anyone hitting the same problem.

Problem Description

I have downloaded the graphframes package (from here) and saved it on my local disk. Now, I would like to use it. So, I use the following command:

IPYTHON_OPTS="notebook --no-browser" pyspark --num-executors=4  --name gorelikboris_notebook_1  --py-files ~/temp/graphframes-0.1.0-spark1.5.jar --jars ~/temp/graphframes-0.1.0-spark1.5.jar --packages graphframes:graphframes:0.1.0-spark1.5

All the pyspark functionality works as expected, except for the new graphframes package: whenever I try to import graphframes, I get an ImportError. When I examine sys.path, I can see the following two paths:

/tmp/spark-1eXXX/userFiles-9XXX/graphframes_graphframes-0.1.0-spark1.5.jar and /tmp/spark-1eXXX/userFiles-9XXX/graphframes-0.1.0-spark1.5.jar; however, these files don't exist. Moreover, the /tmp/spark-1eXXX/userFiles-9XXX/ directory is empty.
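
For illustration, the failure can be reproduced inside the notebook with something like the sketch below (the /tmp/spark-... entries are placeholders for whatever temporary names Spark generates in a given session):

# Illustrative sketch: inspect sys.path from the PySpark notebook and try the import.
import sys

for p in sys.path:
    if "graphframes" in p:
        print(p)  # jar paths Spark added to sys.path, even though the files are missing

try:
    import graphframes  # raises ImportError: no graphframes/ Python package is actually on the path
except ImportError as e:
    print("ImportError:", e)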

What am I missing?

Recommended Answer

This might be an issue with Spark packages and Python in general. Someone else was asking about it earlier on the Spark user discussion alias as well.

My workaround is to unpack the jar to find the Python code embedded in it, and then to move that Python code into a subdirectory called graphframes.
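
As a rough sketch of that unpacking step (assuming the jar sits in ~/temp as in the question, and that the Python sources live under graphframes/ at the root of the jar, which may vary between releases), something like this would do it:

# Sketch of the workaround: a jar is just a zip archive, so extract the embedded
# graphframes/ Python package into the directory pyspark is launched from.
# All paths here are assumptions based on the question; adjust them to your setup.
import os
import shutil
import zipfile

jar_path = os.path.expanduser("~/temp/graphframes-0.1.0-spark1.5.jar")
extract_dir = os.path.expanduser("~/temp/graphframes_jar")
target_dir = os.path.expanduser("~/graphframes")  # pyspark is run from the home directory

with zipfile.ZipFile(jar_path) as jar:
    jar.extractall(extract_dir)

# Copy only the embedded Python package, not the rest of the jar contents.
if os.path.exists(target_dir):
    shutil.rmtree(target_dir)
shutil.copytree(os.path.join(extract_dir, "graphframes"), target_dir)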

For instance, I run pyspark from my home directory

~$ ls -lart
drwxr-xr-x 2 user user   4096 Feb 24 19:55 graphframes

~$ ls graphframes/
__init__.pyc  examples.pyc  graphframe.pyc  tests.pyc

You would not need the --py-files or --jars parameters then; something like

IPYTHON_OPTS="notebook --no-browser" pyspark --num-executors=4 --name gorelikboris_notebook_1 --packages graphframes:graphframes:0.1.0-spark1.5

and having the python code in the graphframes directory should work.
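
As a quick smoke test (a sketch only: the tiny example graph is made up, and sqlContext is the SQLContext that pyspark predefines in a Spark 1.x notebook), you could then verify the import from the notebook:

# Sketch of a smoke test once the graphframes/ directory is importable.
# The example vertices and edges below are invented purely for illustration.
from graphframes import GraphFrame

vertices = sqlContext.createDataFrame(
    [("a", "Alice"), ("b", "Bob")], ["id", "name"])
edges = sqlContext.createDataFrame(
    [("a", "b", "follows")], ["src", "dst", "relationship"])

g = GraphFrame(vertices, edges)
g.vertices.show()
g.edges.show()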

