Problem description
I’m spark-submitting a Python file that imports numpy, but I’m getting a "no module named numpy" error.
$ spark-submit --py-files projects/other_requirements.egg projects/jobs/my_numpy_als.py
Traceback (most recent call last):
  File "/usr/local/www/my_numpy_als.py", line 13, in <module>
    from pyspark.mllib.recommendation import ALS
  File "/usr/lib/spark/python/pyspark/mllib/__init__.py", line 24, in <module>
    import numpy
ImportError: No module named numpy
I was thinking I would pull in an egg for numpy via --py-files, but I'm having trouble figuring out how to build that egg. But then it occurred to me that pyspark itself uses numpy, so it would be silly to pull in my own version of numpy.
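For reference, --py-files accepts .zip, .egg, or .py files, and a pure-Python dependency could be packaged roughly as sketched below (the deps folder and package name are hypothetical). Note that numpy ships compiled C extensions, so this route is a poor fit for it in particular.

# Hypothetical sketch: packaging a pure-Python dependency for --py-files.
$ pip install --target=deps some_pure_python_pkg   # install into a local folder
$ cd deps && zip -r ../deps.zip . && cd ..
$ spark-submit --py-files deps.zip projects/jobs/my_numpy_als.py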
Any idea on the appropriate thing to do here?
Recommended answer
It looks like Spark is using a version of Python that does not have numpy installed. It could be because you are working inside a virtual environment.
Try this:
import os
import sys

# The following specifies a Python version for PySpark. Here we use the
# currently calling Python version.
# This is handy when we are using a virtualenv, for example, because
# otherwise Spark would choose the default system Python version.
os.environ['PYSPARK_PYTHON'] = sys.executable
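A minimal driver sketch putting this together (the app name is arbitrary; the environment variable must be set before the SparkContext is created):

import os
import sys

# Make the executors use the same interpreter as the driver, so packages
# installed in the current virtualenv (numpy included) are found.
os.environ['PYSPARK_PYTHON'] = sys.executable

from pyspark import SparkContext
from pyspark.mllib.recommendation import ALS

sc = SparkContext(appName="my_numpy_als")
# ... build the ratings RDD and call ALS.train(...) as usual ...

Equivalently, the interpreter can be chosen outside the script, e.g. export PYSPARK_PYTHON=$(which python) in the shell before running spark-submit, or via the spark.pyspark.python configuration property on newer Spark versions.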