Problem description
I’m spark-submitting a Python file that imports numpy, but I’m getting a "no module named numpy" error.
$ spark-submit --py-files projects/other_requirements.egg projects/jobs/my_numpy_als.py
Traceback (most recent call last):
  File "/usr/local/www/my_numpy_als.py", line 13, in <module>
    from pyspark.mllib.recommendation import ALS
  File "/usr/lib/spark/python/pyspark/mllib/__init__.py", line 24, in <module>
    import numpy
ImportError: No module named numpy
I was thinking I would pull in an egg for numpy via --py-files, but I'm having trouble figuring out how to build that egg. But then it occurred to me that pyspark itself uses numpy, so it would be silly to pull in my own version of numpy.
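For reference, --py-files accepts .zip, .egg, or .py files, and a pure-Python dependency could be packaged roughly as sketched below (the deps folder and package name are hypothetical). Note that numpy ships compiled C extensions, so this route is a poor fit for it in particular.

# Hypothetical sketch: packaging a pure-Python dependency for --py-files.
$ pip install --target=deps some_pure_python_pkg   # install into a local folder
$ cd deps && zip -r ../deps.zip . && cd ..
$ spark-submit --py-files deps.zip projects/jobs/my_numpy_als.py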
Any idea on the appropriate thing to do here?
Recommended answer
It looks like Spark is using a version of Python that does not have numpy installed. It could be because you are working inside a virtual environment.
Try this:
import os
import sys

# The following specifies a Python version for PySpark. Here we use the
# currently calling Python version.
# This is handy when we are using a virtualenv, for example, because
# otherwise Spark would choose the default system Python version.
os.environ['PYSPARK_PYTHON'] = sys.executable
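A minimal driver sketch putting this together (the app name is arbitrary; the environment variable must be set before the SparkContext is created):

import os
import sys

# Make the executors use the same interpreter as the driver, so packages
# installed in the current virtualenv (numpy included) are found.
os.environ['PYSPARK_PYTHON'] = sys.executable

from pyspark import SparkContext
from pyspark.mllib.recommendation import ALS

sc = SparkContext(appName="my_numpy_als")
# ... build the ratings RDD and call ALS.train(...) as usual ...

Equivalently, the interpreter can be chosen outside the script, e.g. export PYSPARK_PYTHON=$(which python) in the shell before running spark-submit, or via the spark.pyspark.python configuration property on newer Spark versions.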