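I have a Scrapy spider that crawls several sites. In some cases the Scrapy process gets killed because it runs out of RAM, so I rewrote the spider so that a crawl can be split into parts and run one site at a time.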

After the initial run, I use subprocess.Popen to submit the same Scrapy spider again with a new start item.

But I get this error:

    ImportError: No module named shop.settings
    Traceback (most recent call last):
      File "/home/kumar/envs/ishop/bin/scrapy", line 4, in <module>
        execute()
      File "/home/kumar/envs/ishop/lib/python2.7/site-packages/scrapy/cmdline.py", line 109, in execute
        settings = get_project_settings()
      File "/home/kumar/envs/ishop/lib/python2.7/site-packages/scrapy/utils/project.py", line 60, in get_project_settings
        settings.setmodule(settings_module_path, priority='project')
      File "/home/kumar/envs/ishop/lib/python2.7/site-packages/scrapy/settings/__init__.py", line 109, in setmodule
        module = import_module(module)
      File "/usr/lib64/python2.7/importlib/__init__.py", line 37, in import_module
        __import__(name)
    ImportError: No module named shop.settings

The subprocess call is:

    newp = Popen(comm, stderr=filename, stdout=filename, cwd=fp, shell=True)

  • comm - source /home/kumar/envs/ishop/bin/activate && cd /home/kumar/projects/usg/shop/spiders/../.. && /home/kumar/envs/ishop/bin/scrapy crawl -a category=laptop -a site=newsite -a start=2 -a numpages=10 -a split=1 'allsitespider'
  • fp (the cwd) - /home/kumar/projects/usg
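One way to make the child process less dependent on inherited state is to pass it an explicit argument list and environment instead of a shell string. The sketch below is only an illustration, not the asker's code: the helper names (build_child_env, build_crawl_args, launch_crawl) are hypothetical, and it assumes that /home/kumar/projects/usg is the directory holding scrapy.cfg and the shop package. As far as I understand Scrapy's project lookup, it normally adds that directory to sys.path itself, but it may skip that step when SCRAPY_SETTINGS_MODULE is already set in the environment, which can happen when the child is spawned from a running crawl. Putting the project root on PYTHONPATH makes the import work either way.

```python
import os
from subprocess import Popen


def build_child_env(parent_env, project_root, settings_module):
    """Return a copy of parent_env prepared for a child `scrapy crawl` process."""
    env = dict(parent_env)
    # Put the project root first on PYTHONPATH so the settings package
    # (here assumed to be `shop`) is importable whatever the child's cwd is.
    existing = env.get("PYTHONPATH")
    if existing:
        env["PYTHONPATH"] = project_root + os.pathsep + existing
    else:
        env["PYTHONPATH"] = project_root
    # State the settings module explicitly instead of relying on whatever
    # value the child happens to inherit from the parent process.
    env["SCRAPY_SETTINGS_MODULE"] = settings_module
    return env


def build_crawl_args(scrapy_bin, spider, spider_args):
    """Build an argv list for `scrapy crawl`, so no shell quoting is needed."""
    args = [scrapy_bin, "crawl"]
    for key, value in sorted(spider_args.items()):
        args += ["-a", "%s=%s" % (key, value)]
    args.append(spider)
    return args


def launch_crawl(args, env, project_root, log_path):
    """Start the crawl in the background, sending stdout and stderr to log_path."""
    log = open(log_path, "ab")
    return Popen(args, stdout=log, stderr=log, cwd=project_root, env=env)
```

With the values from the question, this would be called roughly as launch_crawl(build_crawl_args("/home/kumar/envs/ishop/bin/scrapy", "allsitespider", {"category": "laptop", "site": "newsite", "start": 2, "numpages": 10, "split": 1}), build_child_env(os.environ, "/home/kumar/projects/usg", "shop.settings"), "/home/kumar/projects/usg", "crawl.log"), where crawl.log is a made-up log file name. Calling the virtualenv's scrapy script by its full path should make sourcing activate unnecessary, assuming the script's shebang points at the virtualenv's interpreter.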

  • I checked sys.path and it looks correct: ['/home/kumar/envs/ishop/bin', '/home/kumar/envs/ishop/lib64/python27.zip', '/home/kumar/envs/ishop/lib64/python2.7', '/home/kumar/envs/ishop/lib64/python2.7/plat-linux2', '/home/kumar/envs/ishop/lib64/python2.7/lib-tk', '/home/kumar/envs/ishop/lib64/python2.7/lib-old', '/home/kumar/envs/ishop/lib64/python2.7/lib-dynload', '/usr/lib64/python2.7', '/usr/lib/python2.7', '/home/kumar/envs/ishop/lib/python2.7/site-packages']
    But it looks like the import statement is using "/usr/lib64/python2.7/importlib/__init__.py" rather than my virtual environment.
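A quick way to test the "sys.path is correct" assumption is to check whether any entry actually contains the package that fails to import. The helper below is a hypothetical diagnostic sketch (find_package_dir is not part of Scrapy). It only looks for a regular package directory with an __init__.py, which is what Python 2.7 requires, and it ignores zip files and .pth entries. Note also that '/usr/lib64/python2.7' is itself on the sys.path shown above, so importlib being loaded from there does not by itself mean the wrong interpreter is in use.

```python
import os


def find_package_dir(package, search_paths):
    """Return the first entry of search_paths that contains `package`, else None."""
    for base in search_paths:
        candidate = os.path.join(base, package)
        # A regular package is a directory that holds an __init__.py file.
        if os.path.isfile(os.path.join(candidate, "__init__.py")):
            return base
    return None
```

Running find_package_dir("shop", sys.path) inside the child process would show whether the project root is reachable. None of the entries listed above is /home/kumar/projects/usg, so if the shop package lives there (an assumption based on the cd in the command), the result would be None, which matches the ImportError.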

Where am I going wrong? Any help would be appreciated.

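Best answer

It looks like the settings are not being loaded correctly. One solution is to build an egg of the project and deploy it into the env before starting the crawler.

Official documentation: Eggify scrapy project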

For more on "python - Scrapy ImportError: No module named project.settings when using subprocess.Popen", see the similar question on Stack Overflow: https://stackoverflow.com/questions/27731670/
