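I have a Scrapy spider that crawls through sites. In some cases Scrapy gets killed because of RAM problems, so I rewrote the spider so that it can be split up and run for a single site.
After the initial run, I use subprocess.Popen to submit the Scrapy crawl again with a new start item.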
But I get the error ImportError: No module named shop.settings

    Traceback (most recent call last):
      File "/home/kumar/envs/ishop/bin/scrapy", line 4, in <module>
        execute()
      File "/home/kumar/envs/ishop/lib/python2.7/site-packages/scrapy/cmdline.py", line 109, in execute
        settings = get_project_settings()
      File "/home/kumar/envs/ishop/lib/python2.7/site-packages/scrapy/utils/project.py", line 60, in get_project_settings
        settings.setmodule(settings_module_path, priority='project')
      File "/home/kumar/envs/ishop/lib/python2.7/site-packages/scrapy/settings/__init__.py", line 109, in setmodule
        module = import_module(module)
      File "/usr/lib64/python2.7/importlib/__init__.py", line 37, in import_module
        __import__(name)
    ImportError: No module named shop.settings
The subprocess call is:

    newp = Popen(comm, stderr=filename, stdout=filename, cwd=fp, shell=True)

The command it runs is:

    source /home/kumar/envs/ishop/bin/activate && cd /home/kumar/projects/usg/shop/spiders/../.. && /home/kumar/envs/ishop/bin/scrapy crawl -a category=laptop -a site=newsite -a start=2 -a numpages=10 -a split=1 'allsitespider'
I checked sys.path and it is correct:

    ['/home/kumar/envs/ishop/bin',
     '/home/kumar/envs/ishop/lib64/python27.zip',
     '/home/kumar/envs/ishop/lib64/python2.7',
     '/home/kumar/envs/ishop/lib64/python2.7/plat-linux2',
     '/home/kumar/envs/ishop/lib64/python2.7/lib-tk',
     '/home/kumar/envs/ishop/lib64/python2.7/lib-old',
     '/home/kumar/envs/ishop/lib64/python2.7/lib-dynload',
     '/usr/lib64/python2.7',
     '/usr/lib/python2.7',
     '/home/kumar/envs/ishop/lib/python2.7/site-packages']
But it looks like the import statement is using "/usr/lib64/python2.7/importlib/__init__.py" instead of my virtual env. Where am I going wrong? Please help.
Best answer
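It looks like the settings are not being loaded correctly. One solution is to build an egg of the project and deploy it in the env before starting the crawler.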
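A minimal sketch of the first step of that egg approach, under stated assumptions: the helper name write_setup_py and the template are illustrative, modeled (to my understanding) on the setup.py that scrapyd-deploy generates, and the settings module name shop.settings is taken from the traceback in the question. The helper only writes a setup.py into the project root; it does not build or install anything.

```python
import os

# Assumed template: a minimal setup.py whose 'scrapy' entry point names the
# settings module. The name and version values are placeholders.
SETUP_PY_TEMPLATE = """\
from setuptools import setup, find_packages

setup(
    name='project',
    version='1.0',
    packages=find_packages(),
    entry_points={'scrapy': ['settings = %(settings)s']},
)
"""


def write_setup_py(project_root, settings_module):
    """Write a minimal setup.py into project_root and return its path.

    write_setup_py is a hypothetical helper, not part of Scrapy.
    """
    path = os.path.join(project_root, "setup.py")
    with open(path, "w") as handle:
        handle.write(SETUP_PY_TEMPLATE % {"settings": settings_module})
    return path
```

With the file in place, running `python setup.py bdist_egg` from the project root using the virtualenv's interpreter should produce an egg under dist/. Installing the project into the env (for example with `python setup.py install`) puts the shop package in site-packages, so shop.settings can be imported regardless of the working directory of the child process.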
Official documentation: Eggify scrapy project
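As an alternative that is not part of the answer above: the sys.path printed in the question does not list /home/kumar/projects/usg, the directory that contains the shop package. A hedged sketch of one way to make that directory importable in the child process is to pass an explicit environment to Popen. The helper names build_child_env and build_crawl_command are hypothetical, and this assumes the scrapy script's shebang points at the virtualenv interpreter, so no shell or activate step is needed.

```python
import os


def build_child_env(project_root, base_env=None):
    """Return a copy of the environment with project_root first on PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH", "")
    # Keep existing entries, dropping empties and any duplicate of project_root.
    rest = [p for p in existing.split(os.pathsep) if p and p != project_root]
    env["PYTHONPATH"] = os.pathsep.join([project_root] + rest)
    return env


def build_crawl_command(scrapy_bin, spider, spider_args):
    """Build an argument list for `scrapy crawl` with one -a flag per pair."""
    cmd = [scrapy_bin, "crawl"]
    for key, value in spider_args:
        cmd += ["-a", "%s=%s" % (key, value)]
    cmd.append(spider)
    return cmd


# Usage sketch with the paths and arguments from the question (not executed here):
#
#   import subprocess
#   root = "/home/kumar/projects/usg"
#   cmd = build_crawl_command(
#       "/home/kumar/envs/ishop/bin/scrapy", "allsitespider",
#       [("category", "laptop"), ("site", "newsite"), ("start", 2),
#        ("numpages", 10), ("split", 1)])
#   newp = subprocess.Popen(cmd, cwd=root, env=build_child_env(root))
```

Passing an argument list instead of a shell string avoids quoting problems, and setting cwd and PYTHONPATH explicitly means the child does not depend on whatever environment the parent Scrapy process happened to have.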
Source: a similar question on Stack Overflow, "python - Scrapy ImportError: No module named project.settings when using subprocess.Popen": https://stackoverflow.com/questions/27731670/