I'm trying to write my first Scrapy spider. I've been following the tutorial at http://doc.scrapy.org/en/latest/intro/tutorial.html, but I keep getting the error "KeyError: 'Spider not found: juno'".
I believe I'm running the command from the correct directory (the one containing the scrapy.cfg file):
(proscraper)#( 10/14/14@ 2:06pm )( tim@localhost ):~/Workspace/Development/hacks/prosum-scraper/scrapy
tree
.
├── scrapy
│ ├── __init__.py
│ ├── items.py
│ ├── pipelines.py
│ ├── settings.py
│ └── spiders
│ ├── __init__.py
│ └── juno_spider.py
└── scrapy.cfg
2 directories, 7 files
(proscraper)#( 10/14/14@ 2:13pm )( tim@localhost ):~/Workspace/Development/hacks/prosum-scraper/scrapy
ls
scrapy scrapy.cfg
Here is the error I'm getting:
(proscraper)#( 10/14/14@ 2:13pm )( tim@localhost ):~/Workspace/Development/hacks/prosum-scraper/scrapy
scrapy crawl juno
/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/twisted/internet/_sslverify.py:184: UserWarning: You do not have the service_identity module installed. Please install it from <https://pypi.python.org/pypi/service_identity>. Without the service_identity module and a recent enough pyOpenSSL to support it, Twisted can perform only rudimentary TLS client hostname verification. Many valid certificate/hostname mappings may be rejected.
verifyHostname, VerificationError = _selectVerifyImplementation()
Traceback (most recent call last):
File "/home/tim/.virtualenvs/proscraper/bin/scrapy", line 9, in <module>
load_entry_point('Scrapy==0.24.4', 'console_scripts', 'scrapy')()
File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/cmdline.py", line 143, in execute
_run_print_help(parser, _run_command, cmd, args, opts)
File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/cmdline.py", line 89, in _run_print_help
func(*a, **kw)
File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/cmdline.py", line 150, in _run_command
cmd.run(args, opts)
File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/commands/crawl.py", line 58, in run
spider = crawler.spiders.create(spname, **opts.spargs)
File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/spidermanager.py", line 44, in create
raise KeyError("Spider not found: %s" % spider_name)
KeyError: 'Spider not found: juno'
Here is the output of pip freeze in my virtualenv:
(proscraper)#( 10/14/14@ 2:13pm )( tim@localhost ):~/Workspace/Development/hacks/prosum-scraper/scrapy
pip freeze
Scrapy==0.24.4
Twisted==14.0.2
cffi==0.8.6
cryptography==0.6
cssselect==0.9.1
ipdb==0.8
ipython==2.3.0
lxml==3.4.0
pyOpenSSL==0.14
pycparser==2.10
queuelib==1.2.2
six==1.8.0
w3lib==1.10.0
wsgiref==0.1.2
zope.interface==4.1.1
And here is the code for my spider, which does have the name attribute set:
(proscraper)#( 10/14/14@ 2:14pm )( tim@localhost ):~/Workspace/Development/hacks/prosum-scraper/scrapy
cat scrapy/spiders/juno_spider.py
import scrapy


class JunoSpider(scrapy.Spider):
    name = "juno"
    allowed_domains = ["http://www.juno.co.uk/"]
    start_urls = [
        "http://www.juno.co.uk/dj-equipment/"
    ]

    def parse(self, response):
        filename = response.url.split("/")[-2]
        with open(filename, 'wb') as f:
            f.write(response.body)
Best Answer
When you start a project with scrapy as the project name, Scrapy creates the directory structure you printed:
.
├── scrapy
│ ├── __init__.py
│ ├── items.py
│ ├── pipelines.py
│ ├── settings.py
│ └── spiders
│ ├── __init__.py
│ └── juno_spider.py
└── scrapy.cfg
However, using scrapy as the project name has side effects. If you open the generated scrapy.cfg, you will see that the default settings point to your scrapy.settings module:
[settings]
default = scrapy.settings
And when we cat that scrapy/settings.py file, we see:
BOT_NAME = 'scrapy'
SPIDER_MODULES = ['scrapy.spiders']
NEWSPIDER_MODULE = 'scrapy.spiders'
Nothing strange here: the bot name, the list of modules where Scrapy will look for spiders, and the module where the genspider command will create new spiders. So far so good.
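For reference, scrapy.cfg is a plain INI file, and the [settings] entry simply names the Python module Scrapy will import for the project's settings. A minimal stdlib-only sketch of how such a file parses (this is an illustration, not Scrapy's actual loader):

```python
import configparser
import io

# The [settings] section of scrapy.cfg names the settings module to import.
cfg_text = """
[settings]
default = scrapy.settings
"""

parser = configparser.ConfigParser()
parser.read_file(io.StringIO(cfg_text))
settings_module = parser.get("settings", "default")
print(settings_module)
```

The value printed here, scrapy.settings, is an ordinary dotted module path, which is exactly why it can collide with the installed library, as explained next.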
Now let's inspect the scrapy library itself. It is properly installed in your isolated proscraper virtualenv, under the
/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy
directory. Remember that site-packages is always added to sys.path, which holds all the paths where Python will search for modules. So, guess what: the scrapy library also has a settings module,
/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/settings
, which imports
/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/settings/default_settings.py
, the file that holds the default values for all settings. Pay special attention to its default SPIDER_MODULES entry:
SPIDER_MODULES = []
Maybe you are starting to see what is going on. Choosing scrapy as the project name also generated a scrapy.settings module that conflicts with the scrapy library's own settings module. From here, the order in which the corresponding paths were inserted into sys.path decides which one Python imports: the first one found wins. In this case the library's settings module wins, its SPIDER_MODULES list is empty, and hence the KeyError: 'Spider not found: juno'.
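The "first one found wins" behavior is easy to demonstrate with a small stdlib-only sketch. The module name shadowme and the temporary directories below are hypothetical, purely for illustration:

```python
import os
import sys
import tempfile

# Two directories each provide a module named `shadowme`;
# Python imports whichever directory appears first on sys.path.
first = tempfile.mkdtemp()
second = tempfile.mkdtemp()

with open(os.path.join(first, "shadowme.py"), "w") as f:
    f.write("WHO = 'first'\n")
with open(os.path.join(second, "shadowme.py"), "w") as f:
    f.write("WHO = 'second'\n")

sys.path.insert(0, second)
sys.path.insert(0, first)   # `first` now precedes `second` on sys.path

import shadowme
print(shadowme.WHO)         # the copy from the `first` directory wins
```

The same mechanism decides whether your project's scrapy package or the installed library shadows the other.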
To resolve this conflict, you could rename your project folder to another name, say scrap:
.
├── scrap
│ ├── __init__.py
Then modify your scrapy.cfg to point to the correct settings module:
[settings]
default = scrap.settings
And update your scrap/settings.py to point to the correct spiders module:
SPIDER_MODULES = ['scrap.spiders']
But, as @paultrmbrth suggested, I would simply recreate the project with a different name.
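Whichever route you take, you can verify the fix with a quick diagnostic (my own suggestion, not part of the original answer): ask Python where the name scrapy actually resolves from. If it points into your project directory rather than site-packages, the shadowing described above is still happening.

```python
import importlib.util

# Locate the file Python would load for the top-level name "scrapy",
# without actually importing it.
spec = importlib.util.find_spec("scrapy")
if spec is not None:
    print(spec.origin)  # should point into site-packages, not your project
else:
    print("no module named 'scrapy' on sys.path")
```

Run this from your project directory; after the rename, spec.origin should show the virtualenv's site-packages path.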
Regarding python - Python Scrapy tutorial KeyError: 'Spider not found:', a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26359598/