Problem description
Hi guys, I am building a web scraping project using the Scrapy framework and Python. In the spiders folder of my project I have two spiders, named spider1 and spider2.
spider1.py
class spider(BaseSpider):
    name = "spider1"
    ........
    ........
spider2.py
class spider(BaseSpider):
    name = "spider2"
    ............
    ...........
settings.py
SPIDER_MODULES = ['project_name.spiders']
NEWSPIDER_MODULE = ['project_name.spiders']
ITEM_PIPELINES = ['project_name.pipelines.spider']
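As a side note (not part of the original question): in current Scrapy versions NEWSPIDER_MODULE is expected to be a plain string rather than a list, and ITEM_PIPELINES is a dict mapping each pipeline path to an integer priority. A rough sketch of the equivalent modern settings, keeping the asker's placeholder names, might look like this:

SPIDER_MODULES = ['project_name.spiders']
NEWSPIDER_MODULE = 'project_name.spiders'       # a string, not a list
ITEM_PIPELINES = {
    'project_name.pipelines.spider': 300,       # pipeline path -> priority (0-1000)
}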
Now when I run the command scrapy crawl spider1
in my project's root folder, it calls spider2.py instead of spider1.py. When I delete spider2.py from my project, it then calls spider1.py.
It had been working fine for a month until a day ago, but suddenly this happens and I can't figure out why. Please help me, guys.
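For context, here is a minimal, runnable sketch of what spider1.py might look like; the start URL and parse logic are illustrative assumptions, not taken from the question, and scrapy.Spider is used since BaseSpider has long been replaced in modern Scrapy:

import scrapy

class Spider1(scrapy.Spider):
    name = "spider1"                      # the name `scrapy crawl spider1` looks up
    start_urls = ["https://example.com"]  # placeholder URL, an assumption

    def parse(self, response):
        # yield something minimal so the spider is runnable
        yield {"title": response.css("title::text").get()}

Giving each spider class a distinct class name (Spider1, Spider2) also avoids confusion, although `scrapy crawl` only looks at the name attribute.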
Recommended answer
Building on Nomad's answer: you can avoid the creation of all but one .pyc file during development by adding
import sys
sys.dont_write_bytecode = True
to your project's __init__.py file.
This will prevent .pyc files from being created. It is especially useful if you are working on a project and rename a spider's file: it keeps the cached .pyc of the old spider from hanging around, along with a few other gotchas.
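Note that sys.dont_write_bytecode only stops new .pyc files from being written; any stale ones already on disk still need to be removed once. A small one-off cleanup sketch, assuming the project layout shown above (project_name/spiders):

import pathlib

# remove any stale .pyc files left over from renamed or deleted spiders
for pyc in pathlib.Path("project_name/spiders").rglob("*.pyc"):
    pyc.unlink()
    print(f"removed {pyc}")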