This article explains how to deal with a scrapy crawl [spider-name] malfunction. Hopefully it serves as a useful reference for anyone running into the same problem.

Problem Description


Hi guys, I am building a web scraping project using the Scrapy framework and Python. In the spiders folder of my project I have two spiders, named spider1 and spider2.

spider1.py

from scrapy.spider import BaseSpider  # BaseSpider is the pre-1.0 Scrapy base class

class spider(BaseSpider):
    name = "spider1"
    ...

spider2.py

from scrapy.spider import BaseSpider

class spider(BaseSpider):
    name = "spider2"
    ...

settings.py


SPIDER_MODULES = ['project_name.spiders']
NEWSPIDER_MODULE = 'project_name.spiders'   # a single module path (string), not a list
ITEM_PIPELINES = ['project_name.pipelines.spider']
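As a side note, in current Scrapy releases ITEM_PIPELINES is a dict mapping each pipeline path to an order value rather than a plain list. A hedged sketch of equivalent settings, with the module names taken from the question and the priority value chosen arbitrarily:

```python
# settings.py -- sketch using current Scrapy conventions; names come from the question
SPIDER_MODULES = ['project_name.spiders']
NEWSPIDER_MODULE = 'project_name.spiders'
ITEM_PIPELINES = {
    'project_name.pipelines.spider': 300,  # lower numbers run earlier (range 0-1000)
}
```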


Now, when I run the command scrapy crawl spider1 from my project's root folder, it calls spider2.py instead of spider1.py. When I delete spider2.py from my project, it then calls spider1.py.


It had been working fine for a month until a day ago; suddenly something happened that I can't figure out. Please help me, guys.

Recommended Answer


Building on Nomad's answer: you can avoid the creation of .pyc files during development by adding the following

import sys
sys.dont_write_bytecode = True  # stop Python from writing .pyc bytecode caches

to your project's __init__.py file.
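A minimal standalone sketch of the effect (the demo_mod module name and its contents are made up for illustration): with sys.dont_write_bytecode set, importing a module no longer produces a __pycache__ directory next to it.

```python
import importlib
import os
import sys
import tempfile

# The setting the answer recommends putting in __init__.py:
sys.dont_write_bytecode = True

with tempfile.TemporaryDirectory() as tmp:
    # Create a throwaway module to import (hypothetical name for the demo).
    with open(os.path.join(tmp, "demo_mod.py"), "w") as f:
        f.write("VALUE = 42\n")
    sys.path.insert(0, tmp)
    mod = importlib.import_module("demo_mod")
    print(mod.VALUE)  # 42
    # No __pycache__ directory is created because bytecode writing is disabled.
    print(os.path.isdir(os.path.join(tmp, "__pycache__")))  # False
    sys.path.remove(tmp)
```

The same switch is also available externally as the PYTHONDONTWRITEBYTECODE environment variable or the python -B flag, without touching the project's source.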


This will prevent .pyc files from being created. It is especially useful if you rename a spider's file while working on a project: it keeps the cached .pyc of the old spider from sticking around, and avoids a few other gotchas.
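If stale bytecode has already been written, a one-off cleanup also resolves the immediate symptom. A hedged sketch using pathlib (the directory layout and file names here are invented for demonstration; in a real project you would point project_root at your Scrapy project instead of a temporary directory):

```python
import pathlib
import tempfile

# Demo setup: a fake project tree containing stale .pyc files.
project_root = pathlib.Path(tempfile.mkdtemp())
(project_root / "spiders").mkdir()
(project_root / "spiders" / "spider1.pyc").write_bytes(b"")
(project_root / "spiders" / "spider2.pyc").write_bytes(b"")

# Record what will be removed, then delete every .pyc under the root.
removed = sorted(p.name for p in project_root.rglob("*.pyc"))
for pyc in project_root.rglob("*.pyc"):
    pyc.unlink()

print(removed)                             # ['spider1.pyc', 'spider2.pyc']
print(list(project_root.rglob("*.pyc")))   # []
```

After the cleanup, scrapy crawl spider1 resolves the spider from the .py sources again rather than from a leftover compiled copy.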

That concludes this article on the scrapy crawl [spider-name] malfunction. We hope the recommended answer is helpful.
