Scrapy: using a feed exporter for a particular spider (and not others) in a project

Question

ENVIRONMENT: Windows 7, Python 3.6.5, Scrapy 1.5.1

PROBLEM:

I have a Scrapy project called project_github, which contains 3 spiders: spider1, spider2, spider3. Each of these spiders scrapes data from a particular website specific to that spider.

I am trying to automatically export a JSON file when a particular spider is executed, with the format NameOfSpider_TodaysDate.json, so that from the command line I can:

Run scrapy crawl spider1 and get back spider1_181115.json

Currently I am using ITEM EXPORTERS in settings.py with the following code:

import datetime
FEED_URI = 'spider1_' + datetime.datetime.today().strftime('%y%m%d') + '.json'
FEED_FORMAT = 'json'
FEED_EXPORTERS = {'json': 'scrapy.exporters.JsonItemExporter'}
FEED_EXPORT_ENCODING = 'utf-8'

Obviously this code always writes spider1_TodaysDate.json regardless of which spider is run. Any suggestions?

Recommended answer

The way to do this is to define custom_settings as a class attribute on the specific spider we are writing the item exporter for. Spider settings override project settings.

So, for spider1:

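import datetime

import scrapy


class spider1(scrapy.Spider):
    name = "spider1"
    allowed_domains = []

    # Per-spider settings; these take precedence over settings.py.
    custom_settings = {
        # File name is built once, when the class is defined: spider1_YYMMDD.json
        'FEED_URI': 'spider1_' + datetime.datetime.today().strftime('%y%m%d') + '.json',
        'FEED_FORMAT': 'json',
        'FEED_EXPORTERS': {
            'json': 'scrapy.exporters.JsonItemExporter',
        },
        'FEED_EXPORT_ENCODING': 'utf-8',
    }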
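
If several spiders need the same naming scheme, repeating the dictionary in each class gets tedious. One possible way to avoid that is a small helper that builds the settings from the spider name. This is only a sketch: build_feed_settings is a hypothetical function, not part of Scrapy, and the optional today argument exists only to make the result predictable in the example.

```python
import datetime


def build_feed_settings(spider_name, today=None):
    """Return feed-export settings whose file name starts with spider_name.

    Hypothetical helper for illustration; it is not a Scrapy API.
    """
    today = today or datetime.date.today()
    return {
        # Produces e.g. spider1_181115.json for 15 November 2018
        'FEED_URI': '{}_{}.json'.format(spider_name, today.strftime('%y%m%d')),
        'FEED_FORMAT': 'json',
        'FEED_EXPORT_ENCODING': 'utf-8',
    }


# Inside a spider class this would be used as:
#     custom_settings = build_feed_settings('spider1')
example = build_feed_settings('spider2', datetime.date(2018, 11, 15))
print(example['FEED_URI'])  # spider2_181115.json
```

Only the spiders that set custom_settings this way get an export file; the others keep whatever settings.py defines. Scrapy's feed export documentation also describes a %(name)s placeholder in the storage URI that is replaced by the spider name, which may be worth checking if every spider in the project should export.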

09-11 12:35