Problem description
Does anyone know how I could run the same Scrapy scraper over 200 times on different websites, each with their respective output files? Usually in Scrapy, you indicate the output file when you run it from the command line by typing -o filename.json.
Recommended answer
There are multiple ways:
Create a pipeline to write out the items, with a configurable parameter, like running scrapy crawl myspider -a output_filename=output_file.txt. output_filename is added as an argument to the spider, and now you can access it from a pipeline like:
class MyPipeline(object):
    def process_item(self, item, spider):
        # output_filename was passed on the command line via -a
        filename = spider.output_filename
        # now do your magic with filename
        return item  # pipelines must return the item (or raise DropItem)
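For the "magic" part, here is a minimal sketch of a pipeline that writes each item as one JSON line to the per-run file (JsonWriterPipeline is a hypothetical name; output_filename comes from the -a argument as above):

import json

class JsonWriterPipeline(object):
    def open_spider(self, spider):
        # open the per-run output file named by the -a argument
        self.file = open(spider.output_filename, "w")

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        # one JSON object per line (JSON Lines format)
        self.file.write(json.dumps(dict(item)) + "\n")
        return item

Remember to enable the pipeline in settings.py, e.g. ITEM_PIPELINES = {"myproject.pipelines.JsonWriterPipeline": 300}, where myproject is a placeholder for your own project name.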
Alternatively, you can run Scrapy from within a Python script, and then process the output items there as well.
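A minimal sketch of that approach, queuing one crawl per website in a single CrawlerProcess (MySpider, start_url, and output_filename are hypothetical names standing in for your own spider and its arguments):

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

from myproject.spiders import MySpider  # hypothetical import path

websites = ["https://example.com", "https://example.org"]  # ...your 200+ sites

process = CrawlerProcess(get_project_settings())
for i, url in enumerate(websites):
    # keyword arguments are passed to the spider, just like -a on the command line
    process.crawl(MySpider, start_url=url, output_filename="output_%d.json" % i)
process.start()  # blocks until every queued crawl has finished

The queued crawls run concurrently in one process; the spider is assumed to read start_url (e.g. in start_requests) to decide which site to scrape, while the pipeline above picks up output_filename for the per-site output file.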