Problem description
Does anyone know how I could run the same Scrapy scraper over 200 times on different websites, each with their respective output files? Usually in Scrapy, you indicate the output file when you run it from the command line by typing -o filename.json.
Recommended answer
There are multiple ways:
Create a pipeline to write out the items, with a configurable parameter, like running scrapy crawl myspider -a output_filename=output_file.txt. output_filename is added as an argument to the spider, and now you can access it from a pipeline like:
class MyPipeline(object):
    def process_item(self, item, spider):
        # output_filename was passed on the command line via -a
        filename = spider.output_filename
        # now do your magic with filename
        return item  # pipelines must return the item (or raise DropItem)
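For the "magic" part, here is a minimal sketch of a pipeline that writes each item as one JSON line to the per-run file (JsonWriterPipeline is a hypothetical name; output_filename comes from the -a argument as above):

import json

class JsonWriterPipeline(object):
    def open_spider(self, spider):
        # open the per-run output file named by the -a argument
        self.file = open(spider.output_filename, "w")

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        # one JSON object per line (JSON Lines format)
        self.file.write(json.dumps(dict(item)) + "\n")
        return item

Remember to enable the pipeline in settings.py, e.g. ITEM_PIPELINES = {"myproject.pipelines.JsonWriterPipeline": 300}, where myproject is a placeholder for your own project name.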
Alternatively, you can run Scrapy from within a Python script, and then process the output items there as well.
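A minimal sketch of that approach, queuing one crawl per website in a single CrawlerProcess (MySpider, start_url, and output_filename are hypothetical names standing in for your own spider and its arguments):

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

from myproject.spiders import MySpider  # hypothetical import path

websites = ["https://example.com", "https://example.org"]  # ...your 200+ sites

process = CrawlerProcess(get_project_settings())
for i, url in enumerate(websites):
    # keyword arguments are passed to the spider, just like -a on the command line
    process.crawl(MySpider, start_url=url, output_filename="output_%d.json" % i)
process.start()  # blocks until every queued crawl has finished

The queued crawls run concurrently in one process; the spider is assumed to read start_url (e.g. in start_requests) to decide which site to scrape, while the pipeline above picks up output_filename for the per-site output file.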