This article looks at what to do when Scrapy doesn't seem to be doing DFO (depth-first order). It should be a useful reference for anyone hitting the same problem.

Problem description


I have a website for which my crawler needs to follow a sequence. So, for example, it needs to go a1, b1, c1 before it starts going a2, etc. Each of a, b and c is handled by a different parse function, and the corresponding URLs are created in Request objects and yielded. The following roughly illustrates the code I'm using:

from scrapy.spider import BaseSpider
from scrapy.http import Request

class aspider(BaseSpider):

    def parse(self, response):
        # Level a: queue the level-b request; higher priority runs first
        yield Request(b, callback=self.parse_b, priority=10)

    def parse_b(self, response):
        # Level b: queue the level-c request with an even higher priority
        yield Request(c, callback=self.parse_c, priority=20)

    def parse_c(self, response):
        # Level c: deepest level, do the final processing here
        final_function()

However, I find that the sequence of crawls seems to be a1, a2, a3, b1, b2, b3, c1, c2, c3, which is strange since I thought Scrapy was supposed to guarantee depth-first order.

The sequence doesn't have to be strict, but the site I'm scraping has a limit in place, so Scrapy needs to start scraping level c as soon as it can, before five level-b pages get crawled. How can this be achieved?

Solution

Depth-first searching is exactly what you are describing: search as deep as possible before moving on to b.
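The difference between the two orders comes down to the scheduler's queue discipline. As a rough illustration (a toy simulation with made-up page names, not Scrapy's actual scheduler), popping pending requests LIFO gives depth-first order, while popping them FIFO gives the breadth-first order you are currently seeing:

from collections import deque

# Toy link graph: each page yields the next level's pages
links = {"a1": ["b1"], "a2": ["b2"],
         "b1": ["c1"], "b2": ["c2"],
         "c1": [], "c2": []}

def crawl(discipline):
    queue = deque(["a1", "a2"])
    order = []
    while queue:
        # LIFO (stack) -> depth-first; FIFO -> breadth-first
        page = queue.pop() if discipline == "lifo" else queue.popleft()
        order.append(page)
        queue.extend(links[page])
    return order

print(crawl("lifo"))  # ['a2', 'b2', 'c2', 'a1', 'b1', 'c1'] - depth-first
print(crawl("fifo"))  # ['a1', 'a2', 'b1', 'b2', 'c1', 'c2'] - breadth-first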

To change Scrapy to do breadth-first searching (a1, b1, c1, a2, etc...), change these settings:

DEPTH_PRIORITY = 1
SCHEDULER_DISK_QUEUE = 'scrapy.squeue.PickleFifoDiskQueue'
SCHEDULER_MEMORY_QUEUE = 'scrapy.squeue.FifoMemoryQueue'

*Found in the doc.scrapy.org FAQ
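One caveat, assuming you are on a newer Scrapy release: the queue classes have since moved to the scrapy.squeues module (note the plural), so the same FAQ recipe would look like this:

DEPTH_PRIORITY = 1
SCHEDULER_DISK_QUEUE = 'scrapy.squeues.PickleFifoDiskQueue'
SCHEDULER_MEMORY_QUEUE = 'scrapy.squeues.FifoMemoryQueue'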

This concludes the article on Scrapy seemingly not doing DFO. We hope the answer above helps, and thank you for your support!
