问题描述
我想将 url 队列存储在一个数据库中,以便它可以被在不同机器上运行的多个 scrapy 实例共享.
I would like to store the url queue in a database so that it can be shared by multiple instances of scrapy running on different machines.
我发现我需要实现一个队列 API,然后使用 SPIDER_QUEUE_CLASS 设置注册我的队列.但是,我在任何地方都找不到应该实现的 API 的样子.还能用吗?
I've found that there is a queue API that I need to implement and then register my queue using SPIDER_QUEUE_CLASS setting. However, I cannot find anywhere how the API that should be implemented looks like. Is it still available?
推荐答案
scrapy 内部队列使用 queuelib
scrapy internal queue uses queuelib
Queuelib 的目标是速度和简单性.它最初是 Scrapy 框架的一部分,并被剥离到自己的库中.
它的api"比较简单,看push
和pop
的方式由scrapy使用,您的实现可能如下所示:
its "api" is rather simple, see the way push
and pop
are used by scrapy, your implementation might look like:
DatabaseFifoQueue = _serializable_queue(myqueue.DatabaseQueue, \
_pickle_serialize, pickle.loads)
...
这篇关于将scrapy队列存储在数据库中的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!