Scrapyd is too slow at scheduling spiders
Question
I am running Scrapyd and see odd behavior when I schedule 4 spiders at the same time: the requests are accepted together, but the spiders are started one at a time, several seconds apart.
2012-02-06 15:27:17+0100 [HTTPChannel,0,127.0.0.1] 127.0.0.1 - - [06/Feb/2012:14:27:16 +0000] "POST /schedule.json HTTP/1.1" 200 62 "-" "python-requests/0.10.1"
2012-02-06 15:27:17+0100 [HTTPChannel,1,127.0.0.1] 127.0.0.1 - - [06/Feb/2012:14:27:16 +0000] "POST /schedule.json HTTP/1.1" 200 62 "-" "python-requests/0.10.1"
2012-02-06 15:27:17+0100 [HTTPChannel,2,127.0.0.1] 127.0.0.1 - - [06/Feb/2012:14:27:16 +0000] "POST /schedule.json HTTP/1.1" 200 62 "-" "python-requests/0.10.1"
2012-02-06 15:27:17+0100 [HTTPChannel,3,127.0.0.1] 127.0.0.1 - - [06/Feb/2012:14:27:16 +0000] "POST /schedule.json HTTP/1.1" 200 62 "-" "python-requests/0.10.1"
2012-02-06 15:27:18+0100 [Launcher] Process started: project='thz' spider='spider_1' job='abb6b62650ce11e19123c8bcc8cc6233' pid=2545
2012-02-06 15:27:19+0100 [Launcher] Process finished: project='thz' spider='spider_1' job='abb6b62650ce11e19123c8bcc8cc6233' pid=2545
2012-02-06 15:27:23+0100 [Launcher] Process started: project='thz' spider='spider_2' job='abb72f8e50ce11e19123c8bcc8cc6233' pid=2546
2012-02-06 15:27:24+0100 [Launcher] Process finished: project='thz' spider='spider_2' job='abb72f8e50ce11e19123c8bcc8cc6233' pid=2546
2012-02-06 15:27:28+0100 [Launcher] Process started: project='thz' spider='spider_3' job='abb76f6250ce11e19123c8bcc8cc6233' pid=2547
2012-02-06 15:27:29+0100 [Launcher] Process finished: project='thz' spider='spider_3' job='abb76f6250ce11e19123c8bcc8cc6233' pid=2547
2012-02-06 15:27:33+0100 [Launcher] Process started: project='thz' spider='spider_4' job='abb7bb8e50ce11e19123c8bcc8cc6233' pid=2549
2012-02-06 15:27:35+0100 [Launcher] Process finished: project='thz' spider='spider_4' job='abb7bb8e50ce11e19123c8bcc8cc6233' pid=2549
I already have these settings for Scrapyd:
[scrapyd]
max_proc = 10
Why isn't Scrapyd running the spiders at the same time, as soon as they are scheduled?
Recommended answer
I solved it by editing line 30 of scrapyd/app.py.
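Change timer = TimerService(5, poller.poll)
to timer = TimerService(0.1, poller.poll)
The first argument to TimerService is the interval, in seconds, between calls to poller.poll, which matches the 5-second gaps between the "Process started" lines in the log.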
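To check that the delay really does follow the polling timer, the start times can be pulled out of the Launcher log. The following is a minimal sketch, not part of Scrapyd: the names LOG, start_times, and start_gaps are made up for illustration, and the log text is copied from the question.

```python
import re
from datetime import datetime

# Launcher lines copied from the question's log (one "finished" line kept
# to show that only "Process started" lines are counted).
LOG = """\
2012-02-06 15:27:18+0100 [Launcher] Process started: project='thz' spider='spider_1' job='abb6b62650ce11e19123c8bcc8cc6233' pid=2545
2012-02-06 15:27:19+0100 [Launcher] Process finished: project='thz' spider='spider_1' job='abb6b62650ce11e19123c8bcc8cc6233' pid=2545
2012-02-06 15:27:23+0100 [Launcher] Process started: project='thz' spider='spider_2' job='abb72f8e50ce11e19123c8bcc8cc6233' pid=2546
2012-02-06 15:27:28+0100 [Launcher] Process started: project='thz' spider='spider_3' job='abb76f6250ce11e19123c8bcc8cc6233' pid=2547
2012-02-06 15:27:33+0100 [Launcher] Process started: project='thz' spider='spider_4' job='abb7bb8e50ce11e19123c8bcc8cc6233' pid=2549
"""

# Capture the local timestamp; the UTC offset is the same on every line,
# so it can be ignored when computing differences.
STARTED = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\+\d{4} \[Launcher\] Process started:"
)


def start_times(log_text):
    """Return the timestamps of all 'Process started' lines, in log order."""
    times = []
    for line in log_text.splitlines():
        match = STARTED.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    return times


def start_gaps(log_text):
    """Return the gaps, in seconds, between consecutive process starts."""
    times = start_times(log_text)
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


print(start_gaps(LOG))  # [5.0, 5.0, 5.0]
```

Every gap is exactly 5 seconds, the same value as the hard-coded TimerService interval, which suggests the spiders are waiting on the poller rather than on the max_proc limit.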
As AliBZ points out in a comment, changing the polling frequency through the configuration settings is a better approach than editing the source.
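The comment itself is not reproduced here, so the following is an assumption about what it recommends: Scrapyd has a poll_interval option (the interval, in seconds, used to poll the queues, 5.0 by default), and setting it alongside max_proc would avoid patching app.py. The value 0.1 simply mirrors the edit above and is not a tuned recommendation.

```ini
[scrapyd]
max_proc = 10
# Assumed to be the setting the comment refers to: poll the queues
# every 0.1 seconds instead of the default 5.0.
poll_interval = 0.1
```

Scrapyd reads its configuration at startup, so the service needs a restart for the new interval to take effect.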