I have this code, and the program keeps running even after both spiders have finished.
#!C:\Python27\python.exe
from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import log, signals
from carrefour.spiders.tesco import TescoSpider
from carrefour.spiders.carr import CarrSpider
from scrapy.utils.project import get_project_settings
import threading
import time

def tescofcn():
    # Build and start a crawler for the Tesco spider (pre-1.0 Scrapy API).
    tescoSpider = TescoSpider()
    settings = get_project_settings()
    crawler = Crawler(settings)
    crawler.configure()
    crawler.crawl(tescoSpider)
    crawler.start()

def carrfcn():
    # Same setup for the Carrefour spider.
    carrSpider = CarrSpider()
    settings = get_project_settings()
    crawler = Crawler(settings)
    crawler.configure()
    crawler.crawl(carrSpider)
    crawler.start()

# Launch both crawlers from separate threads, then run the Twisted reactor
# in the main thread; the reactor never stops on its own.
t1 = threading.Thread(target=tescofcn)
t2 = threading.Thread(target=carrfcn)
t1.start()
t2.start()
log.start()
reactor.run()
When I tried inserting this into both functions:
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
the faster spider stopped the reactor for both, and the slower spider was terminated even though it had not finished, because reactor.stop runs on the very first spider_closed event.
Best Answer
What you can do is create a function that checks the list of running spiders, and connect that function to signals.spider_closed.
from scrapy.utils.trackref import iter_all

def close_reactor_if_no_spiders():
    # iter_all('Spider') yields every live Spider instance Scrapy tracks,
    # so the reactor is stopped only once no spider is left running.
    running_spiders = [spider for spider in iter_all('Spider')]
    if not running_spiders:
        reactor.stop()

crawler.signals.connect(close_reactor_if_no_spiders, signal=signals.spider_closed)
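As a usage sketch (not part of the original answer, and assuming the same pre-1.0 Scrapy API as the question's code): the connect call has to go inside each launcher function, right after the crawler is created, since crawler is local to those functions. Reusing close_reactor_if_no_spiders from above, tescofcn would become the following, with the identical change applied to carrfcn:

def tescofcn():
    tescoSpider = TescoSpider()
    crawler = Crawler(get_project_settings())
    crawler.configure()
    # Each crawler fires spider_closed independently; the shared check
    # only stops the reactor when no spider remains.
    crawler.signals.connect(close_reactor_if_no_spiders,
                            signal=signals.spider_closed)
    crawler.crawl(tescoSpider)
    crawler.start()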
That said, I would still recommend using scrapyd to manage running multiple spiders.

Source: python - How to stop the reactor when both spiders are finished - Stack Overflow: https://stackoverflow.com/questions/25480298/
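As a side note beyond the original exchange: on Scrapy 1.0 and later, CrawlerProcess handles this shutdown by itself; its start() method blocks until every scheduled crawl has finished and then stops the reactor, so no manual signal wiring is needed. A minimal sketch:

# Sketch for modern Scrapy (1.0+), where CrawlerProcess manages the reactor.
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

from carrefour.spiders.tesco import TescoSpider
from carrefour.spiders.carr import CarrSpider

process = CrawlerProcess(get_project_settings())
process.crawl(TescoSpider)   # schedule both spiders on the same reactor
process.crawl(CarrSpider)
process.start()              # blocks until all crawls finish, then stops the reactor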