I have this code, but the program keeps running even after both spiders have finished.

#!C:\Python27\python.exe

from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import log, signals
from carrefour.spiders.tesco import TescoSpider
from carrefour.spiders.carr import CarrSpider
from scrapy.utils.project import get_project_settings
import threading
import time

def tescofcn():
    tescoSpider = TescoSpider()
    settings = get_project_settings()
    crawler = Crawler(settings)
    crawler.configure()
    crawler.crawl(tescoSpider)
    crawler.start()

def carrfcn():
    carrSpider = CarrSpider()
    settings = get_project_settings()
    crawler = Crawler(settings)
    crawler.configure()
    crawler.crawl(carrSpider)
    crawler.start()


t1=threading.Thread(target=tescofcn)
t2=threading.Thread(target=carrfcn)

t1.start()
t2.start()
log.start()
reactor.run()


When I tried inserting this into both functions

crawler.signals.connect(reactor.stop, signal=signals.spider_closed)


, the effect was that whichever spider finished first stopped the reactor, and the slower spider was killed even though it was not done yet (reactor.stop() fires on the first spider_closed signal and tears down the whole process).

Best Answer

What you can do is create a function that checks the list of running spiders and connect it to signals.spider_closed:

from twisted.internet import reactor
from scrapy.utils.trackref import iter_all


def close_reactor_if_no_spiders():
    # trackref keeps weak references to live objects, so iter_all('Spider')
    # yields every spider instance that is still alive
    running_spiders = list(iter_all('Spider'))

    if not running_spiders:
        reactor.stop()

crawler.signals.connect(close_reactor_if_no_spiders, signal=signals.spider_closed)
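
Note that the connect() call has to happen inside each of the two functions, on that function's own crawler; at module level there is no crawler variable. A sketch of how it would slot into the question's tescofcn (carrfcn changes the same way):

def tescofcn():
    settings = get_project_settings()
    crawler = Crawler(settings)
    crawler.configure()
    # stop the reactor only once no live spider remains
    crawler.signals.connect(close_reactor_if_no_spiders,
                            signal=signals.spider_closed)
    crawler.crawl(TescoSpider())
    crawler.start()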


That said, I would still recommend using scrapyd to manage running multiple spiders.
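
For reference, a minimal sketch of scheduling both spiders through scrapyd's HTTP API, assuming a default scrapyd instance on localhost:6800, the project deployed as "carrefour", and spider names "tesco" and "carr" (the actual name attributes of the spider classes may differ):

import requests

for spider_name in ('tesco', 'carr'):
    # schedule.json queues one run of the named spider;
    # scrapyd itself manages the crawl processes and their shutdown
    response = requests.post('http://localhost:6800/schedule.json',
                             data={'project': 'carrefour', 'spider': spider_name})
    print(response.json())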

Regarding "python - how to stop the reactor when both spiders have finished", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/25480298/
