Work-horse process was terminated unexpectedly with RQ and Scrapy
Question
I am trying to run a function retrieved from Redis (RQ) that creates a CrawlerProcess, but I'm getting
Work-horse process was terminated unexpectedly (waitpid returned 11)
Console log:
Moving job to 'failed' queue (work-horse terminated unexpectedly; waitpid returned 11)
on the line I marked with a comment (the line that kills the program).
What am I doing wrong? How can I fix it?
RQ retrieves this function without any problem:
from scrapy.crawler import CrawlerProcess


def custom_executor(url):
    # Build a crawler process with Splash-aware settings.
    process = CrawlerProcess({
        'USER_AGENT': "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.75 Safari/537.36",
        'DOWNLOAD_TIMEOUT': 20000,  # 100
        'ROBOTSTXT_OBEY': False,
        'HTTPCACHE_ENABLED': False,
        'REDIRECT_ENABLED': False,
        'SPLASH_URL': 'http://localhost:8050/',
        'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter',
        'HTTPCACHE_STORAGE': 'scrapy_splash.SplashAwareFSCacheStorage',
        'DOWNLOADER_MIDDLEWARES': {
            'scrapy_splash.SplashCookiesMiddleware': 723,
            'scrapy_splash.SplashMiddleware': 725,
            'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
        },
        'SPIDER_MIDDLEWARES': {
            'scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware': True,
            'scrapy.spidermiddlewares.httperror.HttpErrorMiddleware': True,
            'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware': True,
            'scrapy.extensions.closespider.CloseSpider': True,
            'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
        }
    })

    ### THIS LINE KILL THE PROGRAM
    process.crawl(ExtractorSpider,
                  start_urls=[url, ], es_client=es_get_connection(),
                  redis_conn=redis_get_connection())
    process.start()
And this is my ExtractorSpider:
from scrapy import Spider
from scrapy_splash import SplashRequest


class ExtractorSpider(Spider):
    name = "Extractor Spider"
    handle_httpstatus_list = [301, 302, 303]

    def parse(self, response):
        yield SplashRequest(url=url, callback=process_screenshot,
                            endpoint='execute', args=SPLASH_ARGS)
Thank you.
Recommended answer
The process crashed because it was doing heavy computation without enough memory available. Increasing the memory fixed the issue.
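To check whether the same explanation applies to your own worker, two quick measurements may help. The sketch below is not part of RQ or Scrapy. It assumes the number in the error message is the raw status returned by os.waitpid on a POSIX system, and the helper names describe_wait_status and peak_rss_mib are hypothetical, introduced only for illustration.

```python
import os
import resource
import signal
import sys


def describe_wait_status(status: int) -> str:
    """Decode a raw os.waitpid() status into a readable description.

    Hypothetical helper: assumes `status` is the unmodified value
    that os.waitpid() returned for the child process.
    """
    if os.WIFSIGNALED(status):
        signum = os.WTERMSIG(status)
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    if os.WIFEXITED(status):
        return f"exited with code {os.WEXITSTATUS(status)}"
    return f"unrecognised status {status}"


def peak_rss_mib() -> float:
    """Return the peak resident memory of the current process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


if __name__ == "__main__":
    # 11 is the value reported in the error message above.
    print(describe_wait_status(11))
    print(f"peak RSS: {peak_rss_mib():.1f} MiB")
```

Calling peak_rss_mib() at the end of the job function (or just before the line that crashes) gives a rough idea of how much memory the work-horse needs. If that figure is close to what the machine or container allows, it supports the explanation above.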