Why am I getting a KeyError in Scrapy?
Problem description
I am running Scrapy spiders inside Celery, and I am getting this kind of error randomly:
Unhandled Error
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/twisted/internet/base.py", line 428, in fireEvent
DeferredList(beforeResults).addCallback(self._continueFiring)
File "/usr/lib/python2.7/site-packages/twisted/internet/defer.py", line 321, in addCallback
callbackKeywords=kw)
File "/usr/lib/python2.7/site-packages/twisted/internet/defer.py", line 310, in addCallbacks
self._runCallbacks()
File "/usr/lib/python2.7/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
current.result = callback(current.result, *args, **kw)
--- <exception caught here> ---
File "/usr/lib/python2.7/site-packages/twisted/internet/base.py", line 441, in _continueFiring
callable(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/twisted/internet/base.py", line 667, in disconnectAll
selectables = self.removeAll()
File "/usr/lib/python2.7/site-packages/twisted/internet/epollreactor.py", line 191, in removeAll
[self._selectables[fd] for fd in self._reads],
exceptions.KeyError: 94
The file-descriptor number changes from case to case (94 could be 97 in another run, and so on).
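The failing line in the traceback is a plain dictionary lookup inside Twisted's epoll reactor: `self._selectables` maps file descriptors to transports, and iterating over a stale fd list raises `KeyError` for any descriptor already removed. A minimal stand-in (not Twisted itself; the names here are illustrative) reproduces the shape of the error:

```python
# Illustration only - a simplified stand-in for the reactor's fd table,
# not actual Twisted code.
selectables = {93: "transport-a"}   # hypothetical fd -> transport mapping
reads = [93, 94]                    # fd 94 is stale: already removed

try:
    [selectables[fd] for fd in reads]
    missing = None
except KeyError as err:
    missing = err.args[0]

print(missing)  # 94 - the same bare number the traceback reports
```

This is why the number in the error varies: it is whatever file descriptor happened to go stale in that run.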
I am using:
celery==3.1.19
Django==1.9.4
Scrapy==1.3.0
This is how I run Scrapy inside Celery:
from billiard import Process
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

class MyCrawlerScript(Process):
    def __init__(self, **kwargs):
        Process.__init__(self)
        # get_project_settings() takes no project-name argument
        settings = get_project_settings()
        self.crawler = CrawlerProcess(settings)
        # pop spider_name so it is not passed to the spider a second time in run()
        self.spider_name = kwargs.pop('spider_name')
        self.kwargs = kwargs

    def run(self):
        # forward the remaining keyword arguments to the spider
        # (as posted, this line had a typo: qwargs=self.kwargs)
        self.crawler.crawl(self.spider_name, **self.kwargs)
        self.crawler.start()

def my_crawl_manager(**kwargs):
    crawler = MyCrawlerScript(**kwargs)
    crawler.start()
    crawler.join()
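As a side note, `CrawlerProcess.crawl()` forwards extra keyword arguments to the spider's `__init__`, so the shape of the call matters. A plain-Python stand-in (`fake_crawl` is an illustration, not the Scrapy API) shows the forwarding:

```python
# Stand-in for CrawlerProcess.crawl() to show keyword forwarding;
# this is an illustration, not Scrapy itself.
def fake_crawl(spider_name, **spider_kwargs):
    # Scrapy would construct the named spider with these keyword arguments
    return spider_name, spider_kwargs

name, received = fake_crawl('my_spider', url='www.google.com/any-url-here')
print(received)  # {'url': 'www.google.com/any-url-here'}
```

Passing `qwargs=self.kwargs` instead would hand the spider a single keyword named `qwargs` containing the whole dict, rather than the individual arguments such as `url`.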
Inside a Celery task, I am calling:
my_crawl_manager(spider_name='my_spider', url='www.google.com/any-url-here')
Any idea why this is happening?
Solution
I had this issue once. Check that you have an empty __init__.py file in the spiders folder; it should be there.
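The check can be scripted. This sketch uses a temporary directory as a stand-in for the project root, and the `my_scraper/spiders` path is an assumption; adjust it to your own layout:

```python
# Sketch: ensure the spiders package has an (empty) __init__.py.
# The paths below are hypothetical - substitute your real project root.
import tempfile
from pathlib import Path

project_root = Path(tempfile.mkdtemp())  # stand-in for your project root
spiders_dir = project_root / "my_scraper" / "spiders"
spiders_dir.mkdir(parents=True, exist_ok=True)

init_file = spiders_dir / "__init__.py"
if not init_file.exists():
    init_file.touch()  # an empty file is enough to mark the package
```

An empty `__init__.py` is all that is needed: without it, the spiders directory is not importable as a package, and spider discovery can fail in surprising ways.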