Problem description
I am new to Scrapy and Python and recently launched my first spider. One line of code seemed to work before, but now it only works for some of the websites I am trying to scrape.
The line of code is:
item['url_direct'] = response.request.meta['redirect_urls']
The error I get is:
exceptions.KeyError: 'redirect_urls'
I have been struggling with this for a while, so any clue, or better yet a more detailed answer, would be very much appreciated. (I didn't find a similar question here or elsewhere on the web.)
Recommended answer
response.request.meta['redirect_urls'] is set by the RedirectMiddleware to the list of URLs that the request went through while being redirected.
For requests that haven't been redirected, the key is never set, so that line fails with a KeyError.
Since response.request.meta is just a dict, you can use get(), which returns None when the key is missing:
item['url_direct'] = response.request.meta.get('redirect_urls')
Or you can check for the key before setting the item field:
if 'redirect_urls' in response.request.meta:
    item['url_direct'] = response.request.meta['redirect_urls']
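If you always want a value in the item, whether or not a redirect happened, one option is to fall back to the URL of the response itself. The sketch below works on plain dicts so it can be tried without running a spider. The helper names redirect_chain and original_url are made up for illustration and are not part of Scrapy. It assumes meta is response.request.meta and final_url is response.url.

```python
def redirect_chain(meta, final_url):
    """Return every URL the request went through, ending with the final one.

    meta      -- a dict such as response.request.meta (assumption)
    final_url -- the URL that was finally fetched, e.g. response.url (assumption)
    """
    # 'redirect_urls' is only present if at least one redirect happened,
    # so default to an empty list instead of indexing the dict directly.
    visited = list(meta.get('redirect_urls', []))
    return visited + [final_url]


def original_url(meta, final_url):
    """Return the URL that was requested first, before any redirect."""
    return redirect_chain(meta, final_url)[0]


# Not redirected: the key is missing, so the final URL is also the original.
print(original_url({}, 'http://example.com/page'))

# Redirected once: the first entry of 'redirect_urls' is the original URL.
meta = {'redirect_urls': ['http://example.com/old']}
print(original_url(meta, 'http://example.com/new'))
```

Inside a spider callback this could be used as item['url_direct'] = original_url(response.request.meta, response.url), which gives a single string for every response instead of None or a list.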
See also:
- RedirectMiddleware docs
- how to get the original start_url in scrapy (before redirect) (related question)