This article covers how to handle the exceptions.KeyError: 'redirect_urls' error in Scrapy.

Problem description

I am new to Scrapy and Python and recently launched my first spider. A feature that seemed to work before now only works for some of the websites I am trying to scrape.

The line of code is:

item['url_direct'] = response.request.meta['redirect_urls']

The error I get is:

exceptions.KeyError: 'redirect_urls'

I have been struggling with this for a while, so any clue, or ideally a more detailed answer, would be much appreciated. (I didn't find a similar question here or on the web.)

Recommended answer

response.request.meta['redirect_urls'] is set by RedirectMiddleware to the list of URLs the request passed through while being redirected.
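To make this concrete, here is a simplified model of how the redirect chain accumulates in meta. It is a sketch under assumptions: record_redirect is a hypothetical helper, not part of Scrapy's API, and plain dicts stand in for Request.meta. It follows the convention that each hop appends the URL being redirected *from*, so the final URL (available as response.url) is not in the list, and a request that was never redirected has no redirect_urls key at all.

```python
def record_redirect(meta: dict, source_url: str) -> dict:
    """Hypothetical helper: return a copy of meta with source_url appended
    to redirect_urls, modeling one redirect hop (simplified assumption)."""
    new_meta = dict(meta)  # copy so the earlier request's meta is left untouched
    # list + list builds a new list, so the previous hop's list is not mutated
    new_meta["redirect_urls"] = meta.get("redirect_urls", []) + [source_url]
    return new_meta


# The original request starts with no redirect_urls key at all.
start_meta: dict = {}
# Two hops: a -> b, then b -> c (c is the final URL, i.e. response.url).
after_first = record_redirect(start_meta, "http://example.com/a")
after_second = record_redirect(after_first, "http://example.com/b")

print("redirect_urls" in start_meta)  # False
print(after_second["redirect_urls"])  # ['http://example.com/a', 'http://example.com/b']
```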

For requests that haven't been redirected, that code will fail with a KeyError.
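This also explains why the spider works on some sites and not others: only sites that actually redirect produce the key. A plain-dict sketch of the failure mode follows; read_redirect_urls is a hypothetical name used for illustration, and ordinary dicts stand in for response.request.meta.

```python
def read_redirect_urls(meta: dict):
    """Index meta directly, as the spider's original line does, but report
    the KeyError as a string instead of letting it propagate."""
    try:
        return meta["redirect_urls"]
    except KeyError as exc:
        return f"KeyError: {exc}"


redirected_meta = {"redirect_urls": ["http://example.com/old"]}
direct_meta: dict = {}  # never redirected, so no redirect_urls key was added

print(read_redirect_urls(redirected_meta))  # ['http://example.com/old']
print(read_redirect_urls(direct_meta))      # KeyError: 'redirect_urls'
```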

Since response.request.meta is just a dict, you can use dict.get(), which returns None instead of raising when the key is missing:

item['url_direct'] = response.request.meta.get('redirect_urls')
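A common follow-up is recovering the URL a request started from. The sketch below assumes the list convention described above (oldest URL first, final URL excluded); original_url is a hypothetical helper, and in a spider you might call it as original_url(response.request.meta, response.url). Passing [] as the default instead of relying on None keeps the emptiness check simple.

```python
def original_url(meta: dict, final_url: str) -> str:
    """Hypothetical helper: return the URL the request started from.

    Assumes redirect_urls holds the pre-redirect URLs oldest first and
    excludes the final URL, so a missing key means no redirect happened.
    """
    redirect_urls = meta.get("redirect_urls", [])  # [] instead of None when absent
    # First entry is the original URL; with no redirect, the final URL is the original
    return redirect_urls[0] if redirect_urls else final_url


print(original_url({}, "http://example.com/page"))  # http://example.com/page
print(original_url(
    {"redirect_urls": ["http://a.example/start", "http://b.example/hop"]},
    "http://c.example/end",
))  # http://a.example/start
```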

Or you can check for the key before setting the field:

if 'redirect_urls' in response.request.meta:
    item['url_direct'] = response.request.meta['redirect_urls']
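The two fixes leave the item in different states when no redirect happened: .get() stores None, while the membership check leaves the field unset. A small sketch, with plain dicts standing in for the item (an assumption) and hypothetical function names:

```python
def fill_with_get(meta: dict) -> dict:
    item = {}
    # Always sets the field; the value is None when there was no redirect.
    item["url_direct"] = meta.get("redirect_urls")
    return item


def fill_with_check(meta: dict) -> dict:
    item = {}
    # Only sets the field when a redirect was recorded in meta.
    if "redirect_urls" in meta:
        item["url_direct"] = meta["redirect_urls"]
    return item


no_redirect: dict = {}
print(fill_with_get(no_redirect))    # {'url_direct': None}
print(fill_with_check(no_redirect))  # {}
```

Pick whichever matches what downstream code expects: a field that may be None, or a field that may be missing.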

See also:

  • RedirectMiddleware docs
  • how to get the original start_url in scrapy (before redirect) (related question)

08-28 16:07