This article covers how to set different Scrapy settings for different spiders; the question and answer below may be a useful reference for anyone facing the same problem.

Question
I want to enable an HTTP proxy for some spiders, and disable it for other spiders.
Can I do something like this?
# settings.py
proxy_spiders = ['a1', 'b2']

if spider in proxy_spiders:  # how to get the spider name here???
    HTTP_PROXY = 'http://127.0.0.1:8123'
    DOWNLOADER_MIDDLEWARES = {
        'myproject.middlewares.RandomUserAgentMiddleware': 400,
        'myproject.middlewares.ProxyMiddleware': 410,
        'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
    }
else:
    DOWNLOADER_MIDDLEWARES = {
        'myproject.middlewares.RandomUserAgentMiddleware': 400,
        'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
    }
If the code above doesn't work, is there any other suggestion?
Answer
You can define your own proxy middleware, something straightforward like this:
# HttpProxyMiddleware lives in the httpproxy submodule
# (in modern Scrapy: scrapy.downloadermiddlewares.httpproxy)
from scrapy.contrib.downloadermiddleware.httpproxy import HttpProxyMiddleware

class ConditionalProxyMiddleware(HttpProxyMiddleware):
    def process_request(self, request, spider):
        # Only apply the proxy when the spider opts in via use_proxy
        if getattr(spider, 'use_proxy', None):
            return super(ConditionalProxyMiddleware, self).process_request(request, spider)
Then define the attribute use_proxy = True in the spiders for which you want the proxy enabled. Don't forget to disable the default proxy middleware and enable your modified one.
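To make the pattern concrete, here is a minimal, Scrapy-free sketch of the idea: the middleware decision boils down to an attribute check on the spider object. The spider class names, the middleware registration shown in the comments, and the `should_proxy` helper are all hypothetical illustrations, not part of the original answer.

```python
# In settings.py you would swap the built-in proxy middleware for the
# custom one (750 is the built-in HttpProxyMiddleware's default priority):
# DOWNLOADER_MIDDLEWARES = {
#     'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware': None,
#     'myproject.middlewares.ConditionalProxyMiddleware': 750,
# }

class ProxySpider:
    name = 'a1'
    use_proxy = True   # this spider opts in to the proxy

class PlainSpider:
    name = 'b2'        # no use_proxy attribute, so the middleware is a no-op

def should_proxy(spider):
    # Mirrors the check in ConditionalProxyMiddleware.process_request
    return bool(getattr(spider, 'use_proxy', None))

print(should_proxy(ProxySpider()))  # True
print(should_proxy(PlainSpider()))  # False
```

Because the check uses getattr with a default, spiders that never mention use_proxy are left untouched, so only the opted-in spiders need any change.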