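If the spider gets a redirect, it should request the page again, but with different parameters. However, the callback of the second request is never executed.
If I use different URLs in the start and checker methods, it works fine. I think the requests are lazily loaded and that this is why my code doesn't work, but I'm not sure.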
from scrapy.http import Request
from scrapy.spider import BaseSpider

class TestSpider(BaseSpider):

    def start(self, response):
        return Request(url='http://localhost/', callback=self.checker, meta={'dont_redirect': True})

    def checker(self, response):
        if response.status == 301:
            return Request(url='http://localhost/', callback=self.results, meta={'dont_merge_cookies': True})
        else:
            return self.results(response)

    def results(self, response):
        # here I work with response
        pass
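The question's second Request goes to the same URL as the first one and differs only in meta. A more likely explanation than lazy loading is Scrapy's duplicate-request filter, which the answer below turns off. The toy model below is a hypothetical, stdlib-only sketch of that check, not Scrapy's code: ToyRequest, fingerprint, ToyDupeFilter and schedule are illustrative names, and the fingerprint is reduced to method plus URL (Scrapy's real fingerprint also canonicalizes the URL and includes the body). The dont_filter flag mirrors the real dont_filter argument of Scrapy's Request, which exempts a single request from the filter instead of disabling filtering for the whole project.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToyRequest:
    """Hypothetical stand-in for scrapy.Request, holding only what the filter needs."""
    url: str
    method: str = "GET"
    meta: dict = field(default_factory=dict)
    dont_filter: bool = False


def fingerprint(req):
    # Simplified fingerprint: method + URL only. meta is deliberately ignored,
    # which is why changing meta does not make a request count as new.
    return hashlib.sha1(f"{req.method} {req.url}".encode("utf-8")).hexdigest()


class ToyDupeFilter:
    """Remembers fingerprints it has already seen, like a request-fingerprint filter."""

    def __init__(self):
        self.seen = set()

    def request_seen(self, req):
        fp = fingerprint(req)
        if fp in self.seen:
            return True
        self.seen.add(fp)
        return False


def schedule(dupefilter, req):
    """Return True if the request would be enqueued (so its callback can run)."""
    if not req.dont_filter and dupefilter.request_seen(req):
        return False  # duplicate: silently dropped, its callback never runs
    return True


if __name__ == "__main__":
    df = ToyDupeFilter()
    first = ToyRequest("http://localhost/", meta={"dont_redirect": True})
    second = ToyRequest("http://localhost/", meta={"dont_merge_cookies": True})
    retry = ToyRequest("http://localhost/", meta={"dont_merge_cookies": True}, dont_filter=True)
    print(schedule(df, first))   # True
    print(schedule(df, second))  # False: same method and URL, only meta differs
    print(schedule(df, retry))   # True: dont_filter skips the check
```

In this model the second request of the question is dropped exactly because only its meta changed; giving it dont_filter=True lets it through while every other request is still deduplicated.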
Best answer
Not sure if you still need this, but I put together an example. If you have a specific website in mind, we can definitely take a look at it.
from __future__ import print_function  # print() calls behave the same on Python 2 (Scrapy 0.16) and 3
from scrapy.http import Request
from scrapy.spider import BaseSpider

class TestSpider(BaseSpider):
    name = "TEST"
    allowed_domains = ["example.com", "example.iana.org"]

    def __init__(self, **kwargs):
        super(TestSpider, self).__init__(**kwargs)
        self.url = "http://www.example.com"
        self.max_loop = 3
        self.loop = 0  # We want it to loop 3 times, so keep a counter on the instance

    def start_requests(self):
        # I'll write it out more explicitly here
        print("OPEN")
        checkRequest = Request(
            url=self.url,
            meta={"test": "first"},
            callback=self.checker
        )
        return [checkRequest]

    def checker(self, response):
        # I wasn't sure about a specific website that gives a 302,
        # so I just used 200. We need the loop counter or it will keep going.
        if self.loop < self.max_loop and response.status == 200:
            print("RELOOPING", response.status, self.loop, response.meta['test'])
            self.loop += 1
            checkRequest = Request(
                url=self.url,
                callback=self.checker
            ).replace(meta={"test": "not first"})
            return [checkRequest]
        else:
            print("END LOOPING")
            self.results(response)  # results() returns nothing here, so there is no need to return its value

    def results(self, response):
        print("DONE")  # Do stuff here
In settings.py, set this option:
DUPEFILTER_CLASS = 'scrapy.dupefilter.BaseDupeFilter'
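This is what actually turns off the duplicate-request filter. It's confusing, because BaseDupeFilter is not the default: the default, RFPDupeFilter, drops any request whose fingerprint has already been seen, while BaseDupeFilter doesn't filter anything at all. With it in place, the three follow-up requests that checker sends to the same URL all get through and loop back into the checker method. Also, I'm using Scrapy 0.16: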
>scrapy crawl TEST
>OPEN
>RELOOPING 200 0 first
>RELOOPING 200 1 not first
>RELOOPING 200 2 not first
>END LOOPING
>DONE
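The crawl log above can be reproduced without Scrapy by replaying the checker() loop by hand. The sketch below is a hypothetical, stdlib-only model, not the answer's spider: FakeResponse, LoopingChecker and run are illustrative names, every fetch is assumed to return status 200 (as in the answer's test), and the duplicate filter is reduced to a set of URLs. With filter_duplicates=False (the BaseDupeFilter situation) it yields the same sequence as the log; with filter_duplicates=True the first follow-up request is dropped, so checker() runs only once and results() is never reached, which matches the symptom in the question.

```python
# Hypothetical, Scrapy-free replay of the answer's checker() loop.
# Assumptions: every fetch returns HTTP 200, and the duplicate filter
# is modelled as a plain set of URLs (Scrapy fingerprints whole requests).

class FakeResponse:
    """Minimal stand-in for a Scrapy response: just status and meta."""

    def __init__(self, status, meta):
        self.status = status
        self.meta = meta


class LoopingChecker:
    """Same branching as the answer's TestSpider, but logging instead of printing."""

    def __init__(self, max_loop=3):
        self.max_loop = max_loop
        self.loop = 0
        self.log = []

    def checker(self, response):
        if self.loop < self.max_loop and response.status == 200:
            self.log.append(("RELOOPING", response.status, self.loop, response.meta["test"]))
            self.loop += 1
            # Stand-in for returning a new Request whose meta is {"test": "not first"}
            return {"test": "not first"}
        self.log.append(("END LOOPING",))
        self.results(response)
        return None  # no follow-up request

    def results(self, response):
        self.log.append(("DONE",))


def run(spider, url="http://www.example.com", filter_duplicates=False):
    """Feed each follow-up 'request' back into checker() until none is produced."""
    seen = {url}              # models the filter having recorded the start request
    meta = {"test": "first"}  # meta of the start request
    while True:
        next_meta = spider.checker(FakeResponse(200, meta))
        if next_meta is None:
            break
        if filter_duplicates and url in seen:
            break  # dropped as a duplicate: its callback never runs
        seen.add(url)
        meta = next_meta
    return spider.log
```

run(LoopingChecker()) returns the RELOOPING entries for loop 0, 1 and 2 followed by END LOOPING and DONE, while run(LoopingChecker(), filter_duplicates=True) returns only the first RELOOPING entry.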
For the original thread, "python - How to do callbacks in two sequential requests in Scrapy", see Stack Overflow: https://stackoverflow.com/questions/16590110/