Combining the base URL with the resulting href in Scrapy

Problem description

Below is my spider code:

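# Imports for the old (Python 2 era) Scrapy API used in this spider
import urlparse

from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector
from scrapy.spider import BaseSpider


class Blurb2Spider(BaseSpider):
    name = "blurb2"
    allowed_domains = ["www.domain.com"]

    def start_requests(self):
        yield self.make_requests_from_url("http://www.domain.com/bookstore/new")

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # Collect the href of every book title link on the listing page
        urls = hxs.select('//div[@class="bookListingBookTitle"]/a/@href').extract()
        for i in urls:
            # This is the line that raises the error quoted below
            yield Request(urlparse.urljoin('www.domain.com/', i[1:]), callback=self.parse_url)

    def parse_url(self, response):
        hxs = HtmlXPathSelector(response)
        print response, '------->'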

Here I am trying to combine the href link with the base link, but I am getting the following error:

exceptions.ValueError: Missing scheme in request url: www.domain.com//bookstore/detail/3271993?alt=Something+I+Had+To+Do

Can anyone tell me why I am getting this error, and how to join the base URL with the href link and yield a request?

Recommended answer
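
The error message itself points at the likely cause: the URL handed to Request starts with www.domain.com and has no scheme such as http://, because the base string passed to urljoin does not include one. Below is a minimal sketch of one way to build the absolute URL, using only the standard library so it can be run outside Scrapy. The href value and the names failing_url, page_url and absolute_url are assumptions made for illustration. The sketch targets Python 3, where urljoin lives in urllib.parse; the spider in the question is Python 2, where the same function is in the urlparse module.

```python
# Python 3 import; on Python 2 use: from urlparse import urljoin, urlparse
from urllib.parse import urljoin, urlparse

# URL copied from the error message: there is no "http://" prefix,
# so parsing it yields an empty scheme.
failing_url = "www.domain.com//bookstore/detail/3271993?alt=Something+I+Had+To+Do"
has_scheme = bool(urlparse(failing_url).scheme)

# Assumed values: the page being parsed (response.url inside the spider)
# and one href as it might come back from the XPath query.
page_url = "http://www.domain.com/bookstore/new"
href = "/bookstore/detail/3271993?alt=Something+I+Had+To+Do"

# Join against a base that includes the scheme, and leave the leading
# slash on the href so it is resolved from the site root.
absolute_url = urljoin(page_url, href)

print(has_scheme)
print(absolute_url)
```

Inside the spider, the same idea would presumably read Request(urlparse.urljoin(response.url, i), callback=self.parse_url), since response.url already carries the scheme and host. Newer Scrapy releases also provide response.urljoin(href), which does the same join against the response URL.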