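I am trying to write a spider that crawls across multiple pages, starting from the following URL: http://bookshop.lawsociety.org.uk/ecom_lawsoc/public/saleproducts.jsf?catId=EBOOK

I am using Scrapy version 0.22.1 for this. However, I get a "cannot import name CrawlSpider" message. I have pasted the spider's code below. Can anyone identify where I am going wrong?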

from scrapy.spider import CrawlSpider, Rule
from scrapy.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import Selector
from scrapy.item import BookpagesItem

class BookpagesSpider(CrawlSpider):
    name = "book_sample"
    allowed_domains = ["bookshop.lawsociety.org.uk"]
    start_urls = ["http://bookshop.lawsociety.org.uk/ecom_lawsoc/public/saleproducts.jsf?catId=EBOOK",
                  ]
    rules = (
        Rule(SgmlLinkExtractor(allow=('//*[@id="productList:scrollernext"]', )), callback='parse_item', follow= True),
        Rule(SgmlLinkExtractor(allow=('//p/a[contains(@id, "productList")]', )), callback='parse_item', follow= True),
    )

    def parse_item(self, response):
        sel = Selector(response)
        sites = sel.xpath('//div[@class="dataListDiv"]')
        items = []
        for site in sites:
            item = BooksItem()
            item['title'] = site.xpath('//div/a/h3[@class="saleProductsTitle"]/text()').extract()
            item['link'] = site.xpath('//p/a[contains(@id, "productList")]').extract()
            item['price'] = site.xpath('//*[@class="saleProductsPrice"]/text()').extract()
            item['category'] = site.xpath('//span[contains(@id, "category")]/text()').extract()
            item['authors'] = site.xpath('//span[contains(@id, "author")]/text()').extract()
            item['date'] = site.xpath('//span[contains(@id, "publicationDate")]/text()').extract()
            item['publisher'] = site.xpath('//span[contains(@id, "publisher")]/text()').extract()
            item['isbn'] = site.xpath('//span[contains(@id, "isbn")]/text()').extract()
            items.append(item)
        return items


The items.py code is:

from scrapy.item import Item, Field

class BookpagesItem(Item):
    # define the fields for your item here like:
    # name = Field()
    title = Field()
    link = Field()
    price = Field()
    category = Field()
    authors = Field()
    date = Field()
    publisher = Field()
    isbn = Field()

Best answer

That message means the line from scrapy.spider import CrawlSpider, Rule is incorrect.

Looking at the Scrapy documentation, it should probably be from scrapy.contrib.spiders import CrawlSpider, Rule

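Whenever you get an "ImportError: cannot import name foo" error, you are looking at an incorrect import, so you can narrow the problem down to the import statements alone. You can look up the correct location in the library's documentation, or in the source code itself if it is available.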
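If the documentation is not at hand, the same narrowing-down can be done from a Python prompt. The sketch below is only an illustration: find_import is a hypothetical helper (it is not part of Scrapy), and the candidate module paths are assumptions based on the package layouts I know of (scrapy.contrib.spiders in the 0.2x series, scrapy.spiders from 1.0 onward). It tries each path in turn and reports the first one that actually exposes the name.

```python
import importlib


def find_import(name, candidates):
    """Return the first module path in `candidates` that exposes `name`.

    Hypothetical helper for illustration; returns None when no candidate
    can be imported or none of them defines `name`.
    """
    for module_path in candidates:
        try:
            module = importlib.import_module(module_path)
        except ImportError:
            # Module is missing in this installation: try the next candidate.
            continue
        if hasattr(module, name):
            return module_path
    return None


if __name__ == "__main__":
    # Assumed candidate locations for CrawlSpider across Scrapy versions.
    candidates = ["scrapy.spider", "scrapy.contrib.spiders", "scrapy.spiders"]
    print(find_import("CrawlSpider", candidates))
```

Whatever path it prints is the one to use in the import statement. If it prints None, Scrapy is either not installed in that environment or keeps the class somewhere outside the candidate list.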

I searched the Scrapy docs and found this: http://doc.scrapy.org/en/0.24/topics/spiders.html#crawlspider
