python - Scrapy error: TypeError: __init__() got an unexpected keyword argument 'deny'

I wrote a spider that ran as expected until I added the deny keyword to its rules.

Here is my spider:

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from scrapy.selector import Selector
from bhg.items import BhgItem

class BhgSpider (CrawlSpider):
    name = 'bhg'
    start_urls = ['http://www.bhg.com/holidays/st-patricks-day/']
    rules = (Rule(LinkExtractor(allow=[r'/*'], ),
                  deny=('blogs/*', 'videos/*', ),
                  callback='parse_html'), )

    def parse_html(self, response):
        hxs = Selector(response)
        item = BhgItem()

        item['title'] = hxs.xpath('//title/text()').extract()
        item['h1'] = hxs.xpath('//h1/text()').extract()
        item['canonical'] = hxs.xpath('//link[@rel="canonical"]/@href').extract()
        item['meta_desc'] = hxs.xpath('//meta[@name="description"]/@content').extract()
        item['url'] = response.request.url
        item['status_code'] = response.status
        return item

When I run this code, I get:

deny=('blogs/', 'videos/', ),), )
TypeError: __init__() got an unexpected keyword argument 'deny'

What am I doing wrong? I assume some function is not expecting the extra argument (deny), but which one? parse_html()?

I haven't defined any other spiders, and I don't have an __init__() anywhere.

Best answer

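deny should be passed as an argument to LinkExtractor, but you put it outside LinkExtractor's parentheses, so it is passed to Rule instead. Move it inside, so that you have: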

rules = (Rule(LinkExtractor(allow=[r'/*'], deny=('blogs/*', 'videos/*', )),
                  callback='parse_html'), )
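As a side note on what deny does once it reaches LinkExtractor: the allow and deny values are regular expressions (not shell-style globs) searched anywhere in the absolute URL. A link is kept if it matches at least one allow pattern (when allow is given) and none of the deny patterns. The sketch below approximates only that check with Python's re module; is_link_allowed is a hypothetical helper, not part of Scrapy, and it ignores LinkExtractor's other options such as allow_domains and deny_domains.

```python
import re
from typing import Iterable


def is_link_allowed(url: str, allow: Iterable[str] = (), deny: Iterable[str] = ()) -> bool:
    """Hypothetical approximation of LinkExtractor's allow/deny filtering.

    Patterns are regexes searched anywhere in the URL (re.search),
    not globs matched against the whole path.
    """
    allow_res = [re.compile(p) for p in allow]
    deny_res = [re.compile(p) for p in deny]
    # If allow patterns are given, at least one of them must match.
    if allow_res and not any(r.search(url) for r in allow_res):
        return False
    # Any matching deny pattern rejects the link.
    if any(r.search(url) for r in deny_res):
        return False
    return True


allow = [r'/*']                  # as a regex: zero or more slashes, so it matches every URL
deny = ('blogs/*', 'videos/*')   # as regexes: "blogs"/"videos" followed by optional slashes

for url in ('http://www.bhg.com/holidays/st-patricks-day/',
            'http://www.bhg.com/blogs/some-post/',
            'http://www.bhg.com/videos/clip/'):
    print(url, is_link_allowed(url, allow, deny))
```

Because these are searched as regexes, 'blogs/*' also matches any URL that merely contains the word "blogs"; a more explicit pattern such as r'/blogs/' narrows it to that path segment.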

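__init__ is the method that runs when a class is instantiated, and it receives the arguments you pass to the class, as you do here with the Rule and LinkExtractor classes. Rule's __init__() has no deny parameter, which is why the TypeError is raised there rather than in parse_html().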
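The mechanism can be reproduced without Scrapy. The sketch below uses two hypothetical stand-in classes, LinkExtractorStandIn and RuleStandIn, which mimic only the arguments relevant here (the real Scrapy signatures have more parameters). Passing deny to the rule's constructor fails the same way; passing it to the link extractor works. The exact wording of the error message varies slightly between Python versions.

```python
class LinkExtractorStandIn:
    """Hypothetical stand-in: accepts allow/deny like Scrapy's LinkExtractor."""

    def __init__(self, allow=(), deny=()):
        self.allow = allow
        self.deny = deny


class RuleStandIn:
    """Hypothetical stand-in: like Scrapy's Rule, its __init__ has no 'deny' parameter."""

    def __init__(self, link_extractor=None, callback=None, follow=None):
        self.link_extractor = link_extractor
        self.callback = callback
        self.follow = follow


# Misplaced parenthesis: deny is handed to RuleStandIn.__init__, which rejects it.
try:
    RuleStandIn(LinkExtractorStandIn(allow=[r'/*']),
                deny=('blogs/*', 'videos/*'),
                callback='parse_html')
except TypeError as exc:
    print('TypeError:', exc)

# Fixed: deny is an argument of the link extractor, not of the rule.
rule = RuleStandIn(LinkExtractorStandIn(allow=[r'/*'], deny=('blogs/*', 'videos/*')),
                   callback='parse_html')
print(rule.link_extractor.deny)
```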

