python - How to use a list of RegEx when defining LxmlLinkExtractor rules

I would like to know how to define a list of RegEx outside the Scrapy spider and then read those RegEx into LxmlLinkExtractor.

I am currently using this code:

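def load_regexes():  # illustrative name; the original snippet shows only the function body
    # Read one regular expression per line from the file.
    with open("myFile.txt") as file:
        regexs = [rule.strip() for rule in file.readlines()]
    return regexs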

I then pass the return value as an argument, like this:

Rule(LinkExtractor(allow=(regexs, )), callback='parse_file')

This results in the following error:

TypeError: unhashable type: 'list'

Best answer

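This should work. The allow argument accepts a single regular expression or a list of them, so pass the list directly instead of wrapping it in a tuple. Note also that callback is an argument of Rule, not of LinkExtractor:

regexs = [rule.strip() for rule in file.readlines()]
Rule(LinkExtractor(allow=regexs), callback='parse_file')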
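
To see where the TypeError likely comes from, here is a minimal sketch that needs only the standard library. It assumes the link extractor compiles each element of allow with re.compile; the helper names load_regexes and compile_all, and the two patterns written to the file, are made up for illustration and are not part of Scrapy.

```python
import re
import tempfile
from pathlib import Path


def load_regexes(path):
    """Read one pattern per line, skipping blank lines (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def compile_all(patterns):
    # Stand-in for what the extractor is assumed to do with `allow`:
    # compile every element of the iterable on its own.
    return [re.compile(p) for p in patterns]


# Hypothetical pattern file, written here only so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "myFile.txt"
    path.write_text("/category/\\d+\n\n/item/.+\\.html\n", encoding="utf-8")
    regexs = load_regexes(path)

# A flat list works: every element is a string that compiles by itself.
compiled = compile_all(regexs)

# (regexs, ) is a one-element tuple whose only element is the whole list,
# so re.compile() receives a list instead of a string.
try:
    compile_all((regexs, ))
    wrapped_failed = False
except TypeError:
    wrapped_failed = True
```

With Scrapy installed, the loaded list would then be passed as Rule(LinkExtractor(allow=regexs), callback='parse_file'). Skipping blank lines is a deliberate choice in this sketch: an empty pattern matches every URL, which would silently defeat the filter.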

Regarding python - How to use a list of RegEx when defining LxmlLinkExtractor rules, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37994874/