python - Scrapy, Python: multiple item classes in one pipeline?

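I have a spider that scrapes data which cannot be saved in a single item class.
To illustrate, I have a Profile item, and each Profile item may have an unknown number of comments. That is why I want to implement both a Profile item and a Comment item. I know I can pass them to the pipeline simply by using yield.
However, I don't know how a pipeline with one parse_item function can handle two different item classes?
Or is it possible to use different parse_item functions?
Or do I have to use several pipelines?
Or is it possible to write an iterator to a Scrapy item field?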

comments_list = []
comments = response.xpath(somexpath)
for x in comments.extract():
    comments_list.append(x)
# Store the collected comments in a single item field
ScrapyItem['comments'] = comments_list
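
For illustration, the two-item approach described above could be sketched as follows. This is a minimal sketch, not the asker's actual spider: `ProfileItem`, `CommentItem`, their fields, and `parse_profile` are hypothetical names, dataclasses are used as the item types (Scrapy accepts them alongside `scrapy.Item`), and a plain dict stands in for the response so that the snippet runs without Scrapy installed.

```python
from dataclasses import dataclass


@dataclass
class ProfileItem:
    name: str


@dataclass
class CommentItem:
    profile_name: str
    text: str


def parse_profile(page):
    """Yield one ProfileItem, then one CommentItem per comment.

    `page` is a plain dict standing in for a Scrapy response (assumption).
    """
    yield ProfileItem(name=page["name"])
    for text in page["comments"]:
        # Each comment becomes its own item, linked back by profile name.
        yield CommentItem(profile_name=page["name"], text=text)


items = list(parse_profile({"name": "alice", "comments": ["hi", "nice"]}))
```

Because each comment is its own item, the number of comments per profile does not need to be known in advance.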

Best answer

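By default, every item goes through every pipeline.
For instance, if you yield a ProfileItem and a CommentItem, both of them will go through all pipelines. If you have a pipeline set up to track item types, then your process_item method could look like this: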

def process_item(self, item, spider):
    self.stats.inc_value('typecount/%s' % type(item).__name__)
    return item

When a ProfileItem comes through, 'typecount/ProfileItem' is incremented. When a CommentItem comes through, 'typecount/CommentItem' is incremented.
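
The counting snippet assumes the pipeline already has a `self.stats` attribute. One way to obtain it, sketched here under assumptions, is the `from_crawler` class method, which Scrapy calls with the crawler object; the crawler's `stats` attribute is the stats collector. `FakeStats` and `FakeCrawler` are hypothetical stand-ins so the example runs without Scrapy, and `TypeCountPipeline` and the dict-based item classes are illustrative names as well.

```python
from collections import Counter


class TypeCountPipeline:
    def __init__(self, stats):
        self.stats = stats

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy passes in the crawler; its stats collector is crawler.stats.
        return cls(crawler.stats)

    def process_item(self, item, spider):
        self.stats.inc_value('typecount/%s' % type(item).__name__)
        return item


class FakeStats:
    """Stand-in for Scrapy's stats collector (only inc_value is mimicked)."""

    def __init__(self):
        self.values = Counter()

    def inc_value(self, key, count=1):
        self.values[key] += count


class FakeCrawler:
    """Stand-in for the crawler object (only the stats attribute is mimicked)."""

    def __init__(self):
        self.stats = FakeStats()


class ProfileItem(dict):
    pass


class CommentItem(dict):
    pass


crawler = FakeCrawler()
pipeline = TypeCountPipeline.from_crawler(crawler)
for item in [ProfileItem(), CommentItem(), CommentItem()]:
    pipeline.process_item(item, spider=None)
```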
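However, if handling a given item type is unique to one pipeline, you can have that pipeline handle only that type of item by checking the item type before proceeding: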

def process_item(self, item, spider):
    if not isinstance(item, ProfileItem):
        # Not a ProfileItem: pass it along untouched.
        return item
    # Handle your Profile Item here.
    return item

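If you had the two process_item methods above set up in different pipelines, the item would go through both of them, being tracked in the first and processed (or ignored) in the second.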
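
To make the "every item goes through every pipeline" behaviour concrete, the following sketch chains the two kinds of pipeline by hand. In a real project Scrapy does the chaining itself, in the order given by the `ITEM_PIPELINES` setting; the `run_pipelines` helper, the class names, and the `counts` and `handled` attributes are hypothetical and exist only for this demonstration.

```python
class ProfileItem(dict):
    pass


class CommentItem(dict):
    pass


class TypeCountPipeline:
    def __init__(self):
        self.counts = {}

    def process_item(self, item, spider):
        name = type(item).__name__
        self.counts[name] = self.counts.get(name, 0) + 1
        return item


class ProfileOnlyPipeline:
    def __init__(self):
        self.handled = []

    def process_item(self, item, spider):
        if not isinstance(item, ProfileItem):
            return item  # not a profile: pass through untouched
        self.handled.append(item)
        return item


def run_pipelines(item, pipelines, spider=None):
    """Pass the item through each pipeline in order (simplified simulation)."""
    for pipeline in pipelines:
        item = pipeline.process_item(item, spider)
    return item


counter, profiles = TypeCountPipeline(), ProfileOnlyPipeline()
for scraped in [ProfileItem(name="alice"), CommentItem(text="hi")]:
    run_pipelines(scraped, [counter, profiles])
```

Both items are counted by the first pipeline, but only the profile ends up in the second pipeline's `handled` list.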
Additionally, you could have one pipeline set up to handle all "related" items:

def process_item(self, item, spider):
    if isinstance(item, ProfileItem):
        return self.handleProfile(item, spider)
    if isinstance(item, CommentItem):
        return self.handleComment(item, spider)
    # Any other item type passes through unchanged.
    return item

def handleComment(self, item, spider):
    # Handle Comment here, return item
    return item

def handleProfile(self, item, spider):
    # Handle profile here, return item
    return item

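Or, you could make it even more complex and develop a type delegation system that loads classes and calls default handler methods, similar to how Scrapy handles middleware/pipelines. It's really up to you how complex you need it to be and what you want to do.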
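
As a much simpler stand-in for such a delegation system, one possible sketch keeps a registry that maps item classes to handler methods and looks the handler up by type. Everything here is hypothetical (the `DelegatingPipeline` name, the `handlers` registry, the `handled_by` key, and the dict-based item classes); it does not load classes dynamically the way Scrapy loads its components.

```python
class ProfileItem(dict):
    pass


class CommentItem(dict):
    pass


class PremiumProfileItem(ProfileItem):
    pass


class DelegatingPipeline:
    def __init__(self):
        # Registry mapping item classes to bound handler methods.
        self.handlers = {
            ProfileItem: self.handle_profile,
            CommentItem: self.handle_comment,
        }

    def process_item(self, item, spider):
        # Walk the MRO so subclasses of a registered type are handled too.
        for cls in type(item).__mro__:
            handler = self.handlers.get(cls)
            if handler is not None:
                return handler(item, spider)
        return item  # unregistered types pass through unchanged

    def handle_profile(self, item, spider):
        item["handled_by"] = "profile"
        return item

    def handle_comment(self, item, spider):
        item["handled_by"] = "comment"
        return item


pipeline = DelegatingPipeline()
profile = pipeline.process_item(ProfileItem(), None)
comment = pipeline.process_item(CommentItem(), None)
premium = pipeline.process_item(PremiumProfileItem(), None)
other = pipeline.process_item({"raw": True}, None)
```

Adding support for a new item type then only means registering one more entry, without touching `process_item`.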

Regarding "python - Scrapy, Python: multiple item classes in one pipeline?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/32743469/