Question

I'm trying to use feedparser to retrieve some specific information from feeds, but also to retrieve the raw XML of each entry (i.e., <item> elements for RSS and <entry> elements for Atom), and I can't see how to do that. Obviously I could parse the XML by hand, but that's not very elegant, would require separate support for RSS and Atom, and I imagine it could fall out of sync with feedparser on ill-formed feeds. Is there a better way?

Thanks!

Answer

I'm the current developer of feedparser. Currently, one of the ways you can get that information is to monkeypatch feedparser._FeedParserMixin (or edit a local copy of feedparser.py). The methods you'll want to modify are:

  • feedparser._FeedParserMixin.unknown_starttag
  • feedparser._FeedParserMixin.unknown_endtag

At the top of each method you can insert a callback to a routine of your own that will capture the elements and their attributes as they're encountered by feedparser.
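A minimal sketch of that monkeypatch, assuming Python 3. So that it runs without depending on feedparser internals, it patches a hypothetical stand-in class, StandInMixin, that only mirrors the two method names and their (tag, attrs) and (tag) arguments. install_hooks and EntryRecorder are likewise illustrative names, not part of feedparser. The recorder groups the captured start/end events per RSS item or Atom entry, assuming tag names arrive unprefixed as "item" or "entry".

```python
import functools


class StandInMixin:
    """Hypothetical stand-in for feedparser._FeedParserMixin.

    Only the two method names and their argument shapes are mirrored.
    """

    def unknown_starttag(self, tag, attrs):
        pass  # in the real mixin, feedparser's own element handling runs here

    def unknown_endtag(self, tag):
        pass


def install_hooks(cls, on_start, on_end):
    """Monkeypatch cls so on_start/on_end run at the top of each method.

    Returns a function that restores the original methods.
    """
    orig_start = cls.unknown_starttag
    orig_end = cls.unknown_endtag

    @functools.wraps(orig_start)
    def unknown_starttag(self, tag, attrs):
        on_start(tag, attrs)  # callback first, then the original behavior
        return orig_start(self, tag, attrs)

    @functools.wraps(orig_end)
    def unknown_endtag(self, tag):
        on_end(tag)
        return orig_end(self, tag)

    cls.unknown_starttag = unknown_starttag
    cls.unknown_endtag = unknown_endtag

    def uninstall():
        cls.unknown_starttag = orig_start
        cls.unknown_endtag = orig_end

    return uninstall


class EntryRecorder:
    """Collects ("start", tag, attrs) / ("end", tag) events per item/entry."""

    ENTRY_TAGS = {"item", "entry"}  # assumes unprefixed tag names

    def __init__(self):
        self.entries = []     # one list of events per completed entry
        self._current = None  # events of the entry being recorded, if any
        self._depth = 0       # element nesting depth inside that entry

    def on_start(self, tag, attrs):
        if self._current is None and tag in self.ENTRY_TAGS:
            self._current = []
            self._depth = 0
        if self._current is not None:
            self._current.append(("start", tag, list(attrs)))
            self._depth += 1

    def on_end(self, tag):
        if self._current is None:
            return
        self._current.append(("end", tag))
        self._depth -= 1
        if self._depth == 0:  # the item/entry element itself just closed
            self.entries.append(self._current)
            self._current = None


# Simulate the calls a parser would make for a one-entry Atom feed.
recorder = EntryRecorder()
uninstall = install_hooks(StandInMixin, recorder.on_start, recorder.on_end)
p = StandInMixin()
p.unknown_starttag("feed", [])
p.unknown_starttag("entry", [])
p.unknown_starttag("link", [("href", "http://example.com/")])
p.unknown_endtag("link")
p.unknown_endtag("entry")
p.unknown_endtag("feed")
uninstall()

print(recorder.entries)
# [[('start', 'entry', []), ('start', 'link', [('href', 'http://example.com/')]), ('end', 'link'), ('end', 'entry')]]
```

Against the real library you would pass the class named above, feedparser._FeedParserMixin, to install_hooks before calling feedparser.parse(). That class is internal, so its name and location may differ between feedparser versions; check the one you have installed. Because the patch is applied to the class, every parse made while the hooks are installed is recorded, which is why the sketch returns an uninstall function. Note also that these callbacks only see tag names and attributes, not the text between tags.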

06-21 13:29