Problem description
I've discovered that cElementTree is about 30 times faster than xml.dom.minidom, and I'm rewriting my XML encoding/decoding code. However, I need to output XML that contains CDATA sections, and there doesn't seem to be a way to do that with ElementTree.
Can it be done?
Recommended answer
After a bit of work, I found the answer myself. Looking at the ElementTree.py source code, I found special handling for XML comments and processing instructions. For each special element type there is a factory function that uses a special (non-string) tag value, the factory function itself, to distinguish it from regular elements.
def Comment(text=None):
    element = Element(Comment)
    element.text = text
    return element
Then, in ElementTree's _write function, which actually outputs the XML, there is a special case for comments:
if tag is Comment:
    file.write("<!-- %s -->" % _escape_cdata(node.text, encoding))
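The same factory mechanism can be observed from the outside. The sketch below assumes the xml.etree.ElementTree module bundled with Python 3 rather than the standalone elementtree package quoted above, and only uses its public Comment, Element and tostring functions:

```python
import xml.etree.ElementTree as ET

# Comment() is a factory: the element it returns uses the Comment
# function itself as its tag, not a string.
root = ET.Element("data")
note = ET.Comment("hello")
root.append(note)

# This identity check is what the serializer's special case relies on.
is_special = note.tag is ET.Comment

serialized = ET.tostring(root, encoding="unicode")
print(is_special)
print(serialized)
```

A regular element created with ET.Element("data") has a string tag, so the identity check fails for it and it is written as an ordinary start/end tag pair.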
To support CDATA sections, I created a factory function called CDATA, extended the ElementTree class, and overrode the _write function to handle the CDATA elements.
This still doesn't help if you want to parse XML that contains CDATA sections and then output it again with those CDATA sections intact, but it at least lets you create XML with CDATA sections programmatically, which is what I needed to do.
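The round-trip limitation is easy to see with a short experiment. This sketch assumes Python 3's bundled xml.etree.ElementTree; the sample document is made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical input containing a CDATA section.
source = "<data><![CDATA[1 < 2 & 3]]></data>"
root = ET.fromstring(source)

# The parser hands back plain text; nothing records that it came from CDATA.
parsed_text = root.text

# On output the text is escaped instead of being wrapped in a CDATA section.
round_tripped = ET.tostring(root, encoding="unicode")
print(repr(parsed_text))
print(round_tripped)
```

The content survives, but the CDATA wrapper is replaced by entity escaping, which is why a parse-then-write cycle cannot preserve the original sections.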
The implementation below seems to work with both ElementTree and cElementTree:
import elementtree.ElementTree as etree
#~ import cElementTree as etree

def CDATA(text=None):
    element = etree.Element(CDATA)
    element.text = text
    return element

class ElementTreeCDATA(etree.ElementTree):
    def _write(self, file, node, encoding, namespaces):
        if node.tag is CDATA:
            text = node.text.encode(encoding)
            file.write("\n<![CDATA[%s]]>\n" % text)
        else:
            etree.ElementTree._write(self, file, node, encoding, namespaces)

if __name__ == "__main__":
    import sys

    text = """
    <?xml version='1.0' encoding='utf-8'?>
    <text>
    This is just some sample text.
    </text>
    """

    e = etree.Element("data")
    cdata = CDATA(text)
    e.append(cdata)
    et = ElementTreeCDATA(e)
    et.write(sys.stdout, "utf-8")
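The code above targets the old standalone elementtree package. In the xml.etree.ElementTree module bundled with recent Python 3 releases, serialization no longer appears to go through an ElementTree._write method, so the subclass would not take effect there. The following is a rough sketch of the same idea for Python 3, not a definitive port. It relies on two private names, ET._serialize_xml and ET._serialize, which are implementation details of CPython and may change without notice. The CDATA_TAG marker and the wrapper function name are hypothetical, chosen for this example; a string tag is used because the modern serializer seems to reject unknown non-string tags.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

CDATA_TAG = "![CDATA["  # hypothetical marker tag, chosen for this sketch


def CDATA(text=None):
    """Factory for a pseudo-element that is serialized as a CDATA section."""
    element = ET.Element(CDATA_TAG)
    element.text = text
    return element


# Private implementation detail of CPython's ElementTree (assumption: it
# exists and receives `write` and `elem` as its first two arguments).
_original_serialize_xml = ET._serialize_xml


def _serialize_xml_with_cdata(write, elem, *args, **kwargs):
    if elem.tag == CDATA_TAG:
        # "]]>" cannot appear inside a CDATA section, so split it in two.
        body = (elem.text or "").replace("]]>", "]]]]><![CDATA[>")
        write("<![CDATA[%s]]>" % body)
        if elem.tail:
            write(escape(elem.tail))
        return
    _original_serialize_xml(write, elem, *args, **kwargs)


# Patch both the module-level name (used when recursing into children)
# and the dispatch table used by ElementTree.write() and tostring().
ET._serialize_xml = ET._serialize["xml"] = _serialize_xml_with_cdata

root = ET.Element("data")
root.append(CDATA("<text>This is just some sample text.</text>"))
output = ET.tostring(root, encoding="unicode")
print(output)
```

Because this patches the module globally, every tree serialized in the same process is affected; that trade-off is acceptable for a script but worth isolating in larger applications.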