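Can lxml be used to check whether XML is well-formed, or is it too forgiving for that? For example, it seems to be able to parse XML even when it is not well-formed. What is the simplest way to check whether an XML file is well-formed?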

Best answer

lxml should raise an exception when it parses XML that is not well-formed, for example:

from lxml import etree

xml = """
<multipleroot>
    <noclosingtag>
</multipleroot>
<multipleroot></multipleroot>"""
doc = etree.fromstring(xml)


This raises an exception:

Traceback (most recent call last):
  File "D:\StackOverflow\Python\Q50.py", line 8, in <module>
    doc = etree.fromstring(xml)
  ......
  ......
XMLSyntaxError: Opening and ending tag mismatch: noclosingtag line 3 and multipleroot, line 4, column 16
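
To turn that exception into a simple yes/no check, a minimal sketch might look like the following. The helper name `is_well_formed` is made up for illustration and is not part of lxml; it assumes the default (non-recovering) parser and input passed as a plain string.

```python
from lxml import etree


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML (hypothetical helper)."""
    try:
        # The default XMLParser does not recover, so malformed input raises.
        etree.fromstring(xml_text)
    except etree.XMLSyntaxError:
        return False
    return True


good = "<root><child/></root>"
bad = "<root><child></root>"  # <child> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

This prints True for the first string and False for the second. For a file on disk, the same try/except pattern should work around `etree.parse(path)` instead of `etree.fromstring`.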


However, if you explicitly tell XMLParser to recover from malformed XML, or if you use HTMLParser, lxml can still parse the XML:

from lxml import etree

xml = """
<multipleroot>
    <noclosingtag>
</multipleroot>
<multipleroot></multipleroot>"""
parser = etree.XMLParser(recover=True)
#parser = etree.HTMLParser()
doc = etree.fromstring(xml, parser=parser)
print(etree.tostring(doc))


This successfully prints the parsed XML:

<multipleroot>
    <noclosingtag>
</noclosingtag>
<multipleroot/></multipleroot>
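
If a recovering parser is needed but you still want to know whether the input was broken, one option is to inspect the parser's `error_log` after parsing. The sketch below assumes that the log is populated when `recover=True` silently repairs the input; the sample document is made up for illustration.

```python
from lxml import etree

xml = "<root><noclosingtag></root>"  # <noclosingtag> is never closed

parser = etree.XMLParser(recover=True)
doc = etree.fromstring(xml, parser=parser)

# error_log holds the problems reported during the last parse run,
# including those the parser recovered from.
had_errors = len(parser.error_log) > 0
print(had_errors)

for entry in parser.error_log:
    # Each entry carries the position and the libxml2 message.
    print(entry.line, entry.column, entry.message)
```

An empty log after parsing would suggest the input was well-formed; a non-empty one means the tree in `doc` is the result of recovery rather than a faithful parse.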
