Validating XML against an external DTD with Python and lxml

Problem description

I'm trying to validate an XML file against an external DTD referenced in the doctype tag. Specifically:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE en-export SYSTEM "http://xml.evernote.com/pub/evernote-export3.dtd">
...the rest of the document...

I'm using Python 3.3 and the lxml module. From reading http://lxml.de/validation.html#validation-at-parse-time, I've thrown this together:

import sys
from lxml import etree

enexFile = open(sys.argv[2], mode="rb")  # sys.argv[2] is the path to an XML file in local storage.
enexParser = etree.XMLParser(dtd_validation=True)  # validate against the DOCTYPE's DTD while parsing
enexTree = etree.parse(enexFile, enexParser)

From what I understand of validation.html, the lxml library should now take care of retrieving the DTD and performing validation. But instead, I get this:

$ ./mapwrangler.py validate notes.enex
Traceback (most recent call last):
  File "./mapwrangler.py", line 27, in <module>
    enexTree = etree.parse(enexFile, enexParser)
  File "lxml.etree.pyx", line 3239, in lxml.etree.parse (src/lxml/lxml.etree.c:69955)
  File "parser.pxi", line 1769, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:102257)
  File "parser.pxi", line 1789, in lxml.etree._parseFilelikeDocument (src/lxml/lxml.etree.c:102516)
  File "parser.pxi", line 1684, in lxml.etree._parseDocFromFilelike (src/lxml/lxml.etree.c:101442)
  File "parser.pxi", line 1134, in lxml.etree._BaseParser._parseDocFromFilelike (src/lxml/lxml.etree.c:97069)
  File "parser.pxi", line 582, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:91275)
  File "parser.pxi", line 683, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:92461)
  File "parser.pxi", line 622, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:91757)
lxml.etree.XMLSyntaxError: Validation failed: no DTD found !, line 3, column 43

This surprises me, because if I turn off validation, the document parses just fine and print(enexTree.docinfo.doctype) gives:

$ ./mapwrangler.py validate notes.enex
<!DOCTYPE en-export SYSTEM "http://xml.evernote.com/pub/evernote-export3.dtd">

So it looks to me like there shouldn't be any problem finding the DTD.
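
One way to check this observation without any network access: parsing without validation only records the identifiers from the DOCTYPE declaration in `docinfo`; with the default parser settings (`load_dtd=False`) the DTD file itself is never downloaded. So a readable `doctype` does not by itself show that the DTD can be fetched. A minimal sketch; `describe_doctype` is a hypothetical helper name:

```python
from lxml import etree


def describe_doctype(xml_bytes):
    """Return the DOCTYPE details lxml records when parsing without validation."""
    # The default parser does not load external DTDs, so nothing is fetched here;
    # lxml only stores what the <!DOCTYPE ...> declaration itself says.
    root = etree.fromstring(xml_bytes)
    info = root.getroottree().docinfo
    return {
        "root_name": info.root_name,    # name given in the DOCTYPE
        "system_url": info.system_url,  # the SYSTEM identifier (DTD location)
        "doctype": info.doctype,        # the declaration rebuilt as a string
    }
```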

Thanks for any help.

Recommended answer

You need to pass no_network=False when constructing the parser object. This option is True by default, which prevents lxml from fetching the DTD from its http:// URL, hence the "no DTD found" error.
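
A minimal sketch of the corrected parser setup, assuming the same local file path as in the question; the function name `parse_and_validate` is hypothetical. The only change from the question's code is the extra `no_network=False` argument:

```python
from lxml import etree


def parse_and_validate(path):
    """Parse the XML file at `path` and validate it against the DTD named in its DOCTYPE.

    Raises lxml.etree.XMLSyntaxError if the DTD cannot be loaded or the
    document does not conform to it.
    """
    parser = etree.XMLParser(
        dtd_validation=True,  # validate against the DOCTYPE's DTD while parsing
        no_network=False,     # allow fetching a DTD over the network (blocked by default)
    )
    with open(path, mode="rb") as xml_file:
        return etree.parse(xml_file, parser)
```

In the question's script this would be called as `enexTree = parse_and_validate(sys.argv[2])`. Using `with` also closes the file, which the original snippet leaves open.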

From the documentation of parser options at http://lxml.de/parsing.html#parsers:

no_network - prevent network access when looking up external documents (on by default)
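
If enabling network access inside the parser is undesirable, lxml's validation documentation also covers a different technique: parse first, then validate the tree with an explicitly loaded `etree.DTD`. This is an alternative to the answer above, not part of it. `fetch_dtd` and `validate_against` are hypothetical helper names, and the download step assumes the DTD URL is reachable:

```python
import io
import urllib.request

from lxml import etree


def fetch_dtd(url):
    """Download a DTD explicitly, so network access happens only here."""
    with urllib.request.urlopen(url) as response:
        return etree.DTD(io.BytesIO(response.read()))


def validate_against(tree, dtd):
    """Validate an already-parsed tree; return (is_valid, list of error messages)."""
    is_valid = dtd.validate(tree)
    return is_valid, [entry.message for entry in dtd.error_log]


# Possible usage with the question's file (needs network access for the Evernote URL):
# tree = etree.parse(path_to_enex_file)
# dtd = fetch_dtd(tree.docinfo.system_url)
# ok, errors = validate_against(tree, dtd)
```

Keeping the download separate also makes it easy to cache the DTD locally instead of fetching it on every run.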

08-20 09:03