Problem description

I am trying to check whether an XML file contains the required XML declaration ("header"), for example:

<?xml version="1.0" encoding="UTF-8"?>
...rest of xml file...

I am using xml.etree.ElementTree to read the file and extract information from it, but it seems to load a file just fine even when the header is missing.

What I tried so far is this:

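import re
import sys
import xml.etree.ElementTree as ET

tree = ET.parse(someXmlFile)

try:
    # Serialize the parsed tree back to text so the header can be inspected
    xmlFile = ET.tostring(tree.getroot(), encoding='utf8').decode('utf8')
except Exception:
    sys.stderr.write("Wrong xml2 header\n")
    sys.exit(31)

# Expect the declaration at the very start of the serialized document
if re.match(r"^\s*<\?xml version=\'1\.0\' encoding=\'utf8\'\?>\s+", xmlFile) is None:
    sys.stderr.write("Wrong xml1 header\n")
    sys.exit(31)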

But the ET.tostring() function just "makes up" a header if it is not present in the file.

Is there any way to check for an XML header with ET? Or to make ET.parse raise an error while loading a file that does not contain the XML header?

Solution

tl;dr

from xml.dom.minidom import parseString

def has_xml_declaration(xml):
    # minidom leaves Document.version as None when no XML declaration is present
    return parseString(xml).version is not None
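
As a quick check of the idea in the tl;dr, here is a minimal sketch that runs such a helper on two in-memory documents. The sample byte strings are made up for illustration, and the helper returns an explicit boolean rather than the raw version string; treat it as one possible way to use minidom here, not the only one.

```python
from xml.dom.minidom import parseString


def has_xml_declaration(xml):
    # minidom's Document.version stays None when the input has no XML declaration
    return parseString(xml).version is not None


# Illustrative sample documents (assumptions, not taken from the question)
with_decl = b'<?xml version="1.0" encoding="UTF-8"?><root/>'
without_decl = b'<root/>'

print(has_xml_declaration(with_decl))
print(has_xml_declaration(without_decl))
```

Passing bytes rather than str lets the parser handle the declared encoding itself.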

From Wikipedia's entry on the XML declaration:

...

So even if the XML declaration is omitted from an XML document, the code snippet:

if re.match(r"^<\?xml\s*version=\'1\.0\' encoding=\'utf8\'\s*\?>", xmlFile.decode('utf-8')) is None:

will find "the" default XML declaration in this XML document. Please note that I have used xmlFile.decode('utf-8') instead of xmlFile.

If you don't mind using minidom, you can use the following code snippet:

from xml.dom.minidom import parse

dom = parse('bookstore-003.xml')
print('<?xml version="{}" encoding="{}"?>'.format(dom.version, dom.encoding))

In the working fiddle, bookstore-001.xml contains an XML declaration, bookstore-002.xml contains none, and bookstore-003.xml contains a declaration that differs from the one in the first example. The print statement prints the version and the encoding accordingly:

<?xml version="1.0" encoding="UTF-8"?>

<?xml version="None" encoding="None"?>

<?xml version="1.0" encoding="ISO-8859-1"?>
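
To connect this back to the original question, the sketch below combines the minidom check with ElementTree so that loading fails when the declaration is missing. The function name `parse_requiring_declaration` and the sample documents are hypothetical, and the approach parses the input twice, which may matter for large files. The last lines also show why inspecting the output of ET.tostring cannot work: it writes a declaration even for input that had none.

```python
import io
import xml.etree.ElementTree as ET
from xml.dom.minidom import parseString


def parse_requiring_declaration(data):
    """Parse XML bytes with ElementTree, rejecting input without an XML declaration.

    Hypothetical helper: minidom is used only to detect the declaration.
    """
    if parseString(data).version is None:
        raise ValueError("missing XML declaration")
    return ET.parse(io.BytesIO(data))


# Illustrative sample documents (assumptions, not the bookstore files above)
good = b'<?xml version="1.0" encoding="UTF-8"?><bookstore><book/></bookstore>'
bad = b'<bookstore><book/></bookstore>'

tree = parse_requiring_declaration(good)
print(tree.getroot().tag)

try:
    parse_requiring_declaration(bad)
except ValueError as exc:
    print("rejected:", exc)

# ET.tostring writes a declaration of its own, even though `bad` had none
regenerated = ET.tostring(ET.fromstring(bad), encoding='utf8')
print(regenerated.startswith(b"<?xml"))
```

A ValueError is used here only for illustration; the question's own scripts write to stderr and exit with code 31 instead.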
