Problem description
I am trying to check whether an xml file contains the necessary xml declaration ("header"), let's say:
<?xml version="1.0" encoding="UTF-8"?>
...rest of xml file...
I am using xml ElementTree for reading and getting info out of the file, but it seems to load a file just fine even if it does not have the header.
What I tried so far is this:
import re
import sys
import xml.etree.ElementTree as ET

tree = ET.parse(someXmlFile)
try:
    xmlFile = ET.tostring(tree.getroot(), encoding='utf8').decode('utf8')
except Exception:
    sys.stderr.write("Wrong xml2 header\n")
    sys.exit(31)
if re.match(r"^\s*<\?xml version=\'1\.0\' encoding=\'utf8\'\?>\s+", xmlFile) is None:
    sys.stderr.write("Wrong xml1 header\n")
    sys.exit(31)
But the ET.tostring() function just "makes up" a header if it is not present in the file.
Is there any way to check for an XML header with ET? Or to somehow raise an error while loading the file with ET.parse if the file does not contain the XML header?
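The behaviour described in the question can be reproduced with a short sketch. The `<root/>` input is a made-up example, not the asker's file; it has no declaration, yet the serialized output gains one when `encoding='utf8'` is passed.

```python
import xml.etree.ElementTree as ET

# Hypothetical input with no XML declaration at all.
root = ET.fromstring("<root/>")

# With encoding='utf8', ET.tostring() writes a declaration of its own.
with_utf8 = ET.tostring(root, encoding="utf8")

# With encoding='utf-8', current CPython versions write no declaration.
with_utf_8 = ET.tostring(root, encoding="utf-8")

print(with_utf8)
print(with_utf_8)
```

Either way, the serialized bytes say nothing about whether the original file had a declaration, which is why the regex check on the output of ET.tostring() cannot work.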
tl;dr
from xml.dom.minidom import parseString

def has_xml_declaration(xml):
    # version is None when the document has no XML declaration
    return parseString(xml).version
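As a quick usage sketch, here is a variant of that function that returns a strict boolean instead of the version string. The two sample documents are invented for illustration.

```python
from xml.dom.minidom import parseString


def has_xml_declaration(xml):
    """Return True if the parsed document carried an XML declaration."""
    # minidom leaves Document.version as None when no declaration was seen.
    return parseString(xml).version is not None


# Hypothetical sample documents.
with_decl = b'<?xml version="1.0" encoding="UTF-8"?><root/>'
without_decl = b"<root/>"

print(has_xml_declaration(with_decl))
print(has_xml_declaration(without_decl))
```

The first call should report that a declaration is present and the second that it is missing.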
From Wikipedia's entry on the XML declaration:
...
So even if the XML declaration is omitted in an XML document, the code-snippet:
if re.match(r"^<\?xml\s*version=\'1\.0\' encoding=\'utf8\'\s*\?>", xmlFile.decode('utf-8')) is None:
will find "the" default XML declaration in this XML document. Please note that I have used xmlFile.decode('utf-8') instead of xmlFile.
If you don't mind using minidom, you can use the following code snippet:
from xml.dom.minidom import parse
dom = parse('bookstore-003.xml')
print('<?xml version="{}" encoding="{}"?>'.format(dom.version, dom.encoding))
Here is a working fiddle. In bookstore-001.xml an XML declaration is present, in bookstore-002.xml no XML declaration is present, and in bookstore-003.xml a different XML declaration than in the first example is present. The print instruction prints the version and the encoding accordingly:
<?xml version="1.0" encoding="UTF-8"?>
<?xml version="None" encoding="None"?>
<?xml version="1.0" encoding="ISO-8859-1"?>
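If the goal is to keep using ElementTree but reject input without a declaration, one possible sketch is to probe the data with the standard library's expat parser first. This is not part of the original answer: the function name `parse_requiring_declaration` and the choice to raise ValueError are assumptions made for illustration. It relies on expat's XmlDeclHandler callback, which is only invoked when a declaration is actually present.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


def parse_requiring_declaration(data):
    """Parse XML bytes with ElementTree, failing if no XML declaration exists.

    Hypothetical helper: returns (root_element, (version, encoding)).
    """
    declarations = []

    def on_declaration(version, encoding, standalone):
        # Called by expat only when the document has an XML declaration.
        declarations.append((version, encoding))

    probe = expat.ParserCreate()
    probe.XmlDeclHandler = on_declaration
    probe.Parse(data, True)  # raises expat.ExpatError on malformed XML

    if not declarations:
        raise ValueError("missing XML declaration")
    return ET.fromstring(data), declarations[0]


root, declaration = parse_requiring_declaration(
    b'<?xml version="1.0" encoding="UTF-8"?><bookstore/>'
)
print(root.tag, declaration)
```

The document is parsed twice here, which is simple but may be wasteful for large files. For a file on disk, the bytes could be read with open(path, 'rb') and passed in the same way.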