我有一个简单的XML文档,我正试图用python dom读入它(见下文):
XML文件:

<?xml version="1.0" encoding="utf-8"?>
<HeaderLookup>
    <Header>
        <Reserved>2</Reserved>
        <CPU>1</CPU>
        <Flag>1</Flag>
        <VQI>12</VQI>
        <Group_ID>16</Group_ID>
        <DI>2</DI>
        <DE>1</DE>
        <ACOSS>5</ACOSS>
        <RGH>8</RGH>
    </Header>
</HeaderLookup>

python代码:
from xml.dom import minidom

xml_file = open("test.xml")
xmlroot = minidom.parse(xml_file).documentElement
xml_file.close()

for item in xmlroot.getElementsByTagName("Header")[0].childNodes:
    print item

结果:
<DOM Text node "u'\n\t\t'">
<DOM Element: Reserved at 0x28d2828>
<DOM Text node "u'\n\t\t'">
<DOM Element: CPU at 0x28d28c8>
<DOM Text node "u'\n\t\t'">
<DOM Element: Flag at 0x28d2968>
<DOM Text node "u'\n\t\t'">
<DOM Element: VQI at 0x28d2a08>
<DOM Text node "u'\n\t\t'">
<DOM Element: Group_ID at 0x28d2ad0>
<DOM Text node "u'\n\t\t'">
<DOM Element: DI at 0x28d2b70>
<DOM Text node "u'\n\t\t'">
<DOM Element: DE at 0x28d2c10>
<DOM Text node "u'\n\t\t'">
<DOM Element: ACOSS at 0x28d2cb0>
<DOM Text node "u'\n\t\t'">
<DOM Element: RGH at 0x28d2d50>
<DOM Text node "u'\n\t'">

结果应该是9个子节点(reserved、cpu、flag、vqi、group_id、di、de、acos和rgh),但出于某种原因,它返回的是19个节点的列表,其中10个是空白(为什么这一点一开始被视为节点)?!!)有人能告诉我是否有办法让XML解析器不包括空白节点?

最佳答案

空白在XML中很重要,但请查看ElementTree,它有一个与DOM不同的用于处理XML的API。
例子

from xml.etree import ElementTree as et

data = '''\
<?xml version="1.0" encoding="utf-8"?>
<HeaderLookup>
    <Header>
        <Reserved>2</Reserved>
        <CPU>1</CPU>
        <Flag>1</Flag>
        <VQI>12</VQI>
        <Group_ID>16</Group_ID>
        <DI>2</DI>
        <DE>1</DE>
        <ACOSS>5</ACOSS>
        <RGH>8</RGH>
    </Header>
</HeaderLookup>
'''

tree = et.fromstring(data)
for n in tree.find('Header'):
    print n.tag,'=',n.text

产量
Reserved = 2
CPU = 1
Flag = 1
VQI = 12
Group_ID = 16
DI = 2
DE = 1
ACOSS = 5
RGH = 8

示例(扩展以前的代码)
空白仍然存在,但它在.tail属性中。tail是元素后面的文本节点(在一个元素的结束和下一个元素的开始之间),而text是元素的开始/结束标记之间的文本节点。
def dump(e):
    print '<%s>' % e.tag
    print 'text =',repr(e.text)
    for n in e:
        dump(n)
    print '</%s>' % e.tag
    print 'tail =',repr(e.tail)

dump(tree)

产量
<HeaderLookup>
text = '\n    '
<Header>
text = '\n        '
<Reserved>
text = '2'
</Reserved>
tail = '\n        '
<CPU>
text = '1'
</CPU>
tail = '\n        '
<Flag>
text = '1'
</Flag>
tail = '\n        '
<VQI>
text = '12'
</VQI>
tail = '\n        '
<Group_ID>
text = '16'
</Group_ID>
tail = '\n        '
<DI>
text = '2'
</DI>
tail = '\n        '
<DE>
text = '1'
</DE>
tail = '\n        '
<ACOSS>
text = '5'
</ACOSS>
tail = '\n        '
<RGH>
text = '8'
</RGH>
tail = '\n    '
</Header>
tail = '\n'
</HeaderLookup>
tail = None

10-07 18:59
查看更多