Problem description
As per this thread, I am using xml.dom.minidom to do some very basic, read-only XML traversal.
What confuses me is why its getElementsByTagName is finding nodes several hierarchy levels deep without my explicitly supplying their exact path.
XML:
<data>
    <items>
        <item name="item1"></item>
        <item name="item2"></item>
        <item name="item3"></item>
        <item name="item4"></item>
    </items>
    <secondSetOfItems>
        <item name="item5"></item>
        <item name="item6"></item>
        <item name="item7"></item>
        <item name="item8"></item>
    </secondSetOfItems>
</data>
Python code:
from xml.dom import minidom

xmldoc = minidom.parse('sampleXML.xml')
items = xmldoc.getElementsByTagName('item')
for item in items:
    print(item.attributes['name'].value)
Prints:
item1
item2
item3
item4
item5
item6
item7
item8
What bothers me is that it implicitly finds tags named item under both data->items and data->secondSetOfItems.
How do I make it follow an explicit path and only extract items under one of the two categories? E.g., under data->secondSetOfItems:
item5
item6
item7
item8
Recommended answer
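getElementsByTagName matches every descendant of the node it is called on, not just its direct children, which is why calling it on the document returns all eight items. If you want to get items from a specific category, you can do so by grabbing the parent element first and then searching from there.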
For example:
Code:
from xml.dom import minidom

xmldoc = minidom.parse('sampleXML.xml')
# Grab the first occurrence of the "secondSetOfItems" element
second_items = xmldoc.getElementsByTagName("secondSetOfItems")[0]
# Search only within that element's subtree
item_list = second_items.getElementsByTagName("item")
for item in item_list:
    print(item.attributes['name'].value)
Output:
item5
item6
item7
item8
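
Note that getElementsByTagName on the parent still searches that parent's whole subtree, so it would also pick up item elements nested more deeply inside secondSetOfItems. If a strict, level-by-level path is what you are after, one possible approach is to walk childNodes yourself. The following is only a minimal sketch: the helpers child_elements and find_by_path are hypothetical names made up for this illustration and are not part of minidom, and the XML is a shortened inline copy of the sample so that the snippet runs on its own.

```python
from xml.dom import minidom

# Shortened inline copy of the sample document (assumption: same structure
# as sampleXML.xml, with fewer items to keep the sketch small).
SAMPLE_XML = """<data>
  <items>
    <item name="item1"></item>
    <item name="item2"></item>
  </items>
  <secondSetOfItems>
    <item name="item5"></item>
    <item name="item6"></item>
  </secondSetOfItems>
</data>"""


def child_elements(node, tag):
    """Hypothetical helper: direct child elements of `node` named `tag`."""
    return [
        child for child in node.childNodes
        # Skip whitespace text nodes; only elements have a tagName.
        if child.nodeType == child.ELEMENT_NODE and child.tagName == tag
    ]


def find_by_path(root, path):
    """Hypothetical helper: follow a list of tag names one level at a time."""
    nodes = [root]
    for tag in path:
        nodes = [match for node in nodes for match in child_elements(node, tag)]
    return nodes


xmldoc = minidom.parseString(SAMPLE_XML)
# documentElement is the root <data> element, so the path starts below it.
items = find_by_path(xmldoc.documentElement, ["secondSetOfItems", "item"])
names = [item.getAttribute("name") for item in items]
print(names)
```

With this sketch, a path of ["item"] starting from the root returns an empty list, because no item element is a direct child of data. That is the difference from getElementsByTagName, which would return every item in the document.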