python - Python: How to process a large XML file with many children (one root)

I have an XML file with a data structure like this:

<report>
  <table>
    <detail name="John" surname="Smith"/>
    <detail name="Michael" surname="Smith"/>
    <detail name="Nick" surname="Smith"/>
    ... {a lot of <detail> elements}
  </table>
</report>

For each element, I need to check whether its 'name' attribute equals its 'surname' attribute.

The XML file is larger than 1 GB, and etree.parse(file) fails with an error.

How can I process the elements one at a time using Python and lxml?

Best answer

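Consider iterparse, which lets you process elements while the tree is still being built, instead of loading the whole file first. The code below checks whether the name attribute equals the surname attribute. Use the if block for any further processing, such as conditionally appending values to a list: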

import xml.etree.ElementTree as et

data = []
path = "/path/to/source.xml"

# iterparse yields (event, element) pairs while the file is being read;
# an "end" event means the element has been parsed completely
for ev, el in et.iterparse(path, events=("end",)):
    if el.tag == 'detail':
        print(el.attrib['name'] == el.attrib['surname'])
        data.append([el.attrib['name'], el.attrib['surname']])
        # drop the element's attributes and content once it has been processed;
        # the emptied element itself stays attached to its parent
        el.clear()

print(data)

# Output for the sample above:
# False
# False
# False
# [['John', 'Smith'], ['Michael', 'Smith'], ['Nick', 'Smith']]
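
For a file of this size, it may also help to detach each processed element from its parent so that the parent's list of children does not keep growing. The following is only a minimal sketch of that idea using the standard library. The helper name iter_details, the in-memory SAMPLE document, and its "Smith"/"Smith" row are hypothetical and added purely for illustration; they are not part of the original question or answer.

```python
import io
import xml.etree.ElementTree as ET


def iter_details(source, tag="detail"):
    """Yield the attributes of each <tag> element, one at a time.

    Hypothetical helper: `source` is a file name or a binary file object.
    """
    stack = []  # elements that are currently open, innermost last
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem)
            continue
        stack.pop()  # elem has just been closed
        if elem.tag == tag:
            yield dict(elem.attrib)
            if stack:
                # detach the finished element from its parent so it can be freed
                stack[-1].remove(elem)


# Illustrative data only; the last row is made up so that one match exists.
SAMPLE = b"""<report>
  <table>
    <detail name="John" surname="Smith"/>
    <detail name="Michael" surname="Smith"/>
    <detail name="Smith" surname="Smith"/>
  </table>
</report>"""

matches = [d for d in iter_details(io.BytesIO(SAMPLE)) if d["name"] == d["surname"]]
print(matches)
# [{'name': 'Smith', 'surname': 'Smith'}]
```

With a real file, a path such as "/path/to/source.xml" can be passed in place of the BytesIO object. Since the question mentions lxml: lxml.etree.iterparse additionally accepts a tag argument (for example tag="detail") that restricts the events to matching elements, which may simplify the loop.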

A similar question about python - Python: How to process a large XML file with many children (one root) can be found on Stack Overflow: https://stackoverflow.com/questions/46040040/