Parsing XML namespaces with ElementTree findall

Question

How can I find the Email element with ElementTree's findall('Email'), given the following XML?

<DocuSignEnvelopeInformation xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.docusign.net/API/3.0">
    <EnvelopeStatus>
        <RecipientStatus>
            <Type>Signer</Type>
            <Email>[email protected]</Email>
            <UserName>Joe Shmoe</UserName>
            <RoutingOrder>1</RoutingOrder>
            <Sent>2015-05-04T09:58:01.947</Sent>
            <Delivered>2015-05-04T09:58:14.403</Delivered>
            <Signed>2015-05-04T09:58:29.473</Signed>
        </RecipientStatus>
    </EnvelopeStatus>
</DocuSignEnvelopeInformation>

I have a feeling it has to do with the namespace, but I'm not sure. I looked at the docs and had no luck.

>>> tree
<xml.etree.ElementTree.ElementTree object at 0x7f27a47c4fd0>
>>> root = tree.getroot()
>>> root
<Element '{http://www.docusign.net/API/3.0}DocuSignEnvelopeInformation' at 0x7f27a47b8a48>
>>> root.findall('Email')
[]


Recommended answer

You should read the docs more closely, in particular the section on Parsing XML with Namespaces, which includes an example that is almost exactly what you want.

But even without the docs, the answer is actually contained in your example output. When you printed the root element of your document...

>>> from lxml import etree  # this session uses lxml rather than xml.etree.ElementTree
>>> tree = etree.parse('data.xml')
>>> root = tree.getroot()
>>> root
<Element {http://www.docusign.net/API/3.0}DocuSignEnvelopeInformation at 0x7f972cd079e0>

...you can see that it printed the root element name (DocuSignEnvelopeInformation) qualified with its namespace URI in curly braces ({http://www.docusign.net/API/3.0}). Every element in the document inherits that default namespace, so you can put the same braced URI in front of the tag name in your argument to findall:

>>> root.findall('{http://www.docusign.net/API/3.0}Email')

But this by itself won't work, since it would only find Email elements that are immediate children of the root element. You need to provide an ElementPath expression that makes findall search the entire document. This works:

>>> root.findall('.//{http://www.docusign.net/API/3.0}Email')
[<Element {http://www.docusign.net/API/3.0}Email at 0x7f972949a6c8>]
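The same three lookups can be reproduced with the standard library's xml.etree.ElementTree, which is what the question used. This is a minimal sketch, assuming a trimmed copy of the question's XML held in a string; the email address and the names XML and NS are placeholders introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML. The email address is a placeholder.
XML = """\
<DocuSignEnvelopeInformation xmlns="http://www.docusign.net/API/3.0">
    <EnvelopeStatus>
        <RecipientStatus>
            <Type>Signer</Type>
            <Email>joe@example.com</Email>
            <UserName>Joe Shmoe</UserName>
        </RecipientStatus>
    </EnvelopeStatus>
</DocuSignEnvelopeInformation>
"""

# The default namespace of the document, written in {uri} form.
NS = "{http://www.docusign.net/API/3.0}"

root = ET.fromstring(XML)

# Unqualified tag name: matches nothing, because every element is namespaced.
no_namespace = root.findall("Email")

# Qualified name, but findall only inspects direct children of root here.
direct_children = root.findall(NS + "Email")

# Qualified name with ".//": searches all descendants of root.
descendants = root.findall(".//" + NS + "Email")

emails = [el.text for el in descendants]
print(no_namespace, direct_children, emails)
```

Only the third call returns the Email element; the first fails on the name, the second on the search depth.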

With lxml, you can also perform a similar search using XPath and namespace prefixes, like this:

>>> root.xpath('//docusign:Email',
... namespaces={'docusign': 'http://www.docusign.net/API/3.0'})
[<Element {http://www.docusign.net/API/3.0}Email at 0x7f972949a6c8>]

This lets you write XML-style prefix:name paths, with the prefix mapped to the namespace URI, instead of the {namespace-uri}name syntax.
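The standard library's ElementTree has no xpath() method, but its findall, iterfind, find and findtext accept a prefix-to-URI mapping as the namespaces argument, which gives the same prefix:name style. The sketch below assumes a trimmed copy of the question's XML with a placeholder email address; the prefix docusign is arbitrary and only has to match the key in the mapping.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML. The email address is a placeholder.
XML = """\
<DocuSignEnvelopeInformation xmlns="http://www.docusign.net/API/3.0">
    <EnvelopeStatus>
        <RecipientStatus>
            <Type>Signer</Type>
            <Email>joe@example.com</Email>
            <UserName>Joe Shmoe</UserName>
        </RecipientStatus>
    </EnvelopeStatus>
</DocuSignEnvelopeInformation>
"""

# Prefix chosen for this script, mapped to the document's namespace URI.
ns = {"docusign": "http://www.docusign.net/API/3.0"}

root = ET.fromstring(XML)

# Same search as the {uri}Email form, written with a prefix instead.
by_prefix = root.findall(".//docusign:Email", ns)

# Collect a few fields per recipient, reusing the same mapping.
recipients = [
    {
        "email": r.findtext("docusign:Email", namespaces=ns),
        "user": r.findtext("docusign:UserName", namespaces=ns),
    }
    for r in root.iterfind(".//docusign:RecipientStatus", ns)
]
print(by_prefix[0].text, recipients)
```

On Python 3.8 and later, ElementTree also accepts a {*}Email wildcard that matches the tag in any namespace, which can be convenient for quick scripts where the namespace does not matter.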


08-22 13:15