python - Parsing XML with multiple namespaces using lxml

I'm pulling XML from a SOAP API that looks like this:

<SOAP-ENV:Envelope xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ae="urn:sbmappservices72" xmlns:c14n="http://www.w3.org/2001/10/xml-exc-c14n#" xmlns:diag="urn:SerenaDiagnostics" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd" xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd" xmlns:xenc="http://www.w3.org/2001/04/xmlenc#" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<SOAP-ENV:Header/>
<SOAP-ENV:Body>
    <ae:GetItemsByQueryResponse>
      <ae:return>
        <ae:item>
          <ae:id xsi:type="ae:ItemIdentifier">
            <ae:displayName/>
            <ae:id>10</ae:id>
            <ae:uuid>a9b91034-8f4d-4043-b9b6-517ba4ed3a33</ae:uuid>
            <ae:tableId>1541</ae:tableId>
            <ae:tableIdItemId>1541:10</ae:tableIdItemId>
            <ae:issueId/>
          </ae:id>

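I can't for the life of me get findall to pull out something like tableId. Most tutorials on parsing with lxml don't cover namespaces, but the one at lxml.de does, and I've been trying to follow it.
According to their tutorial, you should create a dictionary of namespaces, which I've done: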

r = tree.xpath('/e:SOAP-ENV/s:ae',
        namespaces={'e': 'http://schemas.xmlsoap.org/soap/envelope/',
                    's': 'urn:sbmappservices72'})

But this doesn't seem to work; when I check the length of r, it comes back as 0:

print('length: ' + str(len(r)))  # <---- always equals 0

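Since the URI of the second namespace is a "urn:", I also tried using the real URL to the WSDL, but that gave me the same result.
Is there something obvious I'm missing? I just need to be able to extract values such as tableIdItemId.
Any help would be greatly appreciated.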

Best answer

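The XPath doesn't correspond to the XML structure: each step has to be a prefix from your namespaces dictionary followed by the element's local name (for example e:Envelope), not the prefix that appears in the document. Try this instead: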

r = tree.xpath('/e:Envelope/e:Body/s:GetItemsByQueryResponse/s:return/s:item/s:id/s:tableId',
        namespaces={'e': 'http://schemas.xmlsoap.org/soap/envelope/',
                    's': 'urn:sbmappservices72'})
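For completeness, here is a minimal end-to-end sketch of the expression above. The XML is a trimmed, closed-off stand-in for the response in the question (the closing tags and the reduced set of xmlns declarations are assumptions), and the names SOAP_XML, NS, table_ids and item_ids are only illustrative.

```python
from lxml import etree

# Trimmed, well-formed stand-in for the SOAP response in the question.
# Assumption: the real response closes every element in the same order.
SOAP_XML = b"""\
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
                   xmlns:ae="urn:sbmappservices72"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <SOAP-ENV:Header/>
  <SOAP-ENV:Body>
    <ae:GetItemsByQueryResponse>
      <ae:return>
        <ae:item>
          <ae:id xsi:type="ae:ItemIdentifier">
            <ae:displayName/>
            <ae:id>10</ae:id>
            <ae:tableId>1541</ae:tableId>
            <ae:tableIdItemId>1541:10</ae:tableIdItemId>
          </ae:id>
        </ae:item>
      </ae:return>
    </ae:GetItemsByQueryResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
"""

# The prefixes here are local to the XPath expression; only the URIs
# have to match the ones declared in the document.
NS = {'e': 'http://schemas.xmlsoap.org/soap/envelope/',
      's': 'urn:sbmappservices72'}

root = etree.fromstring(SOAP_XML)
tree = root.getroottree()

# Full path: returns a list of matching elements.
r = tree.xpath(
    '/e:Envelope/e:Body/s:GetItemsByQueryResponse/s:return/s:item/s:id/s:tableId',
    namespaces=NS)
table_ids = [el.text for el in r]

# Ending the expression in text() returns the string values directly.
item_ids = tree.xpath('/e:Envelope/e:Body//s:tableIdItemId/text()',
                      namespaces=NS)

print(len(r), table_ids, item_ids)
```

Note that the document's own prefixes (SOAP-ENV, ae) never appear in the expressions; the match is made on the namespace URI, so e and s could be any names you like.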

For a small XML document, you may want to use // instead of / to simplify the expression, for example:

r = tree.xpath('/e:Envelope/e:Body//s:tableId',
        namespaces={'e': 'http://schemas.xmlsoap.org/soap/envelope/',
                    's': 'urn:sbmappservices72'})

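/e:Body//s:tableId will find tableId no matter how deeply it is nested inside Body. Note, however, that // is typically slower than /, especially when applied to a very large XML document.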
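Since the question started out with findall, here is a hedged sketch of the same descendant search written that way. findall takes an ElementPath expression rather than full XPath, so the path starts with .// relative to the element it is called on. The sample XML is again a shortened stand-in, and the try/except import fallback to the standard library is an addition for portability, not something the question used.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib module is an acceptable stand-in for this
    # snippet, because findall/findtext accept the same arguments there.
    import xml.etree.ElementTree as etree

# Shortened stand-in for the SOAP response (closing tags are assumed).
XML = """\
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
                   xmlns:ae="urn:sbmappservices72">
  <SOAP-ENV:Body>
    <ae:GetItemsByQueryResponse>
      <ae:return>
        <ae:item>
          <ae:id>
            <ae:tableId>1541</ae:tableId>
            <ae:tableIdItemId>1541:10</ae:tableIdItemId>
          </ae:id>
        </ae:item>
      </ae:return>
    </ae:GetItemsByQueryResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
"""

NS = {'e': 'http://schemas.xmlsoap.org/soap/envelope/',
      's': 'urn:sbmappservices72'}

root = etree.fromstring(XML)

# Prefix form: the prefix is resolved through the namespaces mapping.
by_prefix = [el.text for el in root.findall('.//s:tableId', namespaces=NS)]

# Clark notation: the namespace URI is written inline in braces,
# so no mapping is needed.
by_clark = [el.text for el in root.findall('.//{urn:sbmappservices72}tableId')]

# findtext returns the text of the first match.
first_item_id = root.findtext('.//s:tableIdItemId', namespaces=NS)

print(by_prefix, by_clark, first_item_id)
```

Both spellings select the same element; the prefix form is usually easier to read when one expression touches several namespaces.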
