Problem description
I have the following toy example of an XML file. I have thousands of files like it, and I am having difficulty parsing them.
Look at the text in the second line. All of my original files contain this text. When I delete i:type="Record" xmlns="http://schemas.datacontract.org/Storage" from the second line (keeping the rest of the line), I am able to get the accelx and accely values using the code given below.
How can I parse this file with the original text left in place?
<?xml version="1.0" encoding="utf-8"?>
<ArrayOfRecord xmlns:i="http://www.w3.org/2001/XMLSchema-instance" i:type="Record" xmlns="http://schemas.datacontract.org/Storage">
  <AvailableCharts>
    <Accelerometer>true</Accelerometer>
    <Velocity>false</Velocity>
  </AvailableCharts>
  <Trics>
    <Trick>
      <EndOffset>PT2M21.835S</EndOffset>
      <Values>
        <TrickValue>
          <Acceleration>26.505801694441629</Acceleration>
          <Rotation>0.023379150593228679</Rotation>
        </TrickValue>
      </Values>
    </Trick>
  </Trics>
  <Values>
    <SensorValue>
      <accelx>-3.593643144</accelx>
      <accely>7.316485176</accely>
    </SensorValue>
    <SensorValue>
      <accelx>0.31103436</accelx>
      <accely>7.70408184</accely>
    </SensorValue>
  </Values>
</ArrayOfRecord>
Code to parse the data:
import lxml.etree as etree

tree = etree.parse(r"C:\testdel.xml")
root = tree.getroot()
# These unqualified paths only match once the default xmlns is removed from the root element
val_of_interest = root.findall('./Values/SensorValue')
for sensor_val in val_of_interest:
    print(sensor_val.find('accelx').text)
    print(sensor_val.find('accely').text)
I asked a related question here: How to extract data from xml file that is deep down the tag
Thanks.
Recommended answer
The confusion is caused by the following default namespace (a namespace declared without a prefix):
xmlns="http://schemas.datacontract.org/Storage"
Note that descendant elements without a prefix implicitly inherit the default namespace from their ancestor. To reference an element in a namespace, you need to map a prefix to the namespace URI and use that prefix in your XPath:
# 'd' is an arbitrary prefix; only the URI has to match the document's xmlns
ns = {'d': 'http://schemas.datacontract.org/Storage'}
val_of_interest = root.findall('./d:Values/d:SensorValue', ns)
for sensor_val in val_of_interest:
    print(sensor_val.find('d:accelx', ns).text)
    print(sensor_val.find('d:accely', ns).text)
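To see the effect in isolation, here is a minimal self-contained sketch. It makes two assumptions that are not in the original post: it uses the standard library's xml.etree.ElementTree instead of lxml (both accept a prefix map in findall and find), and it parses a trimmed copy of the sample from an inline string (XML_DOC, a name chosen for illustration) rather than from C:\testdel.xml. It shows that the parser stores each tag together with its namespace URI, that the unqualified path therefore matches nothing, and that the same elements can be reached either through a prefix map or by writing the URI directly in {uri}tag form.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample file (hypothetical inline string, XML declaration omitted)
XML_DOC = """\
<ArrayOfRecord xmlns:i="http://www.w3.org/2001/XMLSchema-instance" i:type="Record" xmlns="http://schemas.datacontract.org/Storage">
  <Values>
    <SensorValue>
      <accelx>-3.593643144</accelx>
      <accely>7.316485176</accely>
    </SensorValue>
    <SensorValue>
      <accelx>0.31103436</accelx>
      <accely>7.70408184</accely>
    </SensorValue>
  </Values>
</ArrayOfRecord>
"""

NS_URI = "http://schemas.datacontract.org/Storage"
ns = {"d": NS_URI}  # 'd' is an arbitrary prefix chosen for this sketch

root = ET.fromstring(XML_DOC)

# The tag is stored with its namespace URI in braces
root_tag = root.tag

# Unqualified names look for elements in no namespace, so nothing matches
unqualified = root.findall("./Values/SensorValue")

# Option 1: map a prefix to the URI and use the prefix in the path
readings = [
    (float(sv.find("d:accelx", ns).text), float(sv.find("d:accely", ns).text))
    for sv in root.findall("./d:Values/d:SensorValue", ns)
]

# Option 2: write the URI inline in {uri}tag form, no prefix map needed
clark = root.findall(f"./{{{NS_URI}}}Values/{{{NS_URI}}}SensorValue")

print(root_tag)
print(len(unqualified), len(clark))
print(readings)
```

The prefix map is usually easier to read when a path has several steps; the {uri}tag form is convenient for a single lookup.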