Problem description
I have some XML that has multiple elements with the same name, but each is in a different language, for example:
<Title xml:lang="FR" type="main">Les Tudors</Title>
<Title xml:lang="DE" type="main">Die Tudors</Title>
<Title xml:lang="IT" type="main">The Tudors</Title>
Normally, I'd retrieve an element using its attributes as follows:
titlex = info.find('.//xmlns:Title[@someattribute=attributevalue]', namespaces=nsmap)
If I try to do this with [@xml:lang="FR"] (for example), I get this traceback:
File "D:/Python code/RBM CRID, Title, Genre/CRID, Title, Genre, Age rating, Episode Number, Descriptions V1.py", line 29, in <module>
titlex = info.find('.//xmlns:Title[@xml:lang=PL]', namespaces=nsmap)
File "lxml.etree.pyx", line 1457, in lxml.etree._Element.find (src\lxml\lxml.etree.c:51435)
File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 282, in find
it = iterfind(elem, path, namespaces)
File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 272, in iterfind
selector = _build_path_iterator(path, namespaces)
File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 256, in _build_path_iterator
selector.append(ops[token[0]](_next, token))
File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 134, in prepare_predicate
token = next()
File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 80, in xpath_tokenizer
raise SyntaxError("prefix %r not found in prefix map" % prefix)
SyntaxError: prefix 'xml' not found in prefix map
I'm not surprised by this, but I'd like suggestions on how to get around the issue.
Thanks!
As requested, a cut-down but complete set of code (It works as expected if I remove the [bitsinsquarebrackets]):
import codecs

from lxml import etree

file_name = input('Enter the file name, excluding .xml extension: ') + '.xml'  # User inputs file name
print('Parsing ' + file_name)

#----- Sets up the parser and namespaces
parser = etree.XMLParser()
tree = etree.parse(file_name, parser)  # Name of file to test goes here
root = tree.getroot()
nsmap = {'xmlns': 'urn:tva:metadata:2012',
         'mpeg7': 'urn:tva:mpeg7:2008'}

#----- This code writes the output to a file
with codecs.open(file_name + '.log', mode='w', encoding='utf-8') as f:  # Name the output file
    f.write(u'CRID|Title|Genre|Rating|Short Synopsis|Medium Synopsis|Long Synopsis\n')
    for info in root.xpath('//xmlns:ProgramInformation', namespaces=nsmap):
        titlex = info.find('.//xmlns:Title[@xml:lang="PL"]', namespaces=nsmap)  # Retrieve the title (this is the line that fails)
        title = titlex.text if titlex is not None else 'Missing'  # If there isn't a title, print an alternative word
        f.write(u'{}\n'.format(title))  # Write all the retrieved values to the same line with bar separators and a new line
Recommended answer
The xml prefix in xml:lang does not need to be declared in an XML document, but if you want to use xml:lang in XPath lookups, you have to define a prefix mapping in the Python code.
The xml prefix is reserved (as opposed to "normal" namespace prefixes, which are arbitrary) and defined to be bound to http://www.w3.org/XML/1998/namespace. See the Namespaces in XML 1.0 W3C recommendation.
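Because that binding is fixed, the attribute can also be read without any prefix map by spelling out its fully qualified name in Clark notation ({uri}local). The following is a minimal sketch under a few assumptions: it reuses the sample titles from the question, XML_LANG and titles_by_lang are names made up for this illustration, and the fallback import of the standard library's ElementTree is only there so the snippet also runs where lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib ElementTree offers the same calls used below
    import xml.etree.ElementTree as etree

# Fully qualified name of the xml:lang attribute in Clark notation
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """
<root>
  <Title xml:lang="FR" type="main">Les Tudors</Title>
  <Title xml:lang="DE" type="main">Die Tudors</Title>
  <Title xml:lang="IT" type="main">The Tudors</Title>
</root>"""


def titles_by_lang(root):
    """Map each xml:lang value to its Title text (hypothetical helper)."""
    return {title.get(XML_LANG): title.text for title in root.iter("Title")}


doc = etree.fromstring(SAMPLE)
titles = titles_by_lang(doc)
print(titles["DE"])  # prints: Die Tudors
```

Building the dictionary once may be convenient when several languages are needed from the same element, since no path expression has to be parsed for each lookup.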
Example:
from lxml import etree

# Required mapping
nsmap = {"xml": "http://www.w3.org/XML/1998/namespace"}

XML = """
<root>
  <Title xml:lang="FR" type="main">Les Tudors</Title>
  <Title xml:lang="DE" type="main">Die Tudors</Title>
  <Title xml:lang="IT" type="main">The Tudors</Title>
</root>"""

doc = etree.fromstring(XML)
title_FR = doc.find('Title[@xml:lang="FR"]', namespaces=nsmap)
print(title_FR.text)
Output:
Les Tudors
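For the namespaced document in the question, the same idea would presumably amount to adding the xml entry to the existing prefix map. The sketch below is hedged accordingly: the sample document is a hypothetical, heavily simplified stand-in for the real TV-Anytime file, the tva prefix replaces the question's xmlns key purely for readability, and the fallback to the standard library's ElementTree is an assumption made so the snippet runs without lxml.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib ElementTree offers the same calls used below
    import xml.etree.ElementTree as etree

# The document's own namespace plus the reserved xml namespace
nsmap = {
    "tva": "urn:tva:metadata:2012",
    "xml": "http://www.w3.org/XML/1998/namespace",
}

# Hypothetical sample; the real file's structure may differ
SAMPLE = """
<TVAMain xmlns="urn:tva:metadata:2012">
  <ProgramInformation>
    <Title xml:lang="FR" type="main">Les Tudors</Title>
    <Title xml:lang="PL" type="main">Example PL title</Title>
  </ProgramInformation>
  <ProgramInformation>
    <Title xml:lang="FR" type="main">Les Tudors</Title>
  </ProgramInformation>
</TVAMain>"""

doc = etree.fromstring(SAMPLE)
titles = []
for info in doc.findall(".//tva:ProgramInformation", namespaces=nsmap):
    # Both prefixes in the path are resolved through nsmap
    titlex = info.find('.//tva:Title[@xml:lang="PL"]', namespaces=nsmap)
    titles.append(titlex.text if titlex is not None else "Missing")

print(titles)  # prints: ['Example PL title', 'Missing']
```

The second ProgramInformation has no Polish title, so find returns None there and the fallback word from the question's code is used.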
If there is no mapping for the xml prefix, you get the "prefix 'xml' not found in prefix map" error. If the URI mapped to the xml prefix is not http://www.w3.org/XML/1998/namespace, the find method in the code snippet above finds no match and returns None.
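The failure modes described above can be checked side by side. This is a small sketch rather than a definitive test: the one-element document and the example.com URI are made up for illustration, and the fallback to the standard library's ElementTree is an assumption so that the snippet runs without lxml.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib ElementTree offers the same calls used below
    import xml.etree.ElementTree as etree

doc = etree.fromstring('<root><Title xml:lang="FR">Les Tudors</Title></root>')
path = 'Title[@xml:lang="FR"]'

# Case 1: no prefix map at all, so the xml prefix cannot be resolved
try:
    doc.find(path)
    error = None
except SyntaxError as exc:
    error = str(exc)

# Case 2: xml mapped to a made-up URI, so the attribute name never matches
wrong = {"xml": "http://example.com/not-the-xml-namespace"}
missing = doc.find(path, namespaces=wrong)

# Case 3: xml mapped to the reserved URI, so the element is found
right = {"xml": "http://www.w3.org/XML/1998/namespace"}
found = doc.find(path, namespaces=right)

print(error)
print(missing)
print(found.text)
```

Case 2 is the easier one to overlook, because nothing is raised and the only symptom is a None result.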