Problem description
I have this character in an XML file:
<data>
  <products>
    <color>fumè</color>
  </products>
</data>
I try to generate an instance of ElementTree with the following code:
string_data = open('file.xml')
x = ElementTree.fromstring(unicode(string_data.encode('utf-8')))
and I get the following error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 185: ordinal not in range(128)
(Note: the position is not exact; I took this XML sample from a larger file.)
How can I solve this? Thanks.
Solution
You do not need to decode the XML yourself for ElementTree to work. XML carries its own encoding information (defaulting to UTF-8), and ElementTree does the work for you, outputting unicode:
>>> data = '''\
... <data>
... <products>
... <color>fumè</color>
... </products>
... </data>
... '''
>>> x = ElementTree.fromstring(data)
>>> x[0][0].text
u'fum\xe8'
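The session above is from Python 2. As a rough sketch of the same idea, assuming Python 3 and the standard library's xml.etree.ElementTree (the variable names and the Latin-1 variant are illustrative and not part of the original answer), raw bytes can be handed straight to fromstring() and the parser picks the encoding from the document itself:

```python
import xml.etree.ElementTree as ET

# Raw bytes, as they would come from a file opened in binary mode.
# There is no XML declaration here, so the parser assumes UTF-8.
xml_bytes = "<data><products><color>fumè</color></products></data>".encode("utf-8")

root = ET.fromstring(xml_bytes)
color = root[0][0].text  # element text comes back as a Unicode str

# The same document stored as Latin-1 parses to the same text,
# as long as the XML declaration names the encoding.
latin1_bytes = (
    '<?xml version="1.0" encoding="iso-8859-1"?>'
    "<data><products><color>fumè</color></products></data>"
).encode("iso-8859-1")
latin1_color = ET.fromstring(latin1_bytes)[0][0].text

print(repr(color), repr(latin1_color))
```

In both cases no manual encode() or decode() call is needed; the bytes go in unchanged and text comes out.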
If your data is contained in a file(like) object, just pass the filename or file object directly to the ElementTree.parse() function:
x = ElementTree.parse('file.xml')
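A minimal sketch of both variants, again assuming Python 3. The temporary file is only a hypothetical stand-in for file.xml so that the example is self-contained. Note that parse() returns an ElementTree object, so getroot() is needed to reach the root element:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Write a small UTF-8 sample to a temporary file (stand-in for file.xml).
xml_bytes = "<data><products><color>fumè</color></products></data>".encode("utf-8")
with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as tmp:
    tmp.write(xml_bytes)
    path = tmp.name

try:
    # Variant 1: pass the filename; parse() opens and decodes the file itself.
    tree = ET.parse(path)
    from_name = tree.getroot().find("products/color").text

    # Variant 2: pass a file object opened in binary mode, so the parser
    # sees the raw bytes and applies the document's own encoding.
    with open(path, "rb") as fh:
        from_file = ET.parse(fh).getroot().find("products/color").text
finally:
    os.remove(path)

print(repr(from_name), repr(from_file))
```

Opening the file in binary mode leaves the decoding to the parser, which avoids decoding the data twice or with the wrong codec.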