Problem description
In Python 2.7, how do you convert a latin1 string to UTF-8?
For example, I'm trying to convert é to UTF-8:
>>> "é"
'\xe9'
>>> u"é"
u'\xe9'
>>> u"é".encode('utf-8')
'\xc3\xa9'
>>> print u"é".encode('utf-8')
é
The letter is é, which is LATIN SMALL LETTER E WITH ACUTE (U+00E9).
The UTF-8 byte encoding is: c3a9
The latin1 byte encoding is: e9
How do I get the UTF-8 encoded version of a latin1 string? Could someone give an example of how to convert the é?
Recommended answer
To decode a byte sequence from Latin-1 to Unicode, use the .decode() method:
>>> '\xe9'.decode('latin1')
u'\xe9'
In the repr of a unicode string, Python shows non-ASCII code points up to \u00ff as \xNN escapes, so u'\xe9' and u'\u00e9' are the same string:
>>> '\xe9'.decode('latin1') == u'\u00e9'
True
The decoded Latin-1 character can then be encoded to UTF-8:
>>> '\xe9'.decode('latin1').encode('utf8')
'\xc3\xa9'
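The decode-then-encode step can also be wrapped in a small function. This is only a sketch: the name latin1_to_utf8 is made up for illustration and is not part of the original answer, and it assumes the input really is Latin-1 encoded bytes (a str in Python 2.7). The b'' prefix is optional in 2.7 and is used here only so that the same snippet also runs on Python 3.

```python
# Hypothetical helper, named here for illustration only.
def latin1_to_utf8(data):
    """Re-encode Latin-1 bytes as UTF-8 bytes.

    Assumes `data` holds Latin-1 encoded bytes; any other encoding
    would be silently misinterpreted, since every byte is valid Latin-1.
    """
    text = data.decode('latin1')   # bytes -> unicode string
    return text.encode('utf-8')    # unicode string -> UTF-8 bytes


latin1_bytes = b'\xe9'             # e-acute (U+00E9) encoded as Latin-1
utf8_bytes = latin1_to_utf8(latin1_bytes)
print(repr(utf8_bytes))
```

Note that decoding as Latin-1 never raises an error, because every byte value maps to a code point, so wrongly labelled input will not be detected at this step.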