python - How to convert a unicode string to a real string in Python

This question already has answers here: Chinese and Japanese character support in python (3 answers). Closed 4 years ago.


I used urllib2 in Python to fetch some information, but what comes back is a string of unicode escapes.

I have tried the following:

a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print unicode(a).encode("gb2312")

a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print a.encode("utf-8").decode("utf-8")

a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print u""+a

a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print str(a).decode("utf-8")

a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print str(a).encode("utf-8")

a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print a.decode("utf-8").encode("gb2312")

But every attempt gives the same result:

\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728

I want to get the following Chinese text:

方法，删除存储在

Best answer

You need to convert the string to a unicode string.

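First, the backslashes in a are escaped automatically. In a plain (non-unicode) string literal, \u is not an escape sequence, so each backslash is kept as a literal character: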

a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"

print a # Prints \u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728

a       # Prints '\\u65b9\\u6cd5\\uff0c\\u5220\\u9664\\u5b58\\u50a8\\u5728'

So encoding or decoding this escaped string makes no difference: it contains only ASCII characters.
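The point above can be checked directly. This is a minimal sketch that assumes Python 3 rather than the Python 2 used in the question; the raw string `raw` is a shortened stand-in for the fetched text, not something from the original post.

```python
# Assumes Python 3; a raw string stands in for the text fetched from the server.
raw = r"\u65b9\u6cd5"

print(len(raw))        # 12 characters, not 2
print(raw[0] == "\\")  # True: the first character is a literal backslash

# A UTF-8 round trip only sees ASCII characters, so nothing changes.
round_trip = raw.encode("utf-8").decode("utf-8")
print(round_trip == raw)  # True
```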

You can either use a unicode literal or convert the string to a unicode string.

To use a unicode literal, just add a u in front of the string:

a = u"\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"

To convert an existing string to a unicode string, call unicode with unicode_escape as the encoding parameter:

print unicode(a, encoding='unicode_escape') # Prints 方法，删除存储在
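On Python 3 the unicode builtin no longer exists, so the call above does not carry over as written. A rough equivalent, offered as a sketch under that assumption, is to go through bytes and the same unicode_escape codec; the variable names here are illustrative only.

```python
import codecs

# Assumes Python 3; `escaped` is a hypothetical stand-in for the fetched text.
escaped = r"\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"

# unicode_escape decodes bytes into text, so encode to ASCII bytes first.
decoded = codecs.decode(escaped.encode("ascii"), "unicode_escape")
print(decoded)
```

This only works cleanly when the input is pure ASCII, as it is here; text that already contains non-ASCII characters would make the encode("ascii") step fail.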

I bet you are getting the string from a JSON response, so the second method is likely what you need.
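If the response really is JSON, another option is to let a JSON parser handle the \uXXXX escapes instead of decoding them by hand. The sketch below assumes Python 3 and a made-up response body; the key name "text" is hypothetical, since the question does not show the actual payload.

```python
import json

# Hypothetical response body; the key name "text" is made up for illustration.
body = r'{"text": "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"}'

# json.loads interprets the \uXXXX escapes as part of parsing the string value.
data = json.loads(body)
print(data["text"])
```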

By the way, the unicode_escape encoding is a Python-specific encoding that is used to produce a string that is suitable as a Unicode literal in Python source code.