Question
I know this works:
a = u"\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print(a) # 方法，删除存储在
But suppose I have a string from a JSON file that does not start with "u":
a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
I know how to handle it in Python 2:
print unicode(a, encoding='unicode_escape') # Prints 方法，删除存储在
But how do I do it in Python 3?
Similarly, if it's a byte string loaded from a file, how do I convert it?
print("好的".encode("utf-8")) # b'\xe5\xa5\xbd\xe7\x9a\x84'
# how to convert this?
b = '\xe5\xa5\xbd\xe7\x9a\x84' # 好的
Recommended answer
If I understand correctly, the file contains the literal text
\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728
(so it's plain ASCII, but with backslashes and hex digits that describe the Unicode ordinals the same way you would write them in a Python str literal). If so, there are two ways to handle this:
- Read the file in binary mode, then call mystr = mybytes.decode('unicode-escape') to convert from bytes to str, interpreting the escapes.
- Read the file in text mode, and use the codecs module for the "text -> text" conversion, e.g. decodedstr = codecs.decode(encodedstr, 'unicode-escape'). In Python 3, bytes-to-bytes and text-to-text codecs are supported only by the codecs module functions: bytes.decode is purely for bytes to text and str.encode is purely for text to bytes. Using str.encode and unicode.decode in Py2 was usually a mistake, and removing those error-prone methods makes it easier to see which direction a conversion is supposed to go.
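As a rough illustration of the two approaches above, here is a minimal Python 3 sketch. The file name sample.txt and the variable names are made up for the example, and it assumes the file really is pure ASCII containing only the escaped text. The last few lines also cover the byte-string case from the question, assuming those bytes are UTF-8.

```python
import codecs
import os
import tempfile

# Hypothetical sample file: the escaped text stored as plain ASCII.
# The raw string keeps the backslashes literal instead of interpreting them.
escaped = r"\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="ascii") as f:
    f.write(escaped)

# Approach 1: read raw bytes, then decode bytes -> str, interpreting escapes.
with open(path, "rb") as f:
    from_bytes = f.read().decode("unicode-escape")

# Approach 2: read text, then do the text -> text conversion via codecs.
with open(path, "r", encoding="ascii") as f:
    from_text = codecs.decode(f.read(), "unicode-escape")

expected = "方法，删除存储在"
print(from_bytes == expected)  # True
print(from_text == expected)   # True

# Byte-string case from the question, assuming the bytes are UTF-8.
utf8_bytes = b"\xe5\xa5\xbd\xe7\x9a\x84"
decoded = utf8_bytes.decode("utf-8")
print(decoded == "好的")  # True
```

One caveat: the unicode-escape codec treats any non-escaped bytes as Latin-1, so this is only safe when the input is plain ASCII. If the file is well-formed JSON, it is probably simpler to parse it with the json module, which decodes \u escapes inside string values on its own.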