Problem description
I have a string of the form:
s = '\\xe2\\x99\\xac'
I would like to convert this to the character ♬ by evaluating the escape sequence. However, everything I've tried either results in an error or prints out garbage. How can I force Python to convert the escape sequence into a literal unicode character?
What I've read elsewhere suggests that the following line of code should do what I want, but it results in a UnicodeEncodeError.
print(bytes(s, 'utf-8').decode('unicode-escape'))
I also tried the following, which has the same result:
import codecs
print(codecs.getdecoder('unicode_escape')(s)[0])
Both of these approaches produce the string 'â\x99¬', which print is subsequently unable to handle.
In case it makes any difference, the string is read from a UTF-8 encoded file and, after processing, will ultimately be written to a different UTF-8 encoded file.
Recommended answer
...decode('unicode-escape') will give you the string '\xe2\x99\xac'.
>>> s = '\\xe2\\x99\\xac'
>>> s.encode().decode('unicode-escape')
'â\x99¬'
>>> _ == '\xe2\x99\xac'
True
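To see why this intermediate string looks like garbage, it may help to inspect its code points. The sketch below (the variable names are illustrative, not from the original answer) shows that each escape became one character whose code point equals one byte of the UTF-8 encoding of ♬, so the bytes are right but they have been interpreted as three separate characters.

```python
s = '\\xe2\\x99\\xac'

# Interpreting the escapes yields three characters, one per \xNN escape.
intermediate = s.encode().decode('unicode-escape')

# Each character's code point is the numeric value of its escape.
code_points = [hex(ord(ch)) for ch in intermediate]
print(code_points)  # ['0xe2', '0x99', '0xac']

# Those values are exactly the three bytes of the UTF-8 encoding of the target character.
utf8_bytes = '♬'.encode('utf-8')
print(utf8_bytes == bytes(ord(ch) for ch in intermediate))  # True
```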
You need to decode it as UTF-8. But to do that, first encode it with latin1 (or iso-8859-1) to preserve the bytes: latin1 maps each code point from 0 to 255 to the byte with the same value.
>>> s = '\\xe2\\x99\\xac'
>>> s.encode().decode('unicode-escape').encode('latin1').decode('utf-8')
'♬'
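The same chain can be wrapped in a small helper, shown here as a minimal sketch. The function name decode_utf8_escapes is hypothetical and not part of any library. It assumes the input contains only escapes that spell out valid UTF-8 bytes: an escape such as \u266c that produces a code point above 255 would make the latin1 step raise UnicodeEncodeError, and byte values that are not valid UTF-8 would make the final step raise UnicodeDecodeError.

```python
def decode_utf8_escapes(text: str) -> str:
    # Hypothetical helper: turns literal backslash-x escapes that spell
    # UTF-8 bytes into the characters those bytes encode.

    # Step 1: interpret the escapes; each one becomes a single code point (0-255).
    unescaped = text.encode().decode('unicode-escape')

    # Step 2: latin1 maps code points 0-255 to identical byte values,
    # which recovers the raw UTF-8 bytes.
    raw = unescaped.encode('latin1')

    # Step 3: decode those bytes as the UTF-8 they were meant to be.
    return raw.decode('utf-8')


result = decode_utf8_escapes('\\xe2\\x99\\xac')
print(result)  # ♬
```

Because the result is an ordinary str, it can be written to the output file opened with encoding='utf-8' without further conversion.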