Evaluating UTF-8 literal escape sequences in a string in Python 3

Question

I have a string of the form:

s = '\\xe2\\x99\\xac'

I would like to convert this to the character ♬ by evaluating the escape sequences. However, everything I've tried either results in an error or prints garbage. How can I force Python to convert the escape sequences into the literal Unicode character?

What I've read elsewhere suggests that the following line of code should do what I want, but it results in a UnicodeEncodeError.

print(bytes(s, 'utf-8').decode('unicode-escape'))

I also tried the following, which has the same result:

import codecs
print(codecs.getdecoder('unicode_escape')(s)[0])

Both of these approaches produce the string 'â\x99¬', which print is subsequently unable to handle.

In case it makes any difference, the string is being read in from a UTF-8 encoded file and will ultimately be written to a different UTF-8 encoded file after processing.

Answer

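...decode('unicode-escape') will give you the string '\xe2\x99\xac': three characters whose code points equal the UTF-8 byte values of ♬, not yet the character itself.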

>>> s = '\\xe2\\x99\\xac'
>>> s.encode().decode('unicode-escape')
'â\x99¬'
>>> _ == '\xe2\x99\xac'
True
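
To see why this intermediate result is not yet ♬, it can help to inspect its code points. The following is a minimal sketch; the variable names are illustrative and not part of the original answer. It checks that the three characters carry exactly the numeric values of the UTF-8 bytes of U+266C.

```python
s = '\\xe2\\x99\\xac'  # 12 characters: literal backslashes, not bytes

# unicode-escape turns each \xHH escape into the code point U+00HH.
intermediate = s.encode().decode('unicode-escape')
code_points = [hex(ord(ch)) for ch in intermediate]
print(code_points)  # ['0xe2', '0x99', '0xac']

# '\u266c' is the character the question wants; compare its UTF-8 bytes
# with the code points of the intermediate string.
utf8_values = list('\u266c'.encode('utf-8'))
same_values = [ord(ch) for ch in intermediate] == utf8_values
print(same_values)  # True
```

So the escapes were evaluated, but each byte value ended up as a separate character instead of being interpreted together as one UTF-8 sequence.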

That string still has to be decoded as UTF-8. To do that, first encode it with latin1 (or iso-8859-1), which preserves the byte values, and then decode the resulting bytes as UTF-8.

>>> s = '\\xe2\\x99\\xac'
>>> s.encode().decode('unicode-escape').encode('latin1').decode('utf-8')
'♬'
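
The chain above can be wrapped in a small function. This is only a sketch: the name unescape_utf8 and the temporary output file are hypothetical and not from the original answer. It also writes the result to a file opened with an explicit encoding, on the assumption that the UnicodeEncodeError in the question comes from the console's encoding rather than from the conversion itself.

```python
import os
import tempfile


def unescape_utf8(text):
    """Hypothetical helper: evaluate literal hex escapes that spell UTF-8 bytes."""
    return (
        text.encode('utf-8')       # str -> bytes, escapes still literal
        .decode('unicode-escape')  # each \xHH -> code point U+00HH
        .encode('latin1')          # code points 0-255 -> identical byte values
        .decode('utf-8')           # bytes -> the intended characters
    )


result = unescape_utf8('\\xe2\\x99\\xac')
assert result == '\u266c'  # U+266C, the character ♬

# Write with an explicit encoding so the console's encoding is not involved.
path = os.path.join(tempfile.mkdtemp(), 'out.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write(result)

with open(path, 'rb') as f:
    written = f.read()
print(written)  # b'\xe2\x99\xac'
```

One limitation of this approach: if the input also contains a \uXXXX escape for a code point above U+00FF, the unicode-escape step produces that character directly and the latin1 step then raises UnicodeEncodeError, so the helper is only suitable for input that uses \xHH byte escapes.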
