Problem description
I have a text which contains characters such as "\xaf", "\xbe", which, as I understand it from this question, are ASCII encoded characters.
I want to convert them in Python to their UTF-8 equivalents. The usual string.encode("utf-8") throws UnicodeDecodeError. Is there some better way, e.g., with the codecs standard library?
A sample of 200 characters is here.
Recommended answer
Your file is already a UTF-8 encoded file.
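# saved encoding-sample to /tmp/encoding-sample
import codecs
import sys
import unicodedata as ud

# Decode the file as UTF-8 while reading it.
fp = codecs.open("/tmp/encoding-sample", "r", "utf8")
data = fp.read()
fp.close()

# List each distinct character with its code point and Unicode name.
chars = sorted(set(data))
for char in chars:
    try:
        charname = ud.name(char)
    except ValueError:
        # Some characters, such as control characters, have no name
        # in the Unicode database.
        charname = "<unknown>"
    sys.stdout.write("char U%04x %s\n" % (ord(char), charname))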
And manually filling in the unknown names:
char U000a LINE FEED
char U001e INFORMATION SEPARATOR TWO
char U001f INFORMATION SEPARATOR ONE
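As a rough illustration of why the encode call fails and why decoding is the step that matters, here is a minimal Python 3 sketch. The byte string raw is a hypothetical stand-in for the sample file, not its actual contents, and describe is a helper name made up for this example. In Python 2, calling .encode("utf-8") on a byte string first decodes it implicitly as ASCII, which is where the UnicodeDecodeError comes from; the sketch reproduces that decode step explicitly.

```python
import unicodedata

# Hypothetical sample (an assumption, not the real file): UTF-8 bytes for
# U+00AF and U+00BE, followed by two control characters and a line feed.
raw = b"\xc2\xaf\xc2\xbe\x1e\x1f\n"


def describe(text):
    """Return (code point, name) pairs for the distinct characters in text."""
    rows = []
    for char in sorted(set(text)):
        # Control characters have no Unicode name, so supply a fallback.
        name = unicodedata.name(char, "<unknown>")
        rows.append(("U+%04X" % ord(char), name))
    return rows


# Decoding as ASCII fails on any byte >= 0x80. Python 2 performed this
# decode implicitly when .encode() was called on a byte string.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding as UTF-8 succeeds because the bytes are already valid UTF-8.
text = raw.decode("utf-8")
rows = describe(text)
for code, name in rows:
    print(code, name)

# Encoding the decoded text again gives back the original bytes.
roundtrip = text.encode("utf-8")
```

If the bytes already decode cleanly as UTF-8, as in this sketch, no conversion is needed. If they did not, the file would have to be decoded with its real encoding before being encoded as UTF-8.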