Problem description
I'm working on an application that uses UTF-8 encoding. For debugging purposes I need to print the text. If I call print() directly on the variable containing my Unicode string, e.g. print(pred_str), I get this error:
UnicodeEncodeError: 'charmap' codec can't encode character '\ufeff' in position 0: character maps to <undefined>
So I tried print(pred_str.encode('utf-8')), and my output looks like this:
b'\xef\xbb\xbfpudgala-dharma-nair\xc4\x81tmyayo\xe1\xb8\xa5 apratipanna-vipratipann\xc4\x81n\xc4\x81m'
b'avipar\xc4\xabta-pudgala-dharma-nair\xc4\x81tmya-pratip\xc4\x81dana-artham'
b'tri\xe1\xb9\x83\xc5\x9bik\xc4\x81-vij\xc3\xb1apti-prakara\xe1\xb9\x87a-\xc4\x81rambha\xe1\xb8\xa5'
b'pudgala-dharma-air\xc4\x81tmya-pratip\xc4\x81danam punar kle\xc5\x9ba-j\xc3\xb1eya-\xc4\x81vara\xe1\xb9\x87a-prah\xc4\x81\xe1\xb9\x87a-artham'
But I want my output to look like this:
pudgala-dharma-nairātmyayoḥ apratipanna-vipratipannānām
aviparīta-pudgala-dharma-nairātmya-pratipādana-artham
triṃśikā-vijñapti-prakaraṇa-ārambhaḥ
pudgala-dharma-nairātmya-pratipādanam punar kleśa-jñeya-āvaraṇa-prahāṇa-artham
If I save my string to a file using:
import codecs

with codecs.open('out.txt', 'w', 'UTF-8') as f:
    f.write(pred_str)
it saves the string as expected.
Recommended answer
Your data is encoded with the "UTF-8-SIG" codec, which is sometimes used in Microsoft environments.
This variant of UTF-8 prefixes the encoded text with a byte order mark (BOM), b'\xef\xbb\xbf', to make it easier for applications to distinguish UTF-8 encoded text from other encodings. When such bytes are decoded as plain UTF-8, the BOM survives as the character '\ufeff' at position 0, which is the character named in the error message.
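To make the relationship between the two codecs concrete, here is a minimal sketch using only the standard library. The sample word and the variable names are illustrative and do not come from the question.

```python
import codecs

# Illustrative sample text (an assumption, not the asker's data).
text = "nair\u0101tmya"

plain = text.encode("utf-8")         # no BOM
with_bom = text.encode("utf-8-sig")  # BOM + the same UTF-8 bytes

# The only difference between the two encodings is the three-byte prefix.
assert codecs.BOM_UTF8 == b"\xef\xbb\xbf"
assert with_bom == codecs.BOM_UTF8 + plain

# Decoding BOM-prefixed bytes as plain UTF-8 keeps the BOM as U+FEFF,
# while 'utf-8-sig' strips it.
decoded_plain = with_bom.decode("utf-8")
decoded_sig = with_bom.decode("utf-8-sig")

print(repr(decoded_plain[0]))  # '\ufeff'
print(decoded_sig == text)     # True
```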
You can decode such bytestrings like this:
>>> bs = b'\xef\xbb\xbfpudgala-dharma-nair\xc4\x81tmyayo\xe1\xb8\xa5 apratipanna-vipratipann\xc4\x81n\xc4\x81m'
>>> text = bs.decode('utf-8-sig')
>>> print(text)
pudgala-dharma-nairātmyayoḥ apratipanna-vipratipannānām
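In the question, pred_str is already a str rather than bytes, so the BOM is present as a leading '\ufeff' character. One possible approach in that situation, sketched below, is to drop that single character directly. The helper name strip_bom and the sample value are hypothetical.

```python
def strip_bom(s: str) -> str:
    """Return s without one leading U+FEFF character, if present."""
    return s[1:] if s.startswith("\ufeff") else s


# Stand-in for the asker's pred_str (an assumption for illustration).
pred_str = "\ufeffpudgala-dharma-nair\u0101tmya"
cleaned = strip_bom(pred_str)

# ascii() is used so the demo prints on any console.
print(ascii(cleaned))
```

Strings that do not start with U+FEFF pass through unchanged, so the helper is safe to apply unconditionally.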
To read such data from a file:
with open('myfile.txt', 'r', encoding='utf-8-sig') as f:
    text = f.read()
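The effect of the encoding argument can be checked with a small round trip. This is a sketch that writes a temporary file with a BOM and reads it back both ways; the file name sample.txt and the sample word are assumptions for illustration.

```python
import os
import tempfile

# Illustrative sample text (an assumption, not the asker's data).
text = "tri\u1e43\u015bik\u0101"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical file name

    # Writing with 'utf-8-sig' puts the BOM at the start of the file.
    with open(path, "w", encoding="utf-8-sig") as f:
        f.write(text)

    with open(path, "rb") as f:
        raw = f.read()

    # Reading as plain UTF-8 leaves U+FEFF at position 0.
    with open(path, "r", encoding="utf-8") as f:
        read_plain = f.read()

    # Reading as 'utf-8-sig' strips the BOM.
    with open(path, "r", encoding="utf-8-sig") as f:
        read_sig = f.read()

print(raw.startswith(b"\xef\xbb\xbf"))  # True
print(repr(read_plain[0]))              # '\ufeff'
print(read_sig == text)                 # True
```

The 'utf-8-sig' codec also decodes files that have no BOM, so it is a reasonable choice when the presence of a BOM is uncertain.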
Note that even after decoding from UTF-8-SIG, you may still be unable to print your data, because your console's default code page may not be able to encode other non-ASCII characters in the data. In that case you will need to adjust your console settings to support UTF-8.
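If changing the console settings is not an option, a debugging-only workaround is to escape whatever the output stream cannot encode instead of letting print() raise. The sketch below assumes a cp1252 stream to reproduce the failure; print_safely is a hypothetical helper, not a standard function.

```python
import io
import sys


def print_safely(text, stream=None):
    """Print text, escaping characters the stream's encoding cannot represent."""
    stream = sys.stdout if stream is None else stream
    encoding = getattr(stream, "encoding", None) or "utf-8"
    # Unencodable characters become backslash escapes such as \u0101.
    safe = text.encode(encoding, errors="backslashreplace").decode(encoding)
    print(safe, file=stream)


# Reproduce the original failure: cp1252 is a charmap codec without U+FEFF.
try:
    "\ufeff".encode("cp1252")
    failed = False
except UnicodeEncodeError:
    failed = True

# Simulate a cp1252 console with an in-memory stream (an assumption).
buffer = io.BytesIO()
fake_console = io.TextIOWrapper(buffer, encoding="cp1252")
print_safely("\ufeffnair\u0101tmya", fake_console)
fake_console.flush()
output = buffer.getvalue().decode("cp1252").strip()
print(output)
```

On Python 3.7 and later, calling sys.stdout.reconfigure(encoding="utf-8") or setting the PYTHONIOENCODING environment variable to utf-8 are alternatives, provided the terminal itself can display UTF-8.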