How to print UTF-8 encoded text to the console in Python < 3?

Problem description

I'm running a recent Linux system where all my locales are UTF-8:

LANG=de_DE.UTF-8
LANGUAGE=
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
...
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=

Now I want to write UTF-8 encoded content to the console.

Right now Python uses UTF-8 for the FS encoding but sticks to ASCII for the default encoding :-(

>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> sys.getfilesystemencoding()
'UTF-8'

I thought the best (clean) way to do this was setting the PYTHONIOENCODING environment variable. But it seems that Python ignores it. At least on my system I keep getting ascii as default encoding, even after setting the envvar.

# tried this in ~/.bashrc and ~/.profile (also sourced them)
# and on the commandline before running python
export PYTHONIOENCODING=UTF-8

If I do the following at the start of a script, it works though:

>>> import sys
>>> reload(sys)  # to enable `setdefaultencoding` again
<module 'sys' (built-in)>
>>> sys.setdefaultencoding("UTF-8")
>>> sys.getdefaultencoding()
'UTF-8'

But that approach seems unclean. So, what's a good way to accomplish this?

Workaround

Instead of changing the default encoding - which is not a good idea (see mesilliac's answer) - I just wrap sys.stdout with a StreamWriter like this:

import codecs
import locale
import sys
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout)

See this gist for a small utility function that handles it.

Solution
print u"some unicode text \N{EURO SIGN}"
print b"some utf-8 encoded bytestring \xe2\x82\xac".decode('utf-8')

That is, if you have a Unicode string, print it directly. If you have a bytestring, convert it to Unicode first.

Your locale settings (LANG, LC_CTYPE) indicate a utf-8 locale, and therefore (in theory) you could print a utf-8 bytestring directly and it should be displayed correctly in your terminal (if the terminal settings are consistent with the locale settings, and they should be). But you should avoid it: do not hardcode the character encoding of your environment inside your script; print Unicode directly instead.

There are many wrong assumptions in your question.

You do not need to set PYTHONIOENCODING with your locale settings to print Unicode to the terminal. A utf-8 locale supports all Unicode characters, i.e., it works as is.

You do not need the workaround sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout). It may break if some code (that you do not control) does need to print bytes, and/or it may break while printing Unicode to the Windows console (wrong codepage, can't print undecodable characters). Correct locale settings and/or the PYTHONIOENCODING envvar are enough. Also, if you need to replace sys.stdout, then use io.TextIOWrapper() instead of the codecs module, like the win-unicode-console package does.
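If sys.stdout really does have to be replaced, the io.TextIOWrapper() route mentioned above might look like the following minimal sketch. The helper name wrap_binary_stream is hypothetical, and an in-memory io.BytesIO stands in for the binary side of stdout so that the encoded bytes can be inspected.

```python
import io


def wrap_binary_stream(binary_stream, encoding="utf-8"):
    """Return a text layer that encodes Unicode written to it (hypothetical helper)."""
    return io.TextIOWrapper(binary_stream, encoding=encoding, errors="strict")


# Stand-in for the binary side of stdout, so the resulting bytes can be checked.
raw = io.BytesIO()
text_stream = wrap_binary_stream(raw, "utf-8")
text_stream.write(u"price: \N{EURO SIGN}")
text_stream.flush()  # push buffered text down to the binary layer
encoded = raw.getvalue()
```

On Python 3 the binary layer would typically be sys.stdout.buffer. On Python 2 one option, offered here as an assumption rather than something the answer spells out, is io.open(sys.stdout.fileno(), 'wb', closefd=False).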

sys.getdefaultencoding() is unrelated to your locale settings and to PYTHONIOENCODING. Your assumption that setting PYTHONIOENCODING should change sys.getdefaultencoding() is incorrect. You should check sys.stdout.encoding instead.
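To see the three values side by side, a small diagnostic along these lines may help. The helper name describe_encodings is hypothetical; it only reads values and changes nothing.

```python
import sys


def describe_encodings(stream=None):
    """Collect the encodings discussed above (hypothetical helper)."""
    stream = sys.stdout if stream is None else stream
    return {
        "default": sys.getdefaultencoding(),
        "filesystem": sys.getfilesystemencoding(),
        # Can be None on Python 2 when output is piped, and a replaced
        # stream object may not have the attribute at all.
        "stdout": getattr(stream, "encoding", None),
    }


report = describe_encodings()
for name in sorted(report):
    print("%s: %s" % (name, report[name]))
```

Only the "stdout" entry is relevant when printing Unicode to the terminal.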

sys.getdefaultencoding() is not used when you print to the console. It may be used as a fallback on Python 2 if stdout is redirected to a file/pipe, unless PYTHONIOENCODING is set:

$ python2 -c'import sys; print(sys.stdout.encoding)'
UTF-8
$ python2 -c'import sys; print(sys.stdout.encoding)' | cat
None
$ PYTHONIOENCODING=utf8 python2 -c'import sys; print(sys.stdout.encoding)' | cat
utf8
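The same check can be scripted. The sketch below starts a child interpreter with its stdout connected to a pipe and reads back what it reports for sys.stdout.encoding. The helper name child_stdout_encoding is hypothetical, and sys.executable is used rather than python2, so the value reported when the variable is unset depends on the interpreter version and locale.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding=None):
    """Run a child interpreter with piped stdout and return what it reports."""
    env = dict(os.environ)
    env.pop("PYTHONIOENCODING", None)  # start from a clean slate
    if io_encoding is not None:
        env["PYTHONIOENCODING"] = io_encoding
    output = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
    )
    return output.decode("ascii").strip()


forced = child_stdout_encoding("latin-1")
unset = child_stdout_encoding()

# The name may be reported in a normalized spelling, so compare codecs, not strings.
forced_matches = codecs.lookup(forced).name == codecs.lookup("latin-1").name
```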

Do not call sys.setdefaultencoding("UTF-8"); it may corrupt your data silently and/or break 3rd-party modules that do not expect it. Remember that sys.getdefaultencoding() is used to convert bytestrings (str) to/from unicode in Python 2 implicitly, e.g., "a" + u"b". See also the quote in @mesilliac's answer.
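Rather than relying on that implicit conversion, byte strings can be decoded explicitly before they are combined with Unicode text. The following is a minimal sketch; the helper name join_text and its utf-8 default are assumptions chosen for illustration. (Python 3 has no implicit conversion and raises TypeError when bytes and str are mixed.)

```python
def join_text(prefix, suffix, encoding="utf-8"):
    """Concatenate two values, decoding byte strings explicitly (hypothetical helper)."""
    parts = []
    for part in (prefix, suffix):
        if isinstance(part, bytes):  # bytes is an alias of str on Python 2
            part = part.decode(encoding)
        parts.append(part)
    return u"".join(parts)


# A utf-8 encoded euro sign joined with Unicode text.
joined = join_text(b"\xe2\x82\xac", u" b")
```

Because the encoding is named at the point of decoding, the result does not depend on sys.getdefaultencoding().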


08-04 02:30