Problem description
I'm running a recent Linux system where all my locales are UTF-8:
LANG=de_DE.UTF-8
LANGUAGE=
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
...
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=
Now I want to write UTF-8 encoded content to the console.
Right now Python uses UTF-8 for the FS encoding but sticks to ASCII for the default encoding :-(
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> sys.getfilesystemencoding()
'UTF-8'
I thought the best (clean) way to do this was setting the PYTHONIOENCODING environment variable. But it seems that Python ignores it. At least on my system I keep getting ascii as the default encoding, even after setting the envvar.
# tried this in ~/.bashrc and ~/.profile (also sourced them)
# and on the commandline before running python
export PYTHONIOENCODING=UTF-8
If I do the following at the start of a script, it works though:
>>> import sys
>>> reload(sys)  # to enable `setdefaultencoding` again
<module 'sys' (built-in)>
>>> sys.setdefaultencoding("UTF-8")
>>> sys.getdefaultencoding()
'UTF-8'
But that approach seems unclean. So, what's a good way to accomplish this?
Workaround
Instead of changing the default encoding - which is not a good idea (see mesilliac's answer) - I just wrap sys.stdout with a StreamWriter like this:
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout)
See this gist for a small utility function that handles it.
Solution
print u"some unicode text \N{EURO SIGN}"
print b"some utf-8 encoded bytestring \xe2\x82\xac".decode('utf-8')
i.e., if you have a Unicode string then print it directly. If you have a bytestring then convert it to Unicode first.
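The rule above can be sketched as a small helper that runs on both Python 2 and Python 3. The name to_text and its utf-8 default are assumptions made for illustration; the encoding argument describes the incoming bytes, not the terminal.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals


def to_text(value, encoding="utf-8"):
    """Return value as Unicode text, decoding it first if it is a bytestring.

    to_text is a hypothetical helper; encoding is an assumption about the
    data being decoded, not about the console.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


if __name__ == "__main__":
    # Both calls hand Unicode text to print(); the stream does the encoding.
    print(to_text("some unicode text \N{EURO SIGN}"))
    print(to_text(b"some utf-8 encoded bytestring \xe2\x82\xac"))
```

Decoding once at the point where bytes enter the program keeps the rest of the script free of any knowledge about the environment's encoding.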
Your locale settings (LANG, LC_CTYPE) indicate a utf-8 locale and therefore (in theory) you could print a utf-8 bytestring directly and it should be displayed correctly in your terminal (if terminal settings are consistent with the locale settings, and they should be), but you should avoid it: do not hardcode the character encoding of your environment inside your script; print Unicode directly instead.
There are many wrong assumptions in your question.
You do not need to set PYTHONIOENCODING with your locale settings to print Unicode to the terminal. A utf-8 locale supports all Unicode characters, i.e., it works as is.
You do not need the workaround sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout). It may break if some code (that you do not control) does need to print bytes and/or it may break while printing Unicode to the Windows console (wrong codepage, can't print undecodable characters). Correct locale settings and/or the PYTHONIOENCODING envvar are enough. Also, if you need to replace sys.stdout then use io.TextIOWrapper() instead of the codecs module, like the win-unicode-console package does.
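As a rough sketch of the io.TextIOWrapper() approach, the snippet below wraps a binary stream so that Unicode text written to it is encoded on the way out. The helper name wrap_binary_stream is hypothetical, and an in-memory io.BytesIO stands in for the binary side of stdout so the result can be inspected.

```python
import io


def wrap_binary_stream(binary_stream, encoding="utf-8"):
    """Wrap a binary stream so that Unicode text written to it is encoded.

    wrap_binary_stream is a hypothetical helper; io.TextIOWrapper is the
    standard-library class mentioned above.
    """
    # newline="" disables newline translation, so bytes come out as written.
    return io.TextIOWrapper(binary_stream, encoding=encoding, newline="")


# An in-memory buffer plays the role of the underlying byte stream.
buffer = io.BytesIO()
text_stream = wrap_binary_stream(buffer)
text_stream.write(u"some unicode text \N{EURO SIGN}")
text_stream.flush()
encoded = buffer.getvalue()  # the UTF-8 bytes that reached the buffer
```

On Python 3, the same wrapper could in principle be applied to sys.stdout.buffer; whether replacing sys.stdout is needed at all depends on the environment, and with a correct locale it usually is not.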
sys.getdefaultencoding() is unrelated to your locale settings and to PYTHONIOENCODING. Your assumption that setting PYTHONIOENCODING should change sys.getdefaultencoding() is incorrect. You should check sys.stdout.encoding instead.
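To see how these values differ on a given system, a minimal diagnostic along the following lines may help. The function name describe_encodings and the dictionary keys are made up for this illustration; the calls themselves are standard-library functions.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings that are easy to confuse with each other.

    describe_encodings and its key names are hypothetical.
    """
    return {
        "default": sys.getdefaultencoding(),        # implicit str/unicode conversion
        "filesystem": sys.getfilesystemencoding(),  # file names
        "locale": locale.getpreferredencoding(),    # what the locale suggests
        "stdout": getattr(sys.stdout, "encoding", None),  # what print() uses
    }


if __name__ == "__main__":
    for name, value in sorted(describe_encodings().items()):
        print("{0}: {1}".format(name, value))
```

Only the stdout entry says anything about how printed text is encoded; the default entry stays the same regardless of PYTHONIOENCODING.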
sys.getdefaultencoding() is not used when you print to the console. It may be used as a fallback on Python 2 if stdout is redirected to a file/pipe unless PYTHONIOENCODING is set:
$ python2 -c 'import sys; print(sys.stdout.encoding)'
UTF-8
$ python2 -c 'import sys; print(sys.stdout.encoding)' | cat
None
$ PYTHONIOENCODING=utf8 python2 -c 'import sys; print(sys.stdout.encoding)' | cat
utf8
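The same check can be scripted. The sketch below starts a child interpreter with its stdout connected to a pipe and reports what the child sees as sys.stdout.encoding. The helper name child_stdout_encoding is hypothetical, and it uses whichever interpreter runs the script (sys.executable), so the value printed without PYTHONIOENCODING will differ between Python 2 and Python 3.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding=None):
    """Run a child interpreter with stdout piped and return its sys.stdout.encoding.

    child_stdout_encoding is a hypothetical helper. If io_encoding is given,
    it is passed to the child through the PYTHONIOENCODING environment variable.
    """
    env = dict(os.environ)
    env.pop("PYTHONIOENCODING", None)  # start from a clean slate
    if io_encoding is not None:
        env["PYTHONIOENCODING"] = io_encoding
    output = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
    )
    return output.decode("ascii").strip()


if __name__ == "__main__":
    print("piped, no envvar:", child_stdout_encoding())
    print("piped, PYTHONIOENCODING=utf8:", child_stdout_encoding("utf8"))
```

The exact spelling the child reports for the encoding name (for example utf8 or utf-8) may vary between interpreter versions.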
Do not call sys.setdefaultencoding("UTF-8"); it may corrupt your data silently and/or break 3rd-party modules that do not expect it. Remember that sys.getdefaultencoding() is used to convert bytestrings (str) to/from unicode in Python 2 implicitly, e.g., "a" + u"b". See also the quote in @mesilliac's answer.
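To make the risk concrete, here is a small simulation of that implicit conversion. It is not Python 2's actual machinery: the hypothetical function implicit_style_concat only mimics the step "decode the bytestring with the default encoding, then concatenate", so the effect of different default encodings can be compared side by side.

```python
def implicit_style_concat(bytestring, text, default_encoding="ascii"):
    """Mimic Python 2's `bytestring + text`: decode with the default encoding first.

    implicit_style_concat is a hypothetical stand-in for the implicit
    conversion; it is a simulation, not the interpreter's own code path.
    """
    return bytestring.decode(default_encoding) + text


data = b"caf\xc3\xa9"  # UTF-8 bytes for "cafe" with an acute accent on the e

# With the stock 'ascii' default the mix-up fails loudly.
try:
    implicit_style_concat(data, u"!")
    failed_loudly = False
except UnicodeDecodeError:
    failed_loudly = True

# With a changed default that happens to match the data, it works.
matching = implicit_style_concat(data, u"!", default_encoding="utf-8")

# With a changed default that does not match, the text is silently garbled.
garbled = implicit_style_concat(data, u"!", default_encoding="latin-1")
```

The last case is the dangerous one: no exception is raised, and the wrong characters travel on through the program.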