Question
I'm still learning Python and I have a question:
In Python 2.6.x I usually declare the encoding in the file header like this (as in PEP 0263):
# -*- coding: utf-8 -*-
After that, my strings are written as usual:
a = "A normal string without declared Unicode"
But every time I look at the code of a Python project, the encoding is not declared in the header. Instead, it is declared on every string, like this:
a = u"A string with declared Unicode"
What's the difference? What's the purpose of this? I know Python 2.6.x uses ASCII encoding by default, but that can be overridden by the header declaration, so what's the point of a per-string declaration?
Addendum: Seems that I've mixed up file encoding with string encoding. Thanks for explaining it :)
Those are two different things, as others have mentioned.
When you specify # -*- coding: utf-8 -*-, you're telling Python that the source file you've saved is UTF-8. The default for Python 2 is ASCII (for Python 3 it's UTF-8). This only affects how the interpreter reads the characters in the file.
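As a rough illustration of why the declared encoding matters, the sketch below decodes the same two bytes under two different codecs. The variable names are made up for the example, and the bytes stand in for what an editor might write to disk when saving a single accented character as UTF-8.

```python
# Two bytes as they could sit on disk for the character U+00E9 saved as UTF-8.
raw = b"\xc3\xa9"

# Read with the right codec, the two bytes become one character.
as_utf8 = raw.decode("utf-8")

# Read with the wrong codec, each byte becomes its own (wrong) character.
as_latin1 = raw.decode("latin-1")

print(len(as_utf8), len(as_latin1))
```

The bytes never change; only the interpretation does, which is the role the coding header plays for the source file as a whole.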
In general, it's probably not the best idea to embed non-ASCII characters directly in your source file, whatever its encoding; you can use Unicode escapes in string literals instead, which work regardless of the file's encoding.
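A minimal sketch of the escape approach: the source below is pure ASCII, yet the string holds a non-ASCII character. The name heart is only illustrative.

```python
# The literal is written with ASCII characters only; the escape names U+2665.
heart = u"\u2665"

# It is a single character of text...
print(len(heart))

# ...which becomes three bytes when encoded as UTF-8.
encoded = heart.encode("utf-8")
print(encoded == b"\xe2\x99\xa5")
```

Because the file itself contains only ASCII, it reads the same whether or not a coding header is present.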
When you declare a string with a u in front, like u'This is a string', it tells the Python compiler that the string is Unicode text, not bytes. The interpreter handles this mostly transparently; the most obvious difference is that you can now use Unicode escapes in the string (that is, u'\u2665' is now read as the single character U+2665). You can use from __future__ import unicode_literals to make this the default.
This only applies to Python 2; in Python 3 string literals are Unicode by default, and you need to put a b in front (like b'These are bytes') to declare a sequence of bytes.
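To make the text/bytes distinction concrete, here is a small sketch that should behave the same on Python 2.6+ and Python 3.3+, since both accept the u and b prefixes. The sample string and variable names are chosen for the example only.

```python
# Unicode text: four characters, the last one written as an escape.
text = u"caf\u00e9"

# Encoding turns text into bytes; the accented character needs two bytes in UTF-8.
data = text.encode("utf-8")
print(len(text), len(data))

# Decoding with the same codec gives the original text back.
print(data.decode("utf-8") == text)

# Text and byte literals are different types on both major versions.
print(type(u"abc") is type(b"abc"))
```

The length difference is the practical sign that one object counts characters and the other counts bytes.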