Problem description
I've never been sure that I understand the difference between str/unicode decode and encode.
I know that str().decode() is for when you have a string of bytes that you know has a certain character encoding; given that encoding name, it returns a unicode string.
I know that unicode().encode() converts unicode characters into a string of bytes according to a given encoding name.
But I don't understand what str().encode() and unicode().decode() are for. Can anyone explain, and possibly also correct anything else I've gotten wrong above?
EDIT:
Several answers give info on what .encode does on a string, but no one seems to know what .decode does for unicode.
The decode method of unicode strings really doesn't have any applications at all (unless you have some non-text data in a unicode string for some reason; see below). It is mainly there for historical reasons, I think. In Python 3 it is completely gone.
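To make the Python 3 remark concrete, here is a minimal sketch, assuming a Python 3 interpreter (the transcripts in this answer are Python 2). It shows that each type keeps only the one meaningful direction: str has encode and bytes has decode. The variable names are illustrative only.

```python
# Python 3: text is str, binary data is bytes.
text = "\u00f6"                    # the character 'ö'

data = text.encode("utf-8")        # str -> bytes
round_trip = data.decode("utf-8")  # bytes -> str

# The two "odd" methods from Python 2 no longer exist.
str_has_decode = hasattr(str, "decode")
bytes_has_encode = hasattr(bytes, "encode")

print(repr(data), round_trip == text)
print(str_has_decode, bytes_has_encode)
```

Calling text.decode() or data.encode() here would raise AttributeError, which is how Python 3 removes the ambiguity.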
unicode().decode() will perform an implicit encoding of s using the default (ascii) codec. Verify this like so:
>>> s = u'ö'
>>> s.decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 0: ordinal not in range(128)
>>> s.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 0: ordinal not in range(128)
The error messages are exactly the same.
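The implicit step can be spelled out explicitly. The sketch below is written for Python 3 and only emulates the Python 2 behaviour described above; py2_unicode_decode is a hypothetical helper name, not a real API.

```python
def py2_unicode_decode(text, encoding="ascii"):
    """Hypothetical helper: roughly what Python 2's unicode.decode() did."""
    # Hidden step: the text is first encoded with the default (ascii) codec.
    implicit_bytes = text.encode("ascii")
    # Only then is the requested decoding applied.
    return implicit_bytes.decode(encoding)


encode_error = None
try:
    py2_unicode_decode("\u00f6")
except UnicodeEncodeError as exc:
    # A decode call fails with an *encode* error, as in the transcript.
    encode_error = exc

print(type(encode_error).__name__)
print(py2_unicode_decode("abc"))
```

Pure-ASCII text passes through unchanged, which is why the problem often goes unnoticed until non-ASCII input shows up.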
For str().encode() it's the other way around: it attempts an implicit decoding of s with the default encoding:
>>> s = 'ö'
>>> s.decode('utf-8')
u'\xf6'
>>> s.encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
Used like this, str().encode() is also superfluous.
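The mirror case can be emulated the same way. This is again a Python 3 sketch of the Python 2 behaviour, and py2_str_encode is a hypothetical helper name introduced for illustration.

```python
def py2_str_encode(data, encoding="ascii"):
    """Hypothetical helper: roughly what Python 2's str.encode() did."""
    # Hidden step: the byte string is first decoded with the default (ascii) codec.
    implicit_text = data.decode("ascii")
    # Only then is the requested encoding applied.
    return implicit_text.encode(encoding)


decode_error = None
try:
    py2_str_encode(b"\xc3\xb6")  # the UTF-8 bytes of 'ö'
except UnicodeDecodeError as exc:
    # An encode call fails with a *decode* error, as in the transcript.
    decode_error = exc

print(type(decode_error).__name__)
print(py2_str_encode(b"abc", "utf-8"))
```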
But there is another application of the latter method that is useful: there are encodings that have nothing to do with character sets, and thus can be applied to 8-bit strings in a meaningful way:
>>> s.encode('zip')
'x\x9c;\xbc\r\x00\x02>\x01z'
You are right, though: the ambiguous usage of "encoding" for both these applications is... awkward. Again, with the separate bytes and str types in Python 3, this is no longer an issue.
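For comparison, a minimal Python 3 sketch of the same bytes-to-bytes transformation. It assumes the standard library's zlib_codec, reached through codecs.encode and codecs.decode rather than through a method on the string itself.

```python
import codecs
import zlib

data = "\u00f6".encode("utf-8")  # b'\xc3\xb6'

# Bytes-to-bytes codecs go through the codecs module in Python 3.
compressed = codecs.encode(data, "zlib_codec")
restored = codecs.decode(compressed, "zlib_codec")

# The codec is a thin wrapper around zlib, so zlib can read its output.
same_as_zlib = zlib.decompress(compressed) == data

print(restored == data, same_as_zlib)
```

Keeping these transformations out of str.encode and bytes.decode is what keeps the two meanings of "encoding" apart.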