Problem description
Could you explain in detail what the difference is between a byte string and a Unicode string in Python? I have read this:
"Byte code is simply the source code converted into arrays of bytes."
Does this mean that Python has its own coding/encoding format? Or does it use the operating system settings? I don't understand. Could you please explain? Thank you!
No, Python does not use its own encoding. It will use any encoding that it has access to and that you specify. In Python 3, each character in a str represents one Unicode character (code point). However, a single byte can only take 256 distinct values, so to represent more than 256 characters, Unicode encodings use more than one byte per character for many characters. bytes objects (and their mutable counterpart, bytearray) give you access to the underlying bytes. str objects have the encode method, which takes a string naming an encoding and returns the bytes object that represents the string in that encoding. bytes and bytearray objects have the decode method, which takes a string naming an encoding and returns the str that results from interpreting the bytes as text encoded in the given encoding. Here's an example.
>>> a = "αά".encode('utf-8')
>>> a
b'\xce\xb1\xce\xac'
>>> a.decode('utf-8')
'αά'
We can see that UTF-8 uses four bytes, \xce, \xb1, \xce, and \xac, to represent two characters. After the Spolsky article that Ignacio Vazquez-Abrams referred to, I would read the Python Unicode HOWTO.
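To make the character-versus-byte distinction more concrete, here is a small sketch that extends the example above, assuming Python 3. The variable names and the choice of UTF-16 and UTF-32 as comparison encodings are mine, picked only for illustration.

```python
text = "αά"  # two Unicode characters, as in the example above

# The same text encoded three ways; the byte count depends on the encoding.
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")  # little-endian, no byte-order mark
utf32 = text.encode("utf-32-le")

print(len(text))                          # number of characters
print(len(utf8), len(utf16), len(utf32))  # number of bytes per encoding

# Iterating over a bytes object yields integers in range(256).
print(list(utf8))

# bytes is immutable; bytearray is the mutable variant.
buf = bytearray(utf8)
buf[3] = 0xB1                  # overwrite the last byte (0xAC -> 0xB1)
changed = buf.decode("utf-8")  # the bytes now spell alpha twice
print(changed == "αα")

# Decoding with a different encoding than the one used to encode
# succeeds here, but produces different text.
wrong = utf8.decode("latin-1")
print(wrong == text)

# An encoding that cannot represent the characters raises an error.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii cannot encode this text:", exc.reason)
```

The point is that a str has no bytes until you pick an encoding, and a bytes object has no characters until you pick one to decode with. Python does not guess from the data; the encoding is either the one you pass or a documented default (UTF-8 for str.encode and bytes.decode in Python 3).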