I have a string which is automatically converted to a byte string by my IDE (a very old Boa Constructor). Now I want to convert it to unicode so that I can print it with the encoding of the specific machine (cp1252 on Windows or utf-8 on Linux).

I tried two different ways. One of them works, the other one does not. But why?

Here is the working version:

    #!/usr/bin/python
    # vim: set fileencoding=cp1252 :
    str = '\x80'
    str = str.decode('cp1252') # to unicode
    str = str.encode('cp1252') # to str
    print str

Here is the version that does not work:

    #!/usr/bin/python
    # vim: set fileencoding=cp1252 :
    str = u'\x80'
    #str = str.decode('cp1252') # to unicode
    str = str.encode('cp1252') # to str
    print str

It fails with:

    UnicodeEncodeError: 'charmap' codec can't encode character '\x80' in position 0: character maps to <undefined>

In version 1 I convert the str to unicode via the decode function. In version 2 I convert the str to unicode via the u in front of the string. I thought the two versions would do exactly the same thing?

Solution

str.decode does not just prepend u to the string literal. It translates the bytes of the input string into meaningful characters (i.e. Unicode).

You then call encode to convert these characters back to bytes, because bytes are what you need in order to "print" them, that is, to output them to the terminal or any other OS entity (like a GUI window).

So, for your specific task, I believe you want something like:

    s = '\x80'
    print s.decode('cp1252').encode(platform_encoding)

where 'cp1252' is the encoding of your IDE, and platform_encoding is a variable holding the encoding of the current system.

In reply to your comment:

This is an incorrect assumption. See Defining Python Source Code Encodings (PEP 263).

So set fileencoding=cp1252 just tells the interpreter how to convert the characters [you entered via the editor] to bytes when it parses the line str = '\x80'. This information is not used during str.decode calls.

You also ask what u'\x80' is. In a unicode literal, \x80 is simply interpreted as \u0080, and this is obviously not what you want: byte 0x80 in cp1252 is the euro sign (U+20AC), whereas U+0080 is a control character that cp1252 cannot encode, which is why encode raises the error. Take a look at this question: Bytes in a unicode Python string.
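The difference between decoding a byte string and writing a u'' literal can be checked directly. Below is a minimal sketch, not the original poster's code: it uses the b'' prefix so that it runs on both Python 2 and Python 3, the names raw, decoded and literal are only illustrative, and locale.getpreferredencoding() is just one assumed way to fill in platform_encoding (the 'replace' error handler is also my addition, so the sketch does not fail on an ASCII-only system).

```python
import locale

raw = b'\x80'  # a single byte, as in the working version

# decode() looks the byte up in the cp1252 table: 0x80 is the euro sign.
decoded = raw.decode('cp1252')
assert decoded == u'\u20ac'

# A u'' literal does no table lookup: \x80 is just the code point U+0080.
literal = u'\x80'
assert literal == u'\u0080'
assert literal != decoded

# The decoded text round-trips back to the original byte ...
assert decoded.encode('cp1252') == raw

# ... but U+0080 has no slot in cp1252, so encoding the literal fails.
try:
    literal.encode('cp1252')
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print('encode failed: ' + exc.reason)

# Assumption: take the platform encoding from the locale settings.
platform_encoding = locale.getpreferredencoding()
# 'replace' substitutes '?' if the platform encoding has no euro sign.
output_bytes = decoded.encode(platform_encoding, 'replace')
```

If the real encoding of the target terminal is known, passing it explicitly instead of relying on the locale is probably the safer choice.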