Question
I'm a very new Python programmer working on my first script. The script pulls in text from a plist string, does some things to it, and then packages it up as an HTML email.
For a few of the entries, I'm getting the dreaded Unicode "ordinal not in range(128)" error.
Having read as much as I can find about encoding and decoding, I know that it is important for me to get the encoding right, but I'm having a difficult time understanding when or how exactly to do this.
The offending variable is first pulled in using plistlib and converted from Markdown to HTML, like this:
entry = result['Entry Text']
donotecontent = markdown2.markdown(entry)
Later, it is put in the email like this:
html = donotecontent + '<br /><br />' + var3
part1 = MIMEText(html, 'html')
msg.attach(part1)
My question is: what is the best way to make sure that Unicode characters in this content don't cause an error? I'd prefer not to ignore the characters.
Recommended answer
Sorry for my broken English. I speak Chinese and Japanese, and I use CJK characters every day. Ceron has already solved most of this problem, so I won't go over how to use encode()/decode() again.
In Python 2, when we use str() to cast a unicode object, it encodes the Unicode string to byte data; when we use unicode() to cast a str object, it decodes the byte data to Unicode characters.
In both cases, the encoding used is whatever sys.getdefaultencoding() returns.
By default, sys.getdefaultencoding() returns 'ascii', so a str() or unicode() cast may raise an encoding or decoding exception whenever the data contains non-ASCII characters.
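The failure described above can be reproduced with a small sketch. It is only an illustration: the sample string is made up, and the explicit encode("ascii") call stands in for the implicit conversion that str() performs on a unicode object when the default encoding is 'ascii'.

```python
# Hypothetical sample text containing one non-ASCII character (e with acute accent).
text = u"caf\u00e9"

try:
    # Explicitly do what an implicit ASCII conversion would do.
    text.encode("ascii")
    error_message = None
except UnicodeEncodeError as exc:
    # The ASCII codec cannot represent code points above 127.
    error_message = str(exc)

print(error_message)
```

The printed message should end with "ordinal not in range(128)", which is the error from the question.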
If you want to do str <-> unicode conversion with str() or unicode(), and have the implicit encoding/decoding use 'utf-8', you can execute the following statements:
import sys  # Python 2 only: site.py deletes sys.setdefaultencoding at startup
reload(sys)  # reloading sys makes sys.setdefaultencoding() available again
sys.setdefaultencoding('utf-8')
After that, later calls to str() and unicode() will convert any basestring object using the utf-8 encoding.
However, I prefer to use encode()/decode() explicitly, because it makes code maintenance easier for me.
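As a rough sketch of the explicit approach applied to the email from the question: decode bytes once at the input boundary, work with Unicode text in between, and tell MIMEText which charset to use at the output boundary. This version is written for Python 3. The names raw_entry and footer and the sample data are hypothetical stand-ins, the Markdown step is left out, and under Python 2 the text would probably need to be encoded to UTF-8 bytes before being handed to MIMEText.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical input: UTF-8 encoded bytes, e.g. as read from a file.
raw_entry = b"caf\xc3\xa9 notes"

# Decode explicitly, once, as soon as the data enters the program.
entry = raw_entry.decode("utf-8")

# Work with Unicode text only from here on (footer is a stand-in for var3).
footer = u"sent by my script"
html = entry + u"<br /><br />" + footer

# Name the charset explicitly so the non-ASCII characters are preserved
# instead of going through an implicit ASCII conversion.
part1 = MIMEText(html, "html", "utf-8")

msg = MIMEMultipart("alternative")
msg.attach(part1)
```

Because the charset is passed explicitly, the accented character is kept rather than ignored, which is what the question asks for.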