Converting Unicode in Python

Question

I'm a very new Python programmer, working on my first script. The script pulls in text from a plist string, does some things to it, then packages it up as an HTML email.

For a few of the entries, I'm getting the dreaded Unicode "ordinal not in range(128)" error.

Having read as much as I can find about encoding and decoding, I know that it is important for me to get the encoding right, but I'm having a difficult time understanding when or how exactly to do this.

The offending variable is first pulled in using plistlib, then converted from Markdown to HTML, like this:

entry = result['Entry Text']
donotecontent = markdown2.markdown(entry)

Later, it is put in the email like this:

html = donotecontent + '<br /><br />' + var3
part1 = MIMEText(html, 'html')
msg.attach(part1)

My question is: what is the best way to make sure that Unicode characters in this content don't cause an error? I'd prefer not to ignore the characters.

Recommended answer

Sorry for my broken English. I speak Chinese and Japanese and use CJK characters every day. Ceron has solved most of this problem, so I won't go over how to use encode()/decode() again.

In Python 2, when we use str() to cast a unicode object, it encodes the Unicode string to byte data; when we use unicode() to cast a str object, it decodes the byte data to Unicode characters.
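As a rough illustration of the two directions, the sketch below performs the same conversions explicitly. It is written for Python 3, where the text type is called str and the byte type is bytes (in Python 2 these were unicode and str). The sample string is made up for the example.

```python
# Python 3 sketch: explicit conversions between text and byte data.
text = "caf\u00e9 \u4e2d\u6587"    # hypothetical sample text with non-ASCII characters

data = text.encode("utf-8")        # text -> bytes (the direction Python 2's str() takes)
roundtrip = data.decode("utf-8")   # bytes -> text (the direction Python 2's unicode() takes)

print(type(data).__name__)
print(roundtrip == text)
```

Naming the codec in each call is what keeps the conversion independent of any interpreter-wide default.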

The encoding used for this implicit conversion is whatever sys.getdefaultencoding() returns.

By default, sys.getdefaultencoding() returns 'ascii', so an encoding or decoding exception may be raised when casting with str() or unicode().
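The failure can be reproduced in a small sketch. Python 3 does not perform this implicit conversion, so the example below names the 'ascii' codec explicitly to stand in for what Python 2 does behind the scenes. The sample string is hypothetical.

```python
# Python 3 sketch: reproduce the failure by naming the 'ascii' codec explicitly.
text = "na\u00efve"   # hypothetical sample; U+00EF is outside the ASCII range

try:
    text.encode("ascii")
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

print(error_message)

# The same text encodes without error once a codec that covers it is chosen.
encoded = text.encode("utf-8")
print(encoded)
```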

If you want to do str <-> unicode conversion with str() or unicode(), and have the implicit encoding/decoding use 'utf-8', you can execute the following statements:

import sys    # site.py removes sys.setdefaultencoding at startup
reload(sys)    # re-enables sys.setdefaultencoding() (Python 2 only)
sys.setdefaultencoding('utf-8')

After that, later calls to str() and unicode() will convert any basestring object using the utf-8 encoding.

However, I prefer to use encode()/decode() explicitly, because it makes my code easier to maintain.
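Applied to the email code from the question, the explicit approach might look like the sketch below. It is written for Python 3 and assumes the HTML is already a text string. The sample content and variable values are invented for illustration. The third argument to MIMEText names the charset, so the message declares UTF-8 rather than relying on a default.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical stand-ins for the question's variables.
donotecontent = "<p>Caf\u00e9 notes: \u65e5\u672c\u8a9e</p>"
var3 = "<p>footer</p>"

html = donotecontent + "<br /><br />" + var3

msg = MIMEMultipart("alternative")
part1 = MIMEText(html, "html", "utf-8")   # declare the charset explicitly
msg.attach(part1)

# Decoding the payload with the declared charset gives back the original text.
decoded = part1.get_payload(decode=True).decode(part1.get_content_charset())
print(decoded == html)
```

In Python 2 the equivalent step would likely be to pass html.encode('utf-8') together with the same charset argument, though that variant is not tested here.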
