Encoding gives "'ascii' codec can't encode character ... ordinal not in range(128)"

Problem description

I am working through the Django RSS reader project here.

The RSS feed will read something like "OKLAHOMA CITY (AP) — James Harden let". The RSS feed's encoding reads encoding="UTF-8" so I believe I am passing utf-8 to markdown in the code snippet below. The em dash is where it chokes.

I get the Django error "'ascii' codec can't encode character u'\u2014' in position 109: ordinal not in range(128)", which is a UnicodeEncodeError. In the variables being passed I see "OKLAHOMA CITY (AP) \u2014 James Harden". The line of code that is not working is:

content = content.encode(parsed_feed.encoding, "xmlcharrefreplace")

I am using markdown 2.0, django 1.1, and python 2.4.

What is the magic sequence of encoding and decoding that I need to do to make this work?

(In response to Prometheus' request. I agree the formatting helps)

So in views I add a smart_unicode line above the parsed_feed encoding line...

content = smart_unicode(content, encoding='utf-8', strings_only=False, errors='strict')
content = content.encode(parsed_feed.encoding, "xmlcharrefreplace")

This pushes the problem to my models.py, where I have:

def save(self, force_insert=False, force_update=False):
     if self.excerpt:
         self.excerpt_html = markdown(self.excerpt)
         # super save after this

If I change the save method to have...

def save(self, force_insert=False, force_update=False):
     if self.excerpt:
         encoded_excerpt_html = (self.excerpt).encode('utf-8')
         self.excerpt_html = markdown(encoded_excerpt_html)

I get the error "'ascii' codec can't decode byte 0xe2 in position 141: ordinal not in range(128)", because the value now reads "\xe2\x80\x94" where the em dash was.

Recommended answer

If the data that you are receiving is, in fact, encoded in UTF-8, then it should be a sequence of bytes -- a Python 'str' object, in Python 2.X.

You can verify this with an assertion:

assert isinstance(content, str)

Once you know that that's true, you can move to the actual encoding. Python doesn't do transcoding -- directly from UTF-8 to ASCII, for instance. You need to first turn your sequence of bytes into a Unicode string, by decoding it:

unicode_content = content.decode('utf-8')
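
To make the decoding step concrete, here is a minimal sketch using the headline from the question. It is written so that it also runs on Python 3, where bytes plays the role of Python 2's str; the variable names are illustrative only and are not taken from the project.

```python
# The UTF-8 bytes of the headline; the em dash is the three bytes E2 80 94.
raw = b'OKLAHOMA CITY (AP) \xe2\x80\x94 James Harden let'

# Decoding turns the byte sequence into a Unicode string.
text = raw.decode('utf-8')

# The three-byte sequence is now the single code point U+2014,
# so the decoded text is two units shorter than the raw bytes.
em_dash_index = text.index(u'\u2014')
shrink = len(raw) - len(text)
print(em_dash_index, shrink)
```

The position reported in a UnicodeDecodeError or UnicodeEncodeError counts bytes in the first case and characters in the second, which is one reason the two tracebacks in the question show different positions.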

(If you can trust parsed_feed.encoding, then use that instead of the literal 'utf-8'. Either way, be prepared for errors.)

You can then take that string and encode it in ASCII, substituting non-ASCII characters with their XML numeric character references (for example, &#8212; for the em dash):

xml_content = unicode_content.encode('ascii', 'xmlcharrefreplace')
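
As a small illustration of what the xmlcharrefreplace error handler does, the sketch below encodes the headline from the question with and without it. The sample string is an assumption standing in for the real feed content; the code runs on Python 3 as well as later Python 2 releases.

```python
unicode_content = u'OKLAHOMA CITY (AP) \u2014 James Harden let'

# Characters outside ASCII are replaced by numeric character references,
# so the result contains only bytes below 128.
xml_content = unicode_content.encode('ascii', 'xmlcharrefreplace')

# Without an error handler, the same call raises UnicodeEncodeError,
# which is the "ordinal not in range(128)" failure from the question.
try:
    unicode_content.encode('ascii')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(xml_content, strict_failed)
```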

The full method, then, would look something like this:

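try:
    # Decode the raw bytes to Unicode, then re-encode as ASCII with
    # non-ASCII characters written as XML character references.
    content = content.decode(parsed_feed.encoding).encode('ascii', 'xmlcharrefreplace')
except UnicodeDecodeError:
    # Couldn't decode the incoming string -- possibly not encoded in utf-8
    # Do something here to report the error
    raise  # placeholder: re-raise until real error reporting is added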
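
One possible way to package these steps as a reusable function is sketched below. The helper name to_ascii_xml and its fallback behaviour are hypothetical and are not part of Django, markdown, or the feed parser. It targets Python 3, where bytes is the byte-string type; on Python 2 the equivalent check would be isinstance(content, str), as in the assertion earlier in the answer. Falling back to a lossy decode instead of reporting the error is a design assumption, chosen so that one bad feed entry does not abort the whole import.

```python
def to_ascii_xml(content, encoding='utf-8'):
    """Return content as ASCII bytes, with non-ASCII characters written
    as XML numeric character references.

    Hypothetical helper for illustration only.
    """
    if isinstance(content, bytes):
        try:
            content = content.decode(encoding or 'utf-8')
        except (UnicodeDecodeError, LookupError):
            # Bad bytes or an unknown encoding name: decode as UTF-8 and
            # mark undecodable bytes with U+FFFD instead of failing.
            content = content.decode('utf-8', 'replace')
    # At this point content is text, so encoding cannot hit an implicit
    # ASCII decode the way the save() attempt in the question did.
    return content.encode('ascii', 'xmlcharrefreplace')


print(to_ascii_xml(b'AP \xe2\x80\x94 Harden'))
```

Because the function accepts either bytes or already-decoded text, it can be called at the point where the feed is parsed without first checking which of the two the parser returned.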


08-04 19:01