UnicodeEncodeError: 'latin-1' codec can't encode character

Question

What could be causing this error when I try to insert a foreign character into the database?

UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in position 0: ordinal not in range(256)

How can I resolve it?

Thanks!

Recommended answer

Character U+201C Left Double Quotation Mark is not present in the Latin-1 (ISO-8859-1) encoding.

It is present in code page 1252 (Western European), a Windows-specific encoding that is based on ISO-8859-1 but puts extra characters into the range 0x80-0x9F. Code page 1252 is often confused with ISO-8859-1, and it is an annoying but now-standard web browser behaviour that if you serve your pages as ISO-8859-1, the browser will treat them as cp1252 instead. However, they really are two distinct encodings:

>>> u'He said \u201CHello\u201D'.encode('iso-8859-1')
UnicodeEncodeError
>>> u'He said \u201CHello\u201D'.encode('cp1252')
'He said \x93Hello\x94'
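The session above uses Python 2 syntax. As a minimal sketch of the same comparison in Python 3 (where every str is already Unicode), the following also shows what happens when the cp1252 bytes are decoded as ISO-8859-1. The variable names are illustrative only and do not come from the original answer.

```python
# Python 3 sketch: str is Unicode, so no u'' prefix is needed.
text = 'He said \u201cHello\u201d'

# ISO-8859-1 has no U+201C / U+201D, so encoding fails.
try:
    text.encode('iso-8859-1')
    latin1_ok = True
except UnicodeEncodeError as exc:
    latin1_ok = False
    print('iso-8859-1 failed:', exc)

# cp1252 places the curly quotes at 0x93 and 0x94.
cp1252_bytes = text.encode('cp1252')
print(cp1252_bytes)

# The same bytes mean different things in the two encodings:
# under ISO-8859-1, 0x93 and 0x94 are C1 control characters.
as_cp1252 = cp1252_bytes.decode('cp1252')
as_latin1 = cp1252_bytes.decode('iso-8859-1')
print(repr(as_cp1252))
print(repr(as_latin1))
```

Decoding as ISO-8859-1 never raises, because every byte value is defined there, which is one reason the two encodings get mixed up without anyone noticing.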

If you are using your database only as a byte store, you can use cp1252 to encode “ (U+201C) and the other characters present in the Windows Western code page. Other Unicode characters that are not present in cp1252 will still cause errors.
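To make the limits of cp1252 concrete, here is a small Python 3 sketch that checks a few sample strings against each encoding. The helper name can_encode and the sample strings are assumptions made for illustration.

```python
def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Curly quotes, an accented Latin letter, Chinese text, and a Greek letter.
samples = ['\u201cquoted\u201d', 'caf\u00e9', '\u4e2d\u6587', '\u03c0']

for s in samples:
    print(ascii(s),
          'iso-8859-1:', can_encode(s, 'iso-8859-1'),
          'cp1252:', can_encode(s, 'cp1252'),
          'utf-8:', can_encode(s, 'utf-8'))
```

Only UTF-8 accepts all four samples. cp1252 handles the curly quotes and the accented letter but rejects the Chinese and Greek characters.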

You can use encode(..., 'ignore') to suppress the errors by dropping the offending characters, but really in this century you should be using UTF-8 in both your database and your pages. This encoding allows any character to be used. Ideally you should also tell MySQL that you are using UTF-8 strings (by setting the character set on the database connection and the collation on string columns), so that it can get case-insensitive comparison and sorting right.
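The trade-off between suppressing the error and switching to UTF-8 can be sketched in Python 3 as follows. The sample string is an assumption, and the 'replace' handler is shown only for comparison; the answer itself mentions only 'ignore'.

```python
text = '\u201cHello\u201d \u4e2d\u6587'

# 'ignore' silently drops every character Latin-1 cannot represent.
ignored = text.encode('iso-8859-1', 'ignore')
print(ignored)   # b'Hello '

# 'replace' substitutes '?' instead, so the loss is at least visible.
replaced = text.encode('iso-8859-1', 'replace')
print(replaced)  # b'?Hello? ??'

# UTF-8 can represent every character, so the round trip is lossless.
utf8_bytes = text.encode('utf-8')
round_trip = utf8_bytes.decode('utf-8')
print(round_trip == text)  # True
```

Both error handlers lose data that cannot be recovered later, which is why storing UTF-8 end to end is the more robust choice.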
