Problem description
I am using Solr 3.3 to index data from my database. I compose the JSON content in Python. I managed to upload 2126 records, which add up to 523246 characters (approx. 511 KB). But when I try 2127 records, Python gives me this error:
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "D:\Technovia\db_indexer\solr_update.py", line 69, in upload_service_details
    request_string.append(param_list)
  File "C:\Python27\lib\json\__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "C:\Python27\lib\json\encoder.py", line 203, in encode
    chunks = list(chunks)
  File "C:\Python27\lib\json\encoder.py", line 425, in _iterencode
    for chunk in _iterencode_list(o, _current_indent_level):
  File "C:\Python27\lib\json\encoder.py", line 326, in _iterencode_list
    for chunk in chunks:
  File "C:\Python27\lib\json\encoder.py", line 384, in _iterencode_dict
    yield _encoder(value)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 68: invalid start byte
Ouch. Is 512kb worth of bytes a fundamental limit? Is there any high-volume alternative to the existing JSON module?
Update: it's a fault in some of the data, as trying to encode biz_list[2126:] causes an immediate error. Here is the offending piece:
How can I configure it so that it can be encodable into JSON?
Update 2: The answer worked as expected: the data came from a MySQL table using the "latin1_swedish_ci" collation. I saw significance in a random number. Sorry for spontaneously channeling the spirit of a headline writer when diagnosing the fault.
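For anyone hitting the same traceback, a quick way to locate the offending records is to try decoding every byte-string value as UTF-8 and collect the ones that fail. This is only a sketch: the shape of biz_list is not shown in the question, so the code assumes a list of flat dicts, and the function name and sample rows are made up for illustration.

```python
def find_non_utf8(records):
    """Return (index, key, raw_bytes) for each byte-string value that is not valid UTF-8."""
    bad = []
    for index, record in enumerate(records):
        for key, value in record.items():
            if isinstance(value, bytes):
                try:
                    value.decode("utf-8")
                except UnicodeDecodeError:
                    bad.append((index, key, value))
    return bad


# Hypothetical stand-in for biz_list: the second row carries a 0x96 byte.
sample = [
    {"name": b"Plain ASCII"},
    {"name": b"Mon \x96 Fri"},
]
bad_rows = find_non_utf8(sample)
print(bad_rows)
```

The size of the payload plays no part here; a single row with a byte like 0x96 is enough to trigger the error.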
Recommended answer
Simple: just don't use the utf-8 encoding if your data is not in utf-8.
>>> json.loads('["\x96"]')
....
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 0: invalid start byte
>>> json.loads('["\x96"]', encoding="latin-1")
[u'\x96']
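The example above uses json.loads, while the traceback in the question comes from json.dumps. One way to handle the encoding side that does not depend on an encoding keyword at all is to decode the byte strings to text yourself before serializing. The sketch below assumes flat dict rows; decode_values and the sample row are hypothetical names, not part of the asker's code.

```python
import json


def decode_values(record, encoding="latin-1"):
    """Return a copy of record with every byte-string value decoded to text."""
    return {
        key: value.decode(encoding) if isinstance(value, bytes) else value
        for key, value in record.items()
    }


# Made-up row containing the byte from the traceback.
raw_record = {"id": 2127, "name": b"Mon \x96 Fri"}

payload = json.dumps([decode_values(raw_record)])
print(payload)
```

Decoding explicitly keeps the choice of source encoding in one visible place, which helps when different columns or tables turn out to use different character sets.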
If s is a str instance and is encoded with an ASCII based encoding other than utf-8 (e.g. latin-1) then an appropriate encoding name must be specified. Encodings that are not ASCII based (such as UCS-2) are not allowed and should be decoded to unicode first.
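The last sentence of that passage, "decoded to unicode first", can be sketched as follows. UTF-16 is used here as a stand-in for UCS-2 (the two agree for characters in the Basic Multilingual Plane), and the JSON document is made up for illustration.

```python
import json

# A made-up JSON document stored as UTF-16 (little-endian, no BOM).
raw = u'["\u2013"]'.encode("utf-16-le")

text = raw.decode("utf-16-le")  # decode to unicode first
values = json.loads(text)       # then parse the text
print(repr(values))
```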
Edit: As Eli Collins mentioned, to get the proper Unicode value for "\x96", use "cp1252":
>>> json.loads('["\x96"]', encoding="cp1252")
[u'\u2013']
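To see why the choice between the two codecs matters, it may help to decode the same byte both ways and look at what comes out. This is a small sketch using only the standard library; the variable names are arbitrary.

```python
import unicodedata

raw = b"\x96"

as_latin1 = raw.decode("latin-1")  # maps the byte straight to U+0096
as_cp1252 = raw.decode("cp1252")   # maps the byte to U+2013 (en dash)

# "Cc" is a control character, "Pd" is dash punctuation.
print(repr(as_latin1), unicodedata.category(as_latin1))
print(repr(as_cp1252), unicodedata.category(as_cp1252))

# The en dash as it would appear in a UTF-8 encoded payload.
print(repr(as_cp1252.encode("utf-8")))
```

Both decodes succeed, so latin-1 makes the error go away, but it yields an invisible control character where the original text most likely had a dash.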