Problem description
There are a few threads on Stack Overflow about this, but I couldn't find a solution that addresses the problem as a whole.
I have collected a large amount of textual data with the urllib read function and stored it in pickle files.
Now I want to write this data to a file. While writing, I'm getting errors similar to:
'ascii' codec can't encode character u'\u2019' in position 16: ordinal not in range(128)
I suppose the data returned by the urllib read call is byte data.
I have tried:
1. text=text.decode('ascii','ignore')
2. s=filter(lambda x: x in string.printable, s)
3. text=u''+text
text=text.decode().encode('utf-8')
But I still end up with similar errors. Can somebody point out a proper solution? Would stripping with codecs also work? I have no problem if the conflicting bytes are not written to the file, so that loss is acceptable.
Recommended answer
You can do this with smart_str from Django's django.utils.encoding module. Just try this:
from django.utils.encoding import smart_str, smart_unicode
text = u'\u2019'
print smart_str(text)
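For comparison, here is a minimal sketch of the same idea without Django, assuming the text is already a unicode string. Encoding it explicitly to UTF-8 avoids the implicit ASCII encoding that raises the error. The helper names `to_utf8_bytes` and `to_ascii_lossy` are hypothetical, not part of any library. The second helper shows the lossy route the question says is acceptable.

```python
# -*- coding: utf-8 -*-

def to_utf8_bytes(text):
    """Return UTF-8 bytes for a unicode string; byte strings pass through unchanged."""
    if isinstance(text, bytes):
        return text
    return text.encode('utf-8')


def to_ascii_lossy(text):
    """Encode to ASCII, silently dropping every character ASCII cannot represent."""
    return text.encode('ascii', 'ignore')


text = u'it\u2019s'                 # contains the right single quotation mark
utf8_bytes = to_utf8_bytes(text)    # keeps the character, encoded as three bytes
ascii_bytes = to_ascii_lossy(text)  # drops the character entirely
```

The UTF-8 variant preserves the data, so it is usually the safer choice. The ASCII variant only makes sense when losing the non-ASCII characters really is acceptable.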
You can install Django by starting a command shell with administrator privileges and running this command:
pip install Django
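If installing Django only for this is undesirable, the file can also be opened with an explicit encoding so that the write itself never falls back to ASCII. The sketch below uses the standard library's io.open. The helper name `write_text` and the temporary output path are hypothetical, and the code assumes any byte input is UTF-8, dropping bytes that cannot be decoded.

```python
import io
import os
import tempfile


def write_text(path, data, encoding='utf-8'):
    """Write data to path as text, decoding byte strings first."""
    if isinstance(data, bytes):
        # Assumption: the bytes are UTF-8; undecodable bytes are dropped.
        data = data.decode(encoding, 'ignore')
    # io.open encodes the text on write using the given encoding.
    with io.open(path, 'w', encoding=encoding) as handle:
        handle.write(data)


# Hypothetical output location, used for illustration only.
path = os.path.join(tempfile.mkdtemp(), 'out.txt')
write_text(path, u'it\u2019s')

# Read the raw bytes back to see what was actually stored.
with io.open(path, 'rb') as handle:
    written = handle.read()
```

Decoding once on input and encoding once on output keeps all intermediate processing in unicode, which is what prevents the 'ascii' codec error from reappearing elsewhere.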