A reliable way of handling non-ASCII characters in Python

Question

I have a column in a spreadsheet whose header contains non-ASCII characters, thus:

'ï»¿Campaign'

If I pop this string into the interpreter, I get:

'\xc3\xaf\xc2\xbb\xc2\xbfCampaign'

The string is one of the keys in the rows of a csv.DictReader().

When I try to populate a new dict with the value of this key:

spends['ï»¿Campaign'] = 2

I get:

KeyError: '\xc3\xaf\xc2\xbb\xc2\xbfCampaign'

If I print the keys of row, I can see that the key is '\xef\xbb\xbfCampaign'.

Obviously, then, I can just update my program to access this key thus:

spends['\xef\xbb\xbfCampaign']

But is there a "better" way of doing this in Python? Indeed, if the value of this key ever changes to contain other non-ASCII characters, what is an all-encompassing way of handling any and all non-ASCII characters that may arise?

Recommended answer

In general, you should decode a bytestring into Unicode text, using the corresponding character encoding, as soon as possible on input. In reverse, encode Unicode text into a bytestring as late as possible on output. Some APIs, such as io.open(), can do this implicitly so that your code sees only Unicode.
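As a minimal sketch of that advice: the three bytes '\xef\xbb\xbf' in the question are the UTF-8 byte order mark (BOM), and Python's 'utf-8-sig' codec drops a leading BOM while decoding. The file name and its contents below are made up for illustration, and the example assumes the input really is UTF-8.

```python
import io
import os
import tempfile

# Hypothetical sample file that starts with the UTF-8 BOM (\xef\xbb\xbf),
# the same three bytes that prefix the key in the question.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with io.open(path, "wb") as f:
    f.write(b"\xef\xbb\xbfCampaign,Spend\n")

# Decode on input: io.open() with an encoding returns Unicode text, and
# the 'utf-8-sig' codec removes a leading BOM if one is present.
with io.open(path, encoding="utf-8-sig") as f:
    header = f.readline().rstrip(u"\n")

first_column = header.split(u",")[0]

# Encode on output, as late as possible.
encoded = first_column.encode("utf-8")
```

Here first_column is plain Unicode text with no BOM prefix, so it can be compared with an ordinary string such as 'Campaign'.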

Unfortunately, the csv module does not support Unicode directly on Python 2. See UnicodeReader and UnicodeWriter in the doc examples. You could create their analogs for csv.DictReader or, as an alternative, just pass UTF-8 encoded bytestrings to the csv module.
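For comparison, on Python 3 the csv module works with Unicode text directly, so the decoding can happen when the file is opened. The sketch below targets Python 3 only and is not the Python 2 workaround described above; the file name, the column names, and the row data are assumptions made up for the example.

```python
import csv
import io
import os
import tempfile

# Hypothetical CSV file with a UTF-8 BOM and one non-ASCII value.
path = os.path.join(tempfile.mkdtemp(), "campaigns.csv")
with io.open(path, "wb") as f:
    f.write(b"\xef\xbb\xbfCampaign,Spend\r\nCaf\xc3\xa9,2\r\n")

spends = {}
# 'utf-8-sig' strips the BOM, so the first header is exactly 'Campaign'.
# newline="" is what the csv documentation recommends for file objects.
with io.open(path, encoding="utf-8-sig", newline="") as f:
    for row in csv.DictReader(f):
        spends[row["Campaign"]] = int(row["Spend"])
```

Because the BOM is removed while decoding, the key lookup uses 'Campaign' rather than '\xef\xbb\xbfCampaign', and other non-ASCII characters in the data arrive as ordinary Unicode text.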
