I work in Python and would like to read user input from the command line as Unicode. Is there a Unicode equivalent of raw_input()?
Also, I would like to test Unicode strings for equality, and it looks like the standard == does not work.
raw_input() returns strings as encoded by the OS or UI facilities. The difficulty is knowing which encoding that is. You might attempt the following:
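import sys, locale
# sys.stdin.encoding is None when stdin is not a terminal (e.g. piped input), so fall back to the locale's preferred encoding
text = raw_input().decode(sys.stdin.encoding or locale.getpreferredencoding(True))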
This should work correctly in most cases.
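The same fallback idea can be sketched for Python 3, assuming you are handling raw bytes (for example from sys.stdin.buffer). In Python 3, input() already returns decoded text, so this only matters when you work with bytes. The helper name decode_line is hypothetical, not a standard library function. The example also shows why guessing the encoding matters: the same bytes decode to different text under different assumptions.

```python
import locale
import sys


def decode_line(raw, encoding=None):
    """Decode bytes read from the console into text (hypothetical helper).

    Same fallback chain as the answer: an explicit encoding if given, else the
    encoding reported for sys.stdin, else the locale's preferred encoding.
    """
    enc = (
        encoding
        or getattr(sys.stdin, "encoding", None)  # may be None, e.g. when stdin is missing
        or locale.getpreferredencoding(False)
    )
    return raw.decode(enc)


# The same bytes give different text depending on the encoding you assume:
data = b"\xc3\xaatre"                 # "être" encoded as UTF-8
right = decode_line(data, "utf-8")    # "être"
wrong = decode_line(data, "latin-1")  # "Ãªtre": mojibake from guessing wrong
```

Reading a real line would look like decode_line(sys.stdin.buffer.readline().rstrip(b"\r\n")), which uses the terminal's or the locale's encoding when none is passed explicitly.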
We need more details about the failing Unicode comparisons in order to help you. However, the problem may well be one of normalization. Consider the following:
>>> a1= u'\xeatre'
>>> a2= u'e\u0302tre'
a1 and a2 are canonically equivalent but not equal:
>>> print a1, a2
être être
>>> print a1 == a2
False
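To see why == reports False, it helps to list the code points in each string. This is a small Python 3 sketch (the session above is Python 2, where the same strings are written u'\xeatre' and u'e\u0302tre'): a1 uses the single precomposed character U+00EA, while a2 uses a plain "e" followed by the combining accent U+0302, so the two strings differ even in length.

```python
import unicodedata

a1 = "\u00eatre"   # precomposed: U+00EA LATIN SMALL LETTER E WITH CIRCUMFLEX
a2 = "e\u0302tre"  # decomposed: "e" followed by U+0302 COMBINING CIRCUMFLEX ACCENT

for s in (a1, a2):
    # Print the length and each code point with its Unicode name
    print(len(s), [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s])
```

Both render as "être", but a1 has 4 code points and a2 has 5, which is why a plain == compares them as different.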
So you might want to use the unicodedata.normalize() function:
>>> import unicodedata as ud
>>> ud.normalize('NFC', a1)
u'\xeatre'
>>> ud.normalize('NFC', a2)
u'\xeatre'
>>> ud.normalize('NFC', a1) == ud.normalize('NFC', a2)
True
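If you compare strings in many places, you could wrap the normalization in a small helper. This is a sketch assuming Python 3, and unicode_equal is a hypothetical name. Either NFC (compose) or NFD (decompose) works for canonically equivalent strings, as long as both sides are normalized to the same form.

```python
import unicodedata


def unicode_equal(a, b, form="NFC"):
    """Compare two strings after normalizing both to the same form (hypothetical helper)."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


a1 = "\u00eatre"   # precomposed "être"
a2 = "e\u0302tre"  # decomposed "être"

print(a1 == a2)                      # False: different code point sequences
print(unicode_equal(a1, a2))         # True: both become U+00EA + "tre" under NFC
print(unicode_equal(a1, a2, "NFD"))  # True: both become "e" + U+0302 + "tre" under NFD
```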
If you give us more information, we might be able to help you more, though.