Problem description
I have a problem sorting lists with Unicode collation in Python 2.5.1 and 2.6.5 on OS X, as well as on Linux.
import locale
locale.setlocale(locale.LC_ALL, 'pl_PL.UTF-8')
print [i for i in sorted([u'a', u'z', u'ą'], cmp=locale.strcoll)]
This should print:
[u'a', u'ą', u'z']
But it prints:
[u'a', u'z', u'ą']
To sum up: it looks as if strcoll is broken. I tried it with various types of variables (e.g. non-Unicode encoded strings).
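As a quick sanity check (not part of the original post), the unexpected result is exactly the order that plain code point comparison produces. The sketch below only demonstrates that ordering. Concluding that strcoll is not applying Polish collation rules on the affected platforms is an inference from this, not something the snippet proves.

```python
# -*- coding: utf-8 -*-
words = [u'a', u'z', u'ą']

# Map each one-character string to its Unicode code point.
code_points = dict((w, ord(w)) for w in words)

# Default sorting of unicode strings compares code points, not alphabet order.
by_code_point = sorted(words)

# 'z' is U+007A (122) and 'ą' is U+0105 (261), so 'ą' sorts after 'z'.
print('z=%d, a-ogonek=%d' % (code_points[u'z'], code_points[u'ą']))
print(by_code_point == [u'a', u'z', u'ą'])
```

If the strcoll result matches this list, the locale's collation rules are most likely not being used at all.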
What should I do?
Best regards, Tomasz Kopczuk.
Recommended answer
Apparently, the only way to get sorting to work on all platforms is to use the ICU library with the PyICU bindings (PyICU on PyPI).
On OS X, install it with MacPorts:
sudo port install py26-pyicu
Mind the bug described here: https://svn.macports.org/ticket/23429 (oh, the joy of using MacPorts).
PyICU's documentation is unfortunately severely lacking, but I managed to find out how it's done:
import PyICU
collator = PyICU.Collator.createInstance(PyICU.Locale('pl_PL.UTF-8'))
print [i for i in sorted([u'a', u'z', u'ą'], cmp=collator.compare)]
which gives:
[u'a', u'ą', u'z']
Another pro, pointed out by @bobince: it's thread-safe, so it stays usable when the locale has to be set per request.
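For readers on newer interpreters, here is a hedged sketch of the same idea without the cmp= argument, which Python 3 removed. It assumes functools.cmp_to_key is available (Python 2.7 and later, so not the 2.5.1 from the question) and that the installed PyICU exposes its module as icu, as recent releases do. The helper names sort_with_compare and sort_polish are made up for illustration.

```python
# -*- coding: utf-8 -*-
import functools


def sort_with_compare(items, compare):
    """Sort items with an old-style comparison function returning <0, 0 or >0."""
    return sorted(items, key=functools.cmp_to_key(compare))


def sort_polish(words):
    """Sort words with ICU's Polish collator (assumes PyICU is installed)."""
    import icu  # imported lazily so the helper above works without PyICU
    collator = icu.Collator.createInstance(icu.Locale('pl_PL'))
    return sort_with_compare(words, collator.compare)


if __name__ == '__main__':
    try:
        print(sort_polish([u'a', u'z', u'ą']))
    except ImportError:
        print('PyICU is not installed')
```

Creating the collator once and reusing it, instead of building it on every call as this sketch does, is likely the better choice when sorting many lists.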