Question
The API documentation at http://docs.python.org/2/library/unicodedata.html#unicodedata.normalize says that the valid values for form are 'NFC', 'NFKC', 'NFD', and 'NFKD'.
The documentation is rather vague. Can someone explain the valid values with some examples?
Recommended answer
I find the documentation pretty clear, but here are a few code examples:
from unicodedata import normalize
print '%r' % normalize('NFD', u'\u00C7') # decompose: convert Ç to "C + ̧"
print '%r' % normalize('NFC', u'C\u0327') # compose: convert "C + ̧" to Ç
Both 'D' (= decompose) forms convert a single precomposed character (like ä) into two characters (a followed by a combining diaeresis, the two dots). Both 'C' (= compose) forms do the reverse.
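To make the ä case concrete, here is a small sketch in Python 3 syntax (an assumption on my part, since the snippets above are Python 2; in Python 3 every str is already Unicode). The helper code_points is a hypothetical name introduced only for this illustration.

```python
import unicodedata


def code_points(text):
    """Return the code points of text as a list of 'U+XXXX' strings."""
    return ["U+%04X" % ord(ch) for ch in text]


precomposed = "\u00e4"   # ä as a single code point
decomposed = "a\u0308"   # a followed by COMBINING DIAERESIS

# Both 'D' forms split the precomposed character into two code points.
for form in ("NFD", "NFKD"):
    print(form, code_points(unicodedata.normalize(form, precomposed)))

# Both 'C' forms merge the pair back into one code point.
for form in ("NFC", "NFKC"):
    print(form, code_points(unicodedata.normalize(form, decomposed)))

# Without normalization the two spellings compare as different strings.
print(precomposed == decomposed)
```

The two strings usually render identically, which is why normalizing both sides to the same form before comparing them tends to be the practical use of this function.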
The two "K" forms additionally replace characters that were added to Unicode for compatibility purposes. For example, to support software that cannot draw circles around symbols, there is a set of "circled numbers", like ① (U+2460). Applying the canonical decomposition (NFD) to it does nothing:
print '%r' % normalize('NFD', u'\u2460') # u'\u2460'
However, the compatibility decomposition (NFKD) will return the corresponding "compatible" character:
print '%r' % normalize('NFKD', u'\u2460') # u'1'
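As a side-by-side comparison, the following sketch (again in Python 3 syntax, which is an assumption relative to the Python 2 snippets above) applies all four forms to a few compatibility characters. The helper all_forms and the choice of sample characters are mine, picked only for illustration.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def all_forms(text):
    """Map each normalization form name to the normalized version of text."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


samples = [
    "\u2460",         # ① CIRCLED DIGIT ONE
    "\ufb01",         # ﬁ LATIN SMALL LIGATURE FI
    "C\u0327\u2460",  # C + combining cedilla, then ①
]

for text in samples:
    for form, result in all_forms(text).items():
        # ascii() shows escapes, so composed and decomposed results stay distinguishable
        print(form, ascii(text), "->", ascii(result))
```

The canonical forms (NFC, NFD) leave ① and the ﬁ ligature untouched, while the "K" forms replace them with plain characters. The last sample shows that the "K" forms also perform the canonical step: NFKC composes C plus cedilla into Ç and turns ① into 1 in the same call.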
See http://en.wikipedia.org/wiki/Unicode_equivalence for more details.