Question
So on the API doc, http://docs.python.org/2/library/unicodedata.html#unicodedata.normalize. It says
The documentation is rather vague. Can someone explain the valid values
with some examples?
Recommended answer
I find the documentation pretty clear, but here are a few code examples:
from unicodedata import normalize
print '%r' % normalize('NFD', u'\u00C7') # decompose: convert Ç to "C + ̧"
print '%r' % normalize('NFC', u'C\u0327') # compose: convert "C + ̧" to Ç
Both 'D' (= decompose) forms convert a single combined character (like ä) into two characters (a + two dots). Both 'C' (= compose) forms do the reverse.
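To make the ä example concrete, here is a minimal sketch. It assumes Python 3, where every str is already Unicode and print is a function (the snippets in this answer use Python 2 syntax). The variable names are only illustrative.

```python
import unicodedata

composed = "\u00e4"  # ä stored as a single code point

# NFD splits it into the base letter plus a combining mark.
decomposed = unicodedata.normalize("NFD", composed)
print([unicodedata.name(ch) for ch in decomposed])
print(len(composed), len(decomposed))

# NFC recombines the two code points into the single precomposed one.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == composed)
```

The two strings render identically, but they compare as unequal until both are normalized to the same form.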
The two "K" forms are used to convert characters that were added to Unicode for compatibility purposes. For example, to support software that cannot draw circles around symbols, there is a set of "circled numbers", like ① (code point U+2460). When we apply the canonical decomposition (NFD) to it, nothing changes:
print '%r' % normalize('NFD', u'\u2460') # u'\u2460'
However, the compatibility decomposition (NFKD) will return the corresponding "compatible" character:
print '%r' % normalize('NFKD', u'\u2460') # 1
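As a side-by-side summary, the sketch below applies all four forms to the two characters used above. It assumes Python 3; the helper normalize_all and the samples dictionary are hypothetical names introduced only for this illustration.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

# Sample characters taken from the examples in this answer.
samples = {
    "C with cedilla": "\u00c7",
    "circled digit one": "\u2460",
}

def normalize_all(text):
    """Return a dict mapping each normalization form to the normalized text."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}

results = {label: normalize_all(text) for label, text in samples.items()}

for label, by_form in results.items():
    for form, value in by_form.items():
        # ascii() shows escapes, so combining marks stay visible.
        print(label, form, ascii(value))
```

Ç changes only under the two decomposing forms, while ① changes only under the two compatibility ("K") forms, where it becomes the plain digit 1.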
See … for more details.