Problem description
Can anyone please explain why the NFD normalization of U+2126 (Ω) and U+03A9 (Ω) results in the same representation and does not preserve the code point? I would have expected this behaviour only for NFKD and NFKC (and for characters with diacritics).
import unicodedata

result1 = unicodedata.normalize("NFD", u"\u2126")
result2 = unicodedata.normalize("NFD", u"\u03A9")
print("NFD: " + repr(result1))
print("NFD: " + repr(result2))
Output:
NFD: u'\u03a9'
NFD: u'\u03a9'
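One way to see why even NFD rewrites U+2126 is to inspect its decomposition mapping directly. A minimal sketch using only the standard `unicodedata` module (no third-party assumptions):

```python
import unicodedata

# U+2126 OHM SIGN carries a decomposition mapping to U+03A9.
# The mapping has no "<...>" tag, which means it is a *canonical*
# decomposition, not a compatibility one - so every normalization
# form, NFD included, applies it.
print(unicodedata.decomposition('\u2126'))  # '03A9' (untagged: canonical)
print(unicodedata.name('\u2126'))           # 'OHM SIGN'
print(unicodedata.name('\u03a9'))           # 'GREEK CAPITAL LETTER OMEGA'
```

Compatibility mappings, by contrast, show up with a tag such as `<font>` in the same field.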
Solution

These are known as "singleton decompositions", and exist for characters like U+2126 (Ω) that are present in Unicode for compatibility with existing standards. They are not "compatibility decompositions" (like that of U+1D6C0 MATHEMATICAL BOLD CAPITAL OMEGA, which is applied only by NFKC/NFKD) but canonical ones, so they are applied by every normalization form, including NFC and NFD.
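The distinction can be checked directly: the singleton mapping of U+2126 fires under all four normalization forms, while the compatibility mapping of U+1D6C0 fires only under the K forms. A short sketch:

```python
import unicodedata

ohm = '\u2126'             # OHM SIGN: canonical singleton decomposition
bold_omega = chr(0x1D6C0)  # MATHEMATICAL BOLD CAPITAL OMEGA: compatibility decomposition

# The canonical singleton mapping applies under every form,
# so the original code point U+2126 can never be preserved.
for form in ('NFC', 'NFD', 'NFKC', 'NFKD'):
    assert unicodedata.normalize(form, ohm) == '\u03a9'

# The compatibility mapping applies only under NFKC/NFKD:
print(unicodedata.normalize('NFD', bold_omega) == bold_omega)  # True: unchanged
print(unicodedata.normalize('NFKD', bold_omega) == '\u03a9')   # True: folded to omega
```

Note that U+2126 maps to U+03A9 even under NFC: singleton decompositions are composition-excluded, so recomposition never restores the original character.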