Problem description
Can anyone please explain why the NFD normalization of U+2126 (Ω) and U+03A9 (Ω) results in the same representation and does not preserve the code point? I would have expected this behaviour only for NFKD and NFKC (and for characters with diacritics).
import unicodedata

result1 = unicodedata.normalize("NFD", u"\u2126")
result2 = unicodedata.normalize("NFD", u"\u03A9")
print("NFD: " + repr(result1))
print("NFD: " + repr(result2))
Output:
NFD: u'\u03a9'
NFD: u'\u03a9'
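One way to see why even NFD rewrites U+2126 is to inspect its decomposition mapping directly. A minimal sketch using only the standard `unicodedata` module (no third-party assumptions):

```python
import unicodedata

# U+2126 OHM SIGN carries a decomposition mapping to U+03A9.
# The mapping has no "<...>" tag, which means it is a *canonical*
# decomposition, not a compatibility one - so every normalization
# form, NFD included, applies it.
print(unicodedata.decomposition('\u2126'))  # '03A9' (untagged: canonical)
print(unicodedata.name('\u2126'))           # 'OHM SIGN'
print(unicodedata.name('\u03a9'))           # 'GREEK CAPITAL LETTER OMEGA'
```

Compatibility mappings, by contrast, show up with a tag such as `<font>` in the same field.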
Solution

These are known as "singleton decompositions", and exist for characters like U+2126 (Ω) that are present in Unicode for compatibility with existing standards. They are not "compatibility decompositions" (like that of U+1D6C0 MATHEMATICAL BOLD CAPITAL OMEGA, which is applied only by NFKC/NFKD) but canonical ones, so they are applied by every normalization form, including NFC and NFD.
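The distinction can be checked directly: the singleton mapping of U+2126 fires under all four normalization forms, while the compatibility mapping of U+1D6C0 fires only under the K forms. A short sketch:

```python
import unicodedata

ohm = '\u2126'             # OHM SIGN: canonical singleton decomposition
bold_omega = chr(0x1D6C0)  # MATHEMATICAL BOLD CAPITAL OMEGA: compatibility decomposition

# The canonical singleton mapping applies under every form,
# so the original code point U+2126 can never be preserved.
for form in ('NFC', 'NFD', 'NFKC', 'NFKD'):
    assert unicodedata.normalize(form, ohm) == '\u03a9'

# The compatibility mapping applies only under NFKC/NFKD:
print(unicodedata.normalize('NFD', bold_omega) == bold_omega)  # True: unchanged
print(unicodedata.normalize('NFKD', bold_omega) == '\u03a9')   # True: folded to omega
```

Note that U+2126 maps to U+03A9 even under NFC: singleton decompositions are composition-excluded, so recomposition never restores the original character.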