Problem description
On our website, some Mac users run into trouble when they copy and paste text from PDF files into a TextArea (handled by TinyMCE). All accented characters are corrupted: é becomes e?, î becomes i?, and so on. I cannot reproduce this problem on a Windows computer.
When I wrote the content of the TextArea to a file (before inserting it into the database), I discovered that the pasted é is visually different from a regular é (in Vim, see below).
Indeed:
// the corrupted é - first line of the screenshot
echo bin2hex($char); // prints 65cc81
// regular (precomposed) é
echo bin2hex('é'); // prints c3a9
After a lot of searching, here is where I stand: it seems that Mac OS copies accented Unicode characters as a combination of two code points; in our example, e + ́ (the bytes cc81 are U+0301, the combining acute accent). So far I have not found a way to replace the corrupted é with the real one, to avoid e? in the database.
I'm a bit desperate.
Recommended answer
The process of converting the representation to one form or the other is called, well, normalization. PHP has the Normalizer class for that (part of the intl extension), and sending all input through it is a good idea:
$input = Normalizer::normalize($input);
You likely want to normalize to form C (NFC): canonical decomposition followed by canonical composition. That is also the form Normalizer::normalize() uses when no form argument is given.
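As a minimal sketch of what this does to the character from the question, the snippet below builds both representations by hand and normalizes the decomposed one. It assumes PHP 7.0 or later (for the "\u{...}" escape syntax) and a loaded intl extension; the variable names are illustrative only.

```php
<?php
// Assumption: the intl extension is loaded, so the Normalizer class exists.

$decomposed  = "e\u{0301}"; // 'e' followed by U+0301 COMBINING ACUTE ACCENT
$precomposed = "\u{00E9}";  // U+00E9 LATIN SMALL LETTER E WITH ACUTE

echo bin2hex($decomposed), "\n";  // 65cc81
echo bin2hex($precomposed), "\n"; // c3a9

// Form C composes the base letter and the combining mark into one code point.
$normalized = Normalizer::normalize($decomposed, Normalizer::FORM_C);

echo bin2hex($normalized), "\n";        // c3a9
var_dump($normalized === $precomposed); // bool(true)
```

After normalization the pasted text is byte-identical to the precomposed é, so it survives storage in the same way as text typed directly.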
Should that class not be available on your system, there's the Patchwork UTF-8 library.
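Whether the class is available can be checked at runtime. The helper below is a hypothetical sketch, not part of the original answer: normalizeInput is a made-up name, and returning the input unchanged when intl is missing is only one possible policy (logging or failing loudly may suit your application better).

```php
<?php
/**
 * Hypothetical helper: normalize a string to form C when intl is available.
 * Falls back to returning the input untouched otherwise (an assumption about
 * what the application should do, not a recommendation).
 */
function normalizeInput(string $input): string
{
    if (!class_exists('Normalizer')) {
        // intl is not loaded: nothing to normalize with.
        return $input;
    }

    $result = Normalizer::normalize($input, Normalizer::FORM_C);

    // normalize() returns false on failure; keep the original in that case.
    return $result === false ? $input : $result;
}

// Usage: run every submitted field through the helper before storing it.
$clean = normalizeInput("i\u{0302}"); // 'i' + U+0302 COMBINING CIRCUMFLEX ACCENT
```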