Problem description
I need to replicate the behavior of MySQL's utf8_general_ci collation in PHP. Strictly speaking, I need to detect which strings would be considered different and which would be considered the same. The case-insensitive part is easy. The problem is that utf8_general_ci considers characters with diacritics and characters without diacritics to be equal: e = è = é, etc. To replicate that comparison, I'd need a way to replace è -> e, é -> e.
The approach that comes to mind is:
echo iconv("utf-8", "ascii//TRANSLIT", "é");
One problem is that iconv behaves differently depending on the current locale, and that's asking for trouble.
The other problem is that the input may also contain Cyrillic letters, which shouldn't be stripped or result in a PHP Notice.
echo iconv("utf-8", "ascii//TRANSLIT", "дом");
Is there a solution, or do I have to manually create a mapping from each character with a diacritic to its counterpart without one?
Recommended answer
The intl extension's Transliterator lets you define far more in-depth transliteration rules, and it does not depend on the current locale the way iconv does. The full documentation on transliteration rules can be found on icu-project.org.
$tests = [ "é", "дом" ];
// 'Latin-ASCII' folds Latin letters with diacritics to ASCII and leaves other scripts alone.
$tl = Transliterator::create('Latin-ASCII;');
foreach ($tests as $str) {
    var_dump(
        $tl->transliterate($str)
    );
}
Output:
string(1) "e"
string(6) "дом"
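
To tie this back to the original goal (deciding whether two strings would compare as equal), here is a minimal sketch that folds both strings with the same transliterator before comparing them. The helper names fold_for_comparison and loosely_equal are hypothetical, and the compound ID 'Latin-ASCII; Lower' is an assumption about what is "close enough"; this approximates accent- and case-insensitive comparison and is not guaranteed to match utf8_general_ci for every character.

```php
<?php
// Hypothetical helper (not part of intl): reduce a string to a comparison key.
function fold_for_comparison(string $s): string
{
    static $tl = null;
    if ($tl === null) {
        // 'Latin-ASCII' removes diacritics from Latin letters;
        // 'Lower' then lowercases letters in any script, including Cyrillic.
        $tl = Transliterator::create('Latin-ASCII; Lower');
        if ($tl === null) {
            throw new RuntimeException('Cannot create transliterator: ' . intl_get_error_message());
        }
    }
    $key = $tl->transliterate($s);
    if ($key === false) {
        throw new RuntimeException('Transliteration failed: ' . $tl->getErrorMessage());
    }
    return $key;
}

// Hypothetical helper: true when both strings fold to the same key.
function loosely_equal(string $a, string $b): bool
{
    return fold_for_comparison($a) === fold_for_comparison($b);
}

var_dump(loosely_equal("É", "e"));     // bool(true): accent and case are ignored
var_dump(loosely_equal("дом", "ДОМ")); // bool(true): Cyrillic is kept, only case is folded
var_dump(loosely_equal("дом", "dom")); // bool(false): Cyrillic is not romanized
```

Because 'Latin-ASCII' also expands some letters (for example ligatures) into several ASCII characters, a few comparisons may differ from what MySQL reports, so it is worth checking the characters that matter for your data against the actual collation.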