Why does the following code behave differently for different multibyte strings?
echo preg_replace('@(?=\pL)@u', '*', 'م'); // prints: '*م' ✓
echo preg_replace('@(?=\pL)@u', '*', 'ض'); // prints: '*ض' ✓
echo preg_replace('@(?=\pL)@u', '*', 'غ'); // prints: '*�*�' ✗
echo preg_replace('@(?=\pL)@u', '*', 'ص'); // prints: '*�*�' ✗
See: http://3v4l.org/fvab1
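Each failing output above contains two `*` and two invalid bytes, which suggests that a `*` was inserted between the two bytes of the UTF-8 sequence. The following is a minimal diagnostic sketch, not a confirmed explanation of the engine's behavior: it dumps the bytes of the four test strings and checks how `\pL` classifies the code point whose number equals the trailing byte. The helper name `classify_second_byte` is hypothetical, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper (not from the original post): reports the UTF-8 bytes
// of a two-byte character and whether the code point numbered like its
// trailing byte is classified as a letter by \pL.
function classify_second_byte($char)
{
    $second = ord($char[1]); // trailing byte, always in 0x80..0xBF

    // U+0080..U+00BF are encoded in UTF-8 as 0xC2 followed by the same byte value.
    $asCodePoint = chr(0xC2) . chr($second);

    return array(
        'bytes'  => bin2hex($char),
        'letter' => preg_match('@\A\pL\z@u', $asCodePoint) === 1,
    );
}

foreach (array('م', 'ض', 'غ', 'ص') as $char) {
    $info = classify_second_byte($char);
    printf(
        "%s  bytes=%s  trailing byte as code point is a letter: %s\n",
        $char,
        $info['bytes'],
        $info['letter'] ? 'yes' : 'no'
    );
}
```

For these four strings, the trailing bytes of 'غ' (0xBA) and 'ص' (0xB5) correspond to the letters º and µ, while those of 'م' (0x85) and 'ض' (0xB6) correspond to non-letters. This lines up with which inputs fail, and would be consistent with a match being attempted in the middle of a character, although the sketch alone does not establish that.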
Best answer
You also need to include modifier letters (Lm). See the following script, which walks the entire Arabic Unicode block:
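<?php
// Encode a code point in the range U+0080..U+07FF as a two-byte UTF-8 sequence.
function uchar_2($dec)
{
    $utf = chr(192 + (($dec - ($dec % 64)) / 64)); // lead byte: 110xxxxx
    $utf .= chr(128 + ($dec % 64));                // continuation byte: 10xxxxxx
    return $utf;
}

$issues = 0;
$count = 0;

// 1536..1791 is U+0600..U+06FF, the Arabic block.
for ($dec = 1536; $dec <= 1791; $dec++) {
    $char = uchar_2($dec);
    // Note: PCRE parses \pLm as \pL followed by a literal "m";
    // the Lm category on its own is written \p{Lm}.
    if (preg_replace('@^(?=\pLm)$@u', '*', $char) !== $char) {
        printf("Issue with %s (%s)\n", $dec, $char);
        $issues++;
    }
    $count++;
}

printf("Found %d issues in %d rows\n", $issues, $count);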
Without Lm, this fails for roughly half of the characters.
For more on "php - Multi-byte strings and lookarounds weird bug", see the similar question on Stack Overflow: https://stackoverflow.com/questions/14941455/
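The pattern in the question is a bare lookahead, so every match is an empty string. As a hedged alternative, assuming the goal is simply to put a `*` in front of every letter, the letter can be consumed and re-emitted through a backreference, so each match is non-empty and spans a whole character. This is a sketch of a different technique, not the fix proposed in the answer, and the function name `star_before_letters` is hypothetical.

```php
<?php
// Hypothetical helper (not from the original post): prefixes every Unicode
// letter with "*" by capturing the letter instead of looking ahead at it.
function star_before_letters($text)
{
    // (\pL) consumes one whole letter; $1 writes it back after the "*".
    return preg_replace('@(\pL)@u', '*$1', $text);
}

foreach (array('م', 'ض', 'غ', 'ص') as $char) {
    echo star_before_letters($char), "\n";
}
```

Because the match consumes the letter, the engine resumes after the complete character rather than after an empty match, which avoids the situation shown in the question. The trade-off is that the replacement string has to restore the captured text.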