Problem description
The iconv function sometimes gives me an error:
Notice: iconv() [function.iconv]: Detected an incomplete multibyte character in input string in [...]
Is there a way to detect illegal characters in a UTF-8 string before sending the data to iconv()?
Recommended answer
First, note that it is not possible to detect whether text belongs to a specific undesired encoding. You can only check whether a string is valid in a given encoding.
You can make use of the UTF-8 validity check that has been available in preg_match since PHP 4.3.5. With the u modifier, an invalid string does not match and you get no further detail from the return value: current PHP versions return false (older ones returned 0), so compare the result against 1:
$isUTF8 = preg_match('//u', $string) === 1;
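As a minimal sketch of that check, the snippet below wraps it in a helper and shows how preg_last_error() can tell a UTF-8 failure apart from other PCRE errors. The helper name is_valid_utf8 and the sample byte strings are assumptions made for illustration; they are not part of PHP or of the original answer.

```php
<?php
// Hypothetical helper name; not part of PHP itself.
function is_valid_utf8(string $s): bool
{
    // The empty pattern matches any well-formed subject, so 1 means "valid".
    // Malformed UTF-8 makes the call fail instead of matching.
    return preg_match('//u', $s) === 1;
}

$ok  = is_valid_utf8("caf\xC3\xA9"); // true: "café" encoded as UTF-8
$bad = is_valid_utf8("caf\xC3");     // false: lead byte without its continuation byte

// Right after a failed check, preg_last_error() reports the reason.
$reason = preg_last_error() === PREG_BAD_UTF8_ERROR ? 'bad UTF-8' : 'other';
```

Comparing against 1 rather than relying on truthiness keeps the check correct whether a given PHP version returns 0 or false for malformed input.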
Another possibility is mb_check_encoding:
$validUTF8 = mb_check_encoding($string, 'UTF-8');
Another function you can use is mb_detect_encoding:
$validUTF8 = ! (false === mb_detect_encoding($string, 'UTF-8', true));
It's important to set the strict parameter to true.
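To see how the two mbstring checks behave, here is a small sketch that runs both over a few byte strings. It assumes the mbstring extension is loaded; the sample strings and the $samples / $results names are made up for this example.

```php
<?php
// Hypothetical sample inputs, chosen only for illustration.
$samples = [
    'valid'     => "caf\xC3\xA9", // "café" encoded as UTF-8
    'truncated' => "caf\xC3",     // lead byte with no continuation byte
    'latin1'    => "caf\xE9",     // "café" encoded as ISO-8859-1
];

$results = [];
foreach ($samples as $label => $bytes) {
    $results[$label] = [
        // Direct validity test against a single encoding.
        'check'  => mb_check_encoding($bytes, 'UTF-8'),
        // Detection restricted to UTF-8, with strict mode enabled.
        'detect' => mb_detect_encoding($bytes, 'UTF-8', true) !== false,
    ];
}
```

Only the first sample should pass mb_check_encoding; the other two contain byte sequences that are not well-formed UTF-8.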
Additionally, iconv allows you to change/drop invalid sequences on the fly. (However, if iconv encounters such a sequence, it generates a notice; this behavior cannot be changed.)
echo 'TRANSLIT : ', iconv("UTF-8", "ISO-8859-1//TRANSLIT", $string), PHP_EOL;
echo 'IGNORE : ', iconv("UTF-8", "ISO-8859-1//IGNORE", $string), PHP_EOL;
You can use the @ operator to suppress the notice and then compare the length of the returned string with the original; equal lengths mean nothing was dropped:
strlen($string) === strlen(@iconv('UTF-8', 'UTF-8//IGNORE', $string));
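Putting the pieces together, one way to avoid the notice from the question is to validate first and only call iconv on input that passed. This is a sketch under stated assumptions: the helper name utf8_to_latin1_or_null and the ISO-8859-1 target are chosen for illustration, and the iconv extension is assumed to be available.

```php
<?php
// Hypothetical helper: returns the converted string, or null if the input is rejected.
function utf8_to_latin1_or_null(string $input): ?string
{
    // Reject malformed UTF-8 up front, so iconv never sees an incomplete sequence.
    if (preg_match('//u', $input) !== 1) {
        return null;
    }

    // Valid UTF-8 can still contain characters that ISO-8859-1 cannot represent;
    // in that case iconv raises a notice and returns false, mapped to null here.
    $converted = iconv('UTF-8', 'ISO-8859-1', $input);

    return $converted === false ? null : $converted;
}

$latin1   = utf8_to_latin1_or_null("caf\xC3\xA9"); // well-formed input is converted
$rejected = utf8_to_latin1_or_null("caf\xC3");     // malformed input yields null
```

Returning null instead of a partially converted string leaves it to the caller to decide whether to drop, repair, or report the bad input.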
Check the examples on the iconv manual page as well.