Error: "Input is not proper UTF-8, indicate encoding!" using PHP's simple

Problem description

I'm getting the error:

parser error: Input is not proper UTF-8, indicate encoding! Bytes: 0xED 0x6E 0x2C 0x20

When trying to process an simple from a 3rd party source. The raw

Yet it seems that the Dublín in the

I'm unable to get the 3rd party to sort out their

How can I pre-process the

Is there a way to detect the correct encoding for a

Recommended answer

Your 0xED 0x6E 0x2C 0x20 bytes correspond to "ín, " in ISO-8859-1, so it looks like your content is in ISO-8859-1, not UTF-8. Tell your data provider about it and ask them to fix it, because if it doesn't work for you it probably doesn't work for other people either.
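The byte-level claim above can be checked directly. The following is a minimal sketch, assuming the mbstring extension is available; the variable names are illustrative and not part of the original question.

```php
<?php
// The four bytes quoted in the parser error message.
$bytes = "\xED\x6E\x2C\x20";

// 0xED opens a three-byte UTF-8 sequence, but 0x6E ("n") is not a
// continuation byte, so the string is not valid UTF-8.
$isUtf8 = mb_check_encoding($bytes, 'UTF-8');

// Read as ISO-8859-1, the same bytes are "í", "n", "," and a space.
$asUtf8 = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');

var_dump($isUtf8);                     // bool(false)
var_dump($asUtf8 === "\xC3\xADn, ");   // bool(true)
```

Note that a string passing an ISO-8859-1 check proves little, since every byte sequence is valid ISO-8859-1; the useful signal is that the UTF-8 check fails.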

Now there are a few ways to work around it, which you should only use if the input cannot be loaded normally. One of them would be to use utf8_encode(). The downside is that if the input also contains valid UTF-8 sequences, those will turn into mojibake. Or you can try to convert the string from UTF-8 to UTF-8 using iconv() or mbstring, and hope they'll fix it for you. (They won't, but you can at least ignore the invalid characters so that loading succeeds.)
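To make the trade-offs of these workarounds concrete, here is a small sketch using made-up sample strings. It assumes the mbstring extension is loaded, and it uses mb_convert_encoding() with an ISO-8859-1 source encoding in place of utf8_encode(), which performs the same conversion but is deprecated as of PHP 8.2.

```php
<?php
// Workaround A: treat the whole string as ISO-8859-1 and convert it.
$latin1 = "Dubl\xEDn, ";   // contains a stray ISO-8859-1 byte
$fixed  = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// Downside of A: text that is already valid UTF-8 gets encoded a
// second time, which produces mojibake.
$alreadyUtf8 = "Dubl\xC3\xADn";
$mojibake    = mb_convert_encoding($alreadyUtf8, 'UTF-8', 'ISO-8859-1');

// Workaround B: convert UTF-8 to UTF-8 and drop whatever is invalid.
// The substitute character is switched to "none" only for this call.
$previous = mb_substitute_character();
mb_substitute_character('none');
$stripped = mb_convert_encoding($latin1, 'UTF-8', 'UTF-8');
mb_substitute_character($previous);

var_dump($fixed === "Dubl\xC3\xADn, ");            // bool(true)
var_dump($mojibake === "Dubl\xC3\x83\xC2\xADn");   // bool(true)
var_dump(mb_check_encoding($stripped, 'UTF-8'));   // bool(true)
```

Workaround B yields a string that the parser will accept, but the accented character is lost rather than repaired.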

Or you can take the long, long road and validate/fix the sequences by yourself. That will take you a while depending on how familiar you are with UTF-8. Perhaps there are libraries out there that would do that, although I don't know any.
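As a rough illustration of what validating the sequences yourself could look like, the sketch below walks the string, keeps every well-formed UTF-8 sequence, and converts any other byte as if it were ISO-8859-1. The function name repair_mixed_utf8 is hypothetical, the ISO-8859-1 assumption for stray bytes may not hold for every feed, and the mbstring extension is assumed.

```php
<?php
// Hypothetical helper: keep well-formed UTF-8, re-encode stray bytes.
function repair_mixed_utf8(string $s): string
{
    // Group 1: one well-formed UTF-8 sequence (ranges per RFC 3629).
    // Group 2: any single byte that did not fit group 1.
    $pattern = '/(
          [\x00-\x7F]
        | [\xC2-\xDF][\x80-\xBF]
        | \xE0[\xA0-\xBF][\x80-\xBF]
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}
        | \xED[\x80-\x9F][\x80-\xBF]
        | \xF0[\x90-\xBF][\x80-\xBF]{2}
        | [\xF1-\xF3][\x80-\xBF]{3}
        | \xF4[\x80-\x8F][\x80-\xBF]{2}
      ) | (.)/xs';

    $result = preg_replace_callback($pattern, function (array $m): string {
        if (isset($m[2]) && $m[2] !== '') {
            // Stray byte: assume ISO-8859-1 and convert it to UTF-8.
            return mb_convert_encoding($m[2], 'UTF-8', 'ISO-8859-1');
        }
        return $m[1]; // already valid, keep unchanged
    }, $s);

    // Fall back to the untouched input if the regex engine fails.
    return $result ?? $s;
}

$mixed    = "Dubl\xEDn, caf\xC3\xA9";   // one stray byte, one valid sequence
$repaired = repair_mixed_utf8($mixed);
var_dump($repaired === "Dubl\xC3\xADn, caf\xC3\xA9");   // bool(true)
```

This is still a heuristic: ISO-8859-1 text whose bytes happen to form a valid UTF-8 sequence is left as it is, so the result can be well-formed and still wrong.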

Either way, notify your data provider that they're sending invalid data so that they can fix it.

Here's a partial fix. It will definitely not fix everything, but will fix some of it. Hopefully enough for you to get by until your provider fixes their stuff.

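// Re-encodes bytes that look like stray ISO-8859-1 characters as UTF-8.
// The pattern matches any byte in \xA1-\xFF that is not followed by two
// or more continuation bytes (\x80-\xBF); each match goes to the callback.
function fix_latin1_mangled_with_utf8_maybe_hopefully_most_of_the_time($str)
{
    return preg_replace_callback('#[\xA1-\xFF](?![\x80-\xBF]{2,})#', 'utf8_encode_callback', $str);
}

// Callback for preg_replace_callback(): converts the matched byte from
// ISO-8859-1 to UTF-8. Note that utf8_encode() is deprecated as of PHP 8.2.
function utf8_encode_callback($m)
{
    return utf8_encode($m[0]);
}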

10-15 12:55