


I currently have no clue how to sort an array that contains UTF-8 encoded strings in PHP. The array comes from an LDAP server, so sorting via a database (which would be no problem) is not an option.

The following does not work on my Windows development machine (although I'd think that this should be at least a possible solution):

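$array = array('Birnen', 'Äpfel', 'Ungetüme', 'Apfel', 'Ungetiere', 'Österreich');
// Remember the current collation locale so it can be restored afterwards.
$oldLocal = setlocale(LC_COLLATE, "0");
// Switch to German collation with the Windows UTF-8 code page (65001).
var_dump(setlocale(LC_COLLATE, 'German_Germany.65001'));
usort($array, 'strcoll');
var_dump(setlocale(LC_COLLATE, $oldLocal));
var_dump($array);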


string(20) "German_Germany.65001"
string(1) "C"
array(6) {
  string(6) "Birnen"
  string(9) "Ungetiere"
  string(6) "Äpfel"
  string(5) "Apfel"
  string(9) "Ungetüme"
  string(11) "Österreich"


This order is complete nonsense. Using 1252 as the code page for setlocale() gives a different output, but still a plainly wrong one:

string(19) "German_Germany.1252"
string(1) "C"
array(6) {
  string(11) "Österreich"
  string(6) "Äpfel"
  string(5) "Apfel"
  string(6) "Birnen"
  string(9) "Ungetüme"
  string(9) "Ungetiere"
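
A possible explanation for the second result (an assumption on my part, not verified against the Windows C runtime): with code page 1252, strcoll() reads the UTF-8 bytes as single-byte characters, so 'Ä' (bytes C3 84) would be compared as the two characters 'Ã„'. That would fit the order above, where 'Ungetüme' sorts before 'Ungetiere'. A minimal sketch that makes the misreading visible, assuming the mbstring extension is loaded (the variable names are only for illustration):

```php
<?php
// 'Äpfel' written with explicit UTF-8 bytes, so the example does not
// depend on the encoding of this source file.
$utf8 = "\xC3\x84pfel";

// Byte length vs. character length: a single-byte locale sees bytes.
var_dump(strlen($utf8));             // number of bytes
var_dump(mb_strlen($utf8, 'UTF-8')); // number of characters

// Reinterpret the same bytes as Windows-1252 and print them as UTF-8.
$seenAs1252 = mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252');
var_dump($seenAs1252);
```

If the assumption holds, the 1252 locale is sorting the misread strings correctly, just not the strings I meant to sort.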


Is there a way to sort an array of UTF-8 strings in a locale-aware manner?


I just noticed that this seems to be a PHP-on-Windows problem, as the same snippet with de_DE.utf8 as the locale works on a Linux machine. Nevertheless, a solution for this Windows-specific problem would be nice...



In the end, this problem cannot be solved in a simple way without recoding the strings (UTF-8 → Windows-1252 or ISO-8859-1), as suggested by ΤΖΩΤΖΙΟΥ, due to an obvious PHP bug discovered by Huppie.

To summarize the problem, I created the following code snippet, which clearly demonstrates that the problem is the strcoll() function when using the Windows UTF-8 code page 65001.

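function traceStrColl($a, $b) {
    // Compare using the current LC_COLLATE locale and print every comparison.
    $outValue = strcoll($a, $b);
    echo "$a $b $outValue\r\n";
    return $outValue;
}

// Check the prefix only: a substring test for 'win' would also match 'Darwin'.
$locale = (strtoupper(substr(PHP_OS, 0, 3)) === 'WIN') ? 'German_Germany.65001' : 'de_DE.utf8';

// Assumption: the definition of $string is missing from this post. Any UTF-8
// test string works; this one has 59 characters, matching array(59) below.
$string = "ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜabcdefghijklmnopqrstuvwxyzäöüß";
$array = array();
// Split the string into single UTF-8 characters.
for ($i = 0; $i < mb_strlen($string, 'UTF-8'); $i++) {
    $array[] = mb_substr($string, $i, 1, 'UTF-8');
}
$oldLocale = setlocale(LC_COLLATE, "0");
var_dump(setlocale(LC_COLLATE, $locale));
usort($array, 'traceStrColl');
setlocale(LC_COLLATE, $oldLocale);
var_dump($array);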


string(20) "German_Germany.65001"
a B 2147483647
array(59) {
  string(1) "c"
  string(1) "B"
  string(1) "s"
  string(1) "C"
  string(1) "k"
  string(1) "D"
  string(2) "ä"
  string(1) "E"
  string(1) "g"


The same snippet works on a Linux machine without any problems, producing the following output:

string(10) "de_DE.utf8"
a B -1
array(59) {
  string(1) "a"
  string(1) "A"
  string(2) "ä"
  string(2) "Ä"
  string(1) "b"
  string(1) "B"
  string(1) "c"
  string(1) "C"


The snippet also works when using Windows-1252 (ISO-8859-1) encoded strings (of course, the mb_* encodings and the locale must be changed accordingly).
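
For reference, the recoding workaround could look roughly like the sketch below. It is only a sketch under assumptions: the helper names toCp1252() and sortUtf8ViaCp1252() are made up for illustration, the mbstring extension must be loaded, and every string must be representable in Windows-1252 (mb_convert_encoding() substitutes characters that are not). The strings stay UTF-8 in the array; only the copies handed to strcoll() are recoded.

```php
<?php
// Hypothetical helper: recode a UTF-8 string to single-byte Windows-1252.
function toCp1252($utf8)
{
    return mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8');
}

// Hypothetical helper: sort UTF-8 strings by comparing Windows-1252 copies
// with strcoll() under the given single-byte collation locale.
function sortUtf8ViaCp1252(array $strings, $locale)
{
    // Remember the current collation locale so it can be restored.
    $oldLocale = setlocale(LC_COLLATE, '0');
    if (setlocale(LC_COLLATE, $locale) === false) {
        throw new RuntimeException("Locale not available: $locale");
    }

    usort($strings, function ($a, $b) {
        return strcoll(toCp1252($a), toCp1252($b));
    });

    setlocale(LC_COLLATE, $oldLocale);
    return $strings;
}
```

On Windows this would be called as sortUtf8ViaCp1252($array, 'German_Germany.1252'). Recoding inside the comparison callback is wasteful for large arrays; converting each string once up front and sorting on those keys would be the obvious refinement.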


I filed a bug report on bugs.php.net: Bug #46165 strcoll() does not work with UTF-8 strings on Windows. If you experience the same problem, you can give your feedback to the PHP team on the bug-report page (two other, probably related, bugs have been classified as bogus - I don't think that this bug is bogus ;-).



08-20 11:37