Differences in encoded file names

In PHP, how do I deal with the difference in encoded filenames on HFS+ vs. elsewhere?

Problem description

I am creating a very simple file search, where the search database is a text file with one file name per line. The database is built with PHP, and matches are found by grepping the file (also with PHP).

This works great on Linux, but not on Mac when non-ASCII characters are used. It looks like names are encoded differently on HFS+ (MacOSX) than on, e.g., ext3 (Linux). Here's a test.php:

<?php
$mystring = "abcóüÚdefå";
file_put_contents($mystring, "");
$h = dir('.');
$h->read(); // "."
$h->read(); // ".."
$filename = $h->read();

print "string: $mystring and filename: $filename are ";

if ($mystring == $filename) print "equal\n";
else print "different\n";

When run on MacOSX:

$ php test.php
string: abcóüÚdefå and filename: abcóüÚdefå are different
$ php test.php |cat -evt
string: abcM-CM-3M-CM-<M-CM-^ZdefM-CM-% and filename: abcoM-LM-^AuM-LM-^HUM-LM-^AdefaM-LM-^J are different$

When run on Linux (or on an NFS-mounted ext3 filesystem on MacOSX):

$ php test.php
string: abcóüÚdefå and filename: abcóüÚdefå are equal
$ php test.php |cat -evt
string: abcM-CM-3M-CM-<M-CM-^ZdefM-CM-% and filename: abcM-CM-3M-CM-<M-CM-^ZdefM-CM-% are equal$

Is there a way to make this script return "equal" on both platforms?

Recommended answer

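MacOSX uses Unicode normalization form D (NFD, decomposed) for UTF-8 file names on HFS+, while most other systems use NFC (composed). The two forms render identically but are different byte sequences, which is why the comparison reports "different".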

(from unicode.org)
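To make the difference concrete, here is a small sketch that is not part of the original answer. It spells out the first accented letter of the test string in both forms using raw byte escapes, so the result does not depend on how the source file was saved. The variable names are made up for illustration, and the last line assumes the intl extension is loaded.

```php
<?php
// Illustrative only: the same visible text, "abcó", in both normalization forms.
$nfc = "abc\xC3\xB3";   // "ó" as one precomposed code point, U+00F3
$nfd = "abco\xCC\x81";  // "o" followed by U+0301 COMBINING ACUTE ACCENT

var_dump($nfc === $nfd);     // bool(false): the byte sequences differ
echo bin2hex($nfc), "\n";    // 616263c3b3
echo bin2hex($nfd), "\n";    // 6162636fcc81

// Converting the decomposed string to NFC makes the two compare equal.
var_dump(Normalizer::normalize($nfd, Normalizer::FORM_C) === $nfc); // bool(true)
```

The bytes CC 81 are what `cat -v` prints as `M-LM-^A` in the MacOSX run above, and C3 B3 is what it prints as `M-CM-3` in the Linux run.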

There are several implementations of NFD to NFC conversion. Here I've used the PHP Normalizer class to detect strings that are not already in NFC and convert them. It's available in PHP 5.3 and later, or through the PECL Internationalization (intl) extension. The following amendment will make the script work:

...
$filename = $h->read();
if (!normalizer_is_normalized($filename)) {
   $filename = normalizer_normalize($filename);
}
...
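For the file search described in the question, the same idea could be applied on both sides: normalize every name when the index is built, and normalize the search term before matching. The following is only a sketch under stated assumptions. The helper `to_nfc()` and the sample data are hypothetical and do not come from the original post, and the intl extension is assumed to be loaded.

```php
<?php
// Hypothetical helper (not part of the original answer): return $name in NFC.
// Requires the intl extension, which provides normalizer_normalize().
function to_nfc($name)
{
    $normalized = normalizer_normalize($name, Normalizer::FORM_C);
    // Fall back to the raw bytes if normalization fails (e.g. invalid UTF-8).
    return is_string($normalized) ? $normalized : $name;
}

// Sample data standing in for directory entries: the first name is in NFD,
// as HFS+ would return it ("o" + combining acute accent, bytes CC 81).
$entries = array("abco\xCC\x81.txt", "plain.txt");

// Build the index in NFC, then normalize the search term the same way.
$index = array_map('to_nfc', $entries);
$query = to_nfc("abc\xC3\xB3"); // "abcó" typed in NFC

$matches = array();
foreach ($index as $name) {
    if (strpos($name, $query) !== false) {
        $matches[] = $name;
    }
}

print_r($matches);
```

Normalizing at index time keeps the text database identical regardless of which filesystem it was built on, so an external grep over it only needs the search term in NFC.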

09-01 20:06