UTF-8 & IsAlpha() in PHP

Problem description

I'm working on an application that supports several languages. It tries to use the language requested by the browser and also allows the user to override that choice manually. This part works fine and picks the correct templates, labels, etc.

Users sometimes have to enter text of their own, and that's where I run into issues, because the application has to accept even "complicated" languages like Chinese and Russian. So far I've taken care of the things mentioned in other posts, i.e.:

  • calling mb_internal_encoding('UTF-8')
  • setting the right encoding when rendering the web pages with meta http-equiv=Content-Type content=text/html;charset=UTF-8 (format adapted due to Stack Overflow limitations)
  • the content even arrives correctly, because mb_detect_encoding() == UTF-8
  • tried setLocale(LC_CTYPE, "UTF-8"), which doesn't seem to work because it requires selecting one language, which I can't specify because I have to support several. It still fails if I force it manually for testing purposes, e.g. with setLocale(LC_CTYPE,"zh__CN.utf8"): ctype_alpha() still fails for Chinese text

It seems that even explicit language selection doesn't make ctype_alpha() useful.
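
One likely explanation, offered here as an assumption rather than something stated in the question: ctype_alpha() classifies a string one byte at a time through the C library, so the multi-byte sequences that UTF-8 uses for Cyrillic or Chinese letters are never seen as letters. A minimal sketch that makes the byte/character mismatch visible (the sample strings are illustrative, and the mbstring extension is assumed to be loaded):

```php
<?php
// Pin the locale so the result does not depend on the environment.
setlocale(LC_CTYPE, 'C');

$samples = ['hello', 'привет', '中文'];

foreach ($samples as $text) {
    // strlen() counts bytes, mb_strlen() counts characters.
    printf(
        "%s: bytes=%d chars=%d ctype_alpha=%s\n",
        $text,
        strlen($text),
        mb_strlen($text, 'UTF-8'),
        var_export(ctype_alpha($text), true)
    );
}
```

Only the pure ASCII sample has matching byte and character counts, and it is the only one ctype_alpha() accepts in this setup.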

So the question is: how should I check for alphabetic characters in all languages?

The only idea I have at the moment is to check manually against arrays of "valid" characters, but this seems ugly, especially for Chinese.

How would you solve this issue?

Recommended answer

If you'd like to check only for valid Unicode letters regardless of the language used, I'd propose a regular expression (provided your PCRE extension is built with Unicode support):

// adjust pattern to your needs
// $input needs to be UTF-8 encoded
if (preg_match('/^\p{L}+$/u', $input)) {
    // OK
} else {
    // not OK
}
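
Whether the PCRE extension was built with Unicode support can be probed at runtime. The following is a minimal sketch assuming PHP 7 or later; pcre_has_unicode_support() is a hypothetical helper name, not a PHP built-in:

```php
<?php
// Hypothetical helper: returns true when preg_* accepts the /u modifier
// together with Unicode properties such as \p{L}.
function pcre_has_unicode_support(): bool
{
    // preg_match() returns false (and raises a warning) when the pattern
    // cannot be compiled; @ suppresses that warning for this probe.
    // "\u{00E9}" is a single non-ASCII letter, written as an escape so the
    // probe does not depend on the encoding of this source file.
    return @preg_match('/^\p{L}$/u', "\u{00E9}") === 1;
}

var_dump(pcre_has_unicode_support());
```

If the probe reports false, the \p{L} pattern cannot be relied on in that installation.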

\p{L} matches Unicode characters with the L(etter) property, which includes the properties Ll (lowercase letter), Lm (modifier letter), Lo (other letter), Lt (title case letter) and Lu (uppercase letter) (from: Regular Expression Details).
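
The check can be wrapped in a small function. The sketch below assumes PHP 7 or later, and both function names are hypothetical. It differs from the pattern above in two ways that may or may not suit a given application: it anchors with \A and \z instead of ^ and $, because $ in PCRE also matches just before a trailing newline, and it adds a variant that accepts combining marks (\p{M}), which \p{L} alone rejects.

```php
<?php
// Hypothetical helper names; neither is part of PHP or the original answer.

// True if $input is non-empty, valid UTF-8, and consists only of letters.
function is_unicode_alpha(string $input): bool
{
    // preg_match() returns 1 on a match, 0 on no match, and false on error
    // (for example, when $input is not valid UTF-8).
    return preg_match('/\A\p{L}+\z/u', $input) === 1;
}

// Variant: a letter followed by any mix of letters and combining marks.
function is_unicode_alpha_with_marks(string $input): bool
{
    return preg_match('/\A\p{L}[\p{L}\p{M}]*\z/u', $input) === 1;
}

var_dump(is_unicode_alpha('中文'));                   // Chinese letters
var_dump(is_unicode_alpha('Привет'));                 // Cyrillic letters
var_dump(is_unicode_alpha('abc123'));                 // digits are not letters
var_dump(is_unicode_alpha("e\u{0301}"));              // 'e' + combining acute accent
var_dump(is_unicode_alpha_with_marks("e\u{0301}"));   // same input, marks allowed
```

Whether marks, digits (\p{N}), or whitespace should be accepted depends on the input field, which is the sense of "adjust pattern to your needs" in the answer.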

