Problem description

I'm trying to detect emoji in my PHP code and prevent users from entering them.

My code is:

if(preg_match('/\xEE[\x80-\xBF][\x80-\xBF]|\xEF[\x81-\x83][\x80-\xBF]/', $value) > 0)
{
    //warning...
}

But it doesn't work for all emoji. Any ideas?

Recommended answer

if(preg_match('/\xEE[\x80-\xBF][\x80-\xBF]|\xEF[\x81-\x83][\x80-\xBF]/', $value)

You really want to match Unicode at the character level, rather than trying to keep track of UTF-8 byte sequences. Use the u modifier so that the pattern and your UTF-8 string are processed character by character instead of byte by byte.
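To illustrate the difference, here is a minimal sketch, assuming the input is valid UTF-8. The variable names are only for illustration. It counts what "." matches with and without the u modifier, and shows that with u a code point can be written as a \x{...} escape.

```php
<?php
// "é" is one character but two bytes in UTF-8 (0xC3 0xA9).
$value = "\xC3\xA9";

// Without the u modifier, "." matches a single byte.
$byteMatches = preg_match_all('/./s', $value);

// With the u modifier, "." matches a whole code point.
$charMatches = preg_match_all('/./su', $value);

// With u, a code point can be written directly as \x{...}.
// U+2603 (snowman) is encoded as E2 98 83 in UTF-8.
$hasSnowman = preg_match('/\x{2603}/u', "\xE2\x98\x83") === 1;
```

Here $byteMatches is 2 and $charMatches is 1, which is why byte-level patterns like the one in the question are hard to keep correct.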

Many emoji are encoded in the block U+1F300–U+1F5FF (Miscellaneous Symbols and Pictographs). However:


  • Many characters from the Japanese carriers' 'emoji' sets are actually mapped to existing Unicode symbols, e.g. the card suits, zodiac signs and some arrows. Do you count these symbols as 'emoji' now?

  • There are still systems which don't use the newly standardised Unicode emoji code points, instead using ad-hoc ranges in the Private Use Area. Each carrier had its own encodings. iOS 4 used the Softbank set. You may wish to block the entire Private Use Area.
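If blocking every private-use code point is acceptable, one possible sketch uses the Unicode general category Co ("Private Use"), which PCRE supports under the u modifier. The function name containsPrivateUse is hypothetical, and the type declarations assume PHP 7 or later. Note that \p{Co} also covers the supplementary private-use planes, not only U+E000–U+F8FF.

```php
<?php
// Hypothetical helper: true if $value contains any private-use code point.
// \p{Co} is the Unicode "Private Use" general category; it requires /u.
function containsPrivateUse(string $value): bool
{
    return preg_match('/\p{Co}/u', $value) === 1;
}
```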

For example:

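// Build the UTF-8 encoding of a single Unicode code point.
function unichr($i) {
    return iconv('UCS-4LE', 'UTF-8', pack('V', $i));
}

// Match any character in U+1F300–U+1F5FF or in the Private Use Area.
if (preg_match('/['.
    unichr(0x1F300).'-'.unichr(0x1F5FF).
    unichr(0xE000).'-'.unichr(0xF8FF).
']/u', $value)) {
    // warning...
}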
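The same two ranges can also be written with \x{...} escapes, which avoids building characters through iconv. This is a sketch rather than part of the original answer: the function name containsEmojiOrPrivateUse is hypothetical, and the type declarations and "\u{...}" string escapes in the usage lines assume PHP 7 or later.

```php
<?php
// Hypothetical helper: true if $value contains a character in
// U+1F300–U+1F5FF or in the Private Use Area U+E000–U+F8FF.
function containsEmojiOrPrivateUse(string $value): bool
{
    // preg_match returns 1 on a match, 0 on no match, and false on error
    // (for example, invalid UTF-8 under the u modifier).
    return preg_match('/[\x{1F300}-\x{1F5FF}\x{E000}-\x{F8FF}]/u', $value) === 1;
}

// Usage: U+1F525 lies inside the first range, plain ASCII in neither.
$rejected = containsEmojiOrPrivateUse("hot \u{1F525}");
$accepted = !containsEmojiOrPrivateUse("hello");
```

Because the comparison is against 1, input that is not valid UTF-8 is reported as "no match" here; callers who need to reject such input may want to check for a false return value separately.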
