Decoding numeric HTML entities with PHP

Problem description

I have this code to decode numeric HTML entities to their UTF-8 equivalent characters.

I'm trying to convert this character:

&#146;

which should output:

’

However, it just disappears (no output). (I've checked the source code of the page; it has the correct UTF-8 character set headers/meta tags.)

Does anyone know what is wrong with the code?

function entity_decode($string, $quote_style = ENT_COMPAT, $charset = "UTF-8") {
    $string = html_entity_decode($string, $quote_style, $charset);

    // Hexadecimal references such as &#x92;
    $string = preg_replace_callback('~&#x([0-9a-fA-F]+);~i', "chr_utf8_callback", $string);

    // Decimal references such as &#146;. The /e modifier of preg_replace()
    // was removed in PHP 7.0, so these are decoded with a callback as well.
    $string = preg_replace_callback('~&#([0-9]+);~', function ($m) {
        return chr_utf8((int) $m[1]);
    }, $string);

    // This is another method, which also doesn't work:
    //$string = preg_replace_callback("/(\&#[0-9]+;)/", "entity_decode_callback", $string);

    return $string;
}

function chr_utf8_callback($matches) {
    return chr_utf8(hexdec($matches[1]));
}

// Encode a Unicode code point as a UTF-8 byte sequence (1 to 4 bytes).
function chr_utf8($num) {
    if ($num < 128) return chr($num);
    if ($num < 2048) return chr(($num >> 6) + 192) . chr(($num & 63) + 128);
    if ($num < 65536) return chr(($num >> 12) + 224) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
    if ($num < 2097152) return chr(($num >> 18) + 240) . chr((($num >> 12) & 63) + 128) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
    return '';
}

// Note: the "HTML-ENTITIES" encoding of mb_convert_encoding() is deprecated as of PHP 8.2.
function entity_decode_callback($m) {
    return mb_convert_encoding($m[1], "UTF-8", "HTML-ENTITIES");
}

echo '=' . entity_decode('&#146;');
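One way to rule out the hand-written chr_utf8() as the cause is to compare its output with known UTF-8 byte sequences. A minimal sketch: the function body is the question's (with an `int` type hint added), and the sample code points are arbitrary examples.

```php
<?php
// chr_utf8() as in the question: encode a code point as 1 to 4 UTF-8 bytes.
function chr_utf8(int $num): string
{
    if ($num < 128) return chr($num);
    if ($num < 2048) return chr(($num >> 6) + 192) . chr(($num & 63) + 128);
    if ($num < 65536) return chr(($num >> 12) + 224) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
    if ($num < 2097152) return chr(($num >> 18) + 240) . chr((($num >> 12) & 63) + 128) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
    return '';
}

// Code point => expected UTF-8 bytes as hex.
$cases = [
    0x41    => '41',       // "A", one byte
    146     => 'c292',     // U+0092, which is what &#146; literally denotes
    0x2019  => 'e28099',   // U+2019 RIGHT SINGLE QUOTATION MARK
    0x1F600 => 'f09f9880', // U+1F600, a four-byte sequence
];

foreach ($cases as $cp => $expected) {
    $got = bin2hex(chr_utf8($cp));
    printf("U+%04X -> %s (%s)\n", $cp, $got, $got === $expected ? 'ok' : 'MISMATCH');
}
```

If every line reports `ok`, the encoder is doing its job, and &#146; really does become U+0092 rather than ’. Note that this encoder does not reject surrogates (U+D800 to U+DFFF) or values above U+10FFFF, which are not valid in UTF-8.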
Solution

html_entity_decode already does what you're looking for:

$string = '&#146;';

echo html_entity_decode($string, ENT_COMPAT, 'UTF-8');
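Since the result may be an invisible character, printing its bytes is a more reliable check than looking at the rendered page. A minimal sketch using bin2hex(); the two hex values in the comment are just the byte patterns to look for, and which one you get may depend on your PHP version and flags.

```php
<?php
$string = '&#146;';
$decoded = html_entity_decode($string, ENT_COMPAT, 'UTF-8');

// Print the raw bytes instead of the character, so an invisible result still shows up.
// "c292" is the UTF-8 encoding of U+0092; "26233134363b" would be the bytes of the
// literal text "&#146;", meaning the entity came back undecoded.
echo bin2hex($decoded), PHP_EOL;
```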

It will return the character:

’   binary hex: c292

This is U+0092 PRIVATE USE TWO, a non-printing C1 control character, so a browser typically shows nothing for it. Because it is a private-use control code, your PHP configuration/version/build might not return it at all.
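To detect this kind of invisible character in code, one option is PCRE's Unicode property `\p{Cc}` (general category "control"), which covers U+0080 to U+009F. A minimal sketch; is_control_char() is a hypothetical helper name, and the `/u` flag assumes the input is UTF-8.

```php
<?php
// Hypothetical helper: true if $ch is exactly one Unicode control character (category Cc).
// The /u flag makes PCRE treat the subject as UTF-8.
function is_control_char(string $ch): bool
{
    return preg_match('/^\p{Cc}$/u', $ch) === 1;
}

var_dump(is_control_char("\xC2\x92")); // U+0092 PRIVATE USE TWO: bool(true)
var_dump(is_control_char("\u{2019}")); // U+2019 RIGHT SINGLE QUOTATION MARK: bool(false)
```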

There are also some more quirks. See the question '&#146; is getting converted as "\u0092" by nokogiri in ruby on rails'.
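The underlying quirk is that &#146; strictly means the control character U+0092, while text containing it typically comes from Windows-1252, where byte 0x92 is the right single quotation mark ’. HTML5 parsers handle this by mapping numeric references 128 to 159 through the Windows-1252 table. A minimal sketch of the same idea in PHP, not taken from the answer above: decode_entities_cp1252() is a hypothetical name, and UTF-8 output is assumed.

```php
<?php
// Hypothetical helper: decode entities, treating &#128;..&#159; (and &#x80;..&#x9F;)
// as Windows-1252 byte values, the way HTML5 parsers do. Output is UTF-8.
function decode_entities_cp1252(string $s): string
{
    // Windows-1252 0x80..0x9F => Unicode code point.
    // 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned and therefore not listed.
    $map = [
        0x80 => 0x20AC, 0x82 => 0x201A, 0x83 => 0x0192, 0x84 => 0x201E,
        0x85 => 0x2026, 0x86 => 0x2020, 0x87 => 0x2021, 0x88 => 0x02C6,
        0x89 => 0x2030, 0x8A => 0x0160, 0x8B => 0x2039, 0x8C => 0x0152,
        0x8E => 0x017D, 0x91 => 0x2018, 0x92 => 0x2019, 0x93 => 0x201C,
        0x94 => 0x201D, 0x95 => 0x2022, 0x96 => 0x2013, 0x97 => 0x2014,
        0x98 => 0x02DC, 0x99 => 0x2122, 0x9A => 0x0161, 0x9B => 0x203A,
        0x9C => 0x0153, 0x9E => 0x017E, 0x9F => 0x0178,
    ];

    // Rewrite only references found in the table; all other text is kept as-is.
    $s = preg_replace_callback(
        '~&#(?:x([0-9a-f]{1,6})|([0-9]{1,7}));~i',
        function (array $m) use ($map): string {
            $cp = ($m[1] !== '') ? hexdec($m[1]) : (int) $m[2];
            return isset($map[$cp]) ? sprintf('&#x%X;', $map[$cp]) : $m[0];
        },
        $s
    );

    // Let PHP decode the rewritten references (and any named entities) to UTF-8.
    return html_entity_decode($s, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo decode_entities_cp1252('It&#146;s &#147;quoted&#148;'), PHP_EOL;
```

Rewriting the references first and then delegating to html_entity_decode() keeps the sketch free of a hand-written UTF-8 encoder and of optional extensions such as mbstring.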

09-06 17:27