Character encoding issue with PHP Simple HTML DOM Parser

Problem description

I am using PHP Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net/) to fetch data such as the page title, meta description and meta tags from other domains and then insert it into a database. But I have some issues with encoding.
The problem is that I do not get the correct characters from websites that are not in English. Below is the code:

    <?php
    require 'init.php';

    $curl = new curl();
    $html = new simple_html_dom();

    $page = $_GET['page'];
    $curl_output = $curl->getPage($page);

    $html->load($curl_output['content']);
    $meta_title = $html->find('title', 0)->innertext;
    print $meta_title . "<hr />";

    // print $html->plaintext . "<hr />";
    ?>

Output for the facebook.com page:

    Welcome to Facebook â€" Log in, sign up or learn more

Output for the amazon.cn page:

    亚马逊-网上è´ç‰©å•†åŸŽï¼šè¦ç½‘è´, å°±æ¥Z.cn!

Output for the mail.ru page:

    Mail.Ru: почта, поиÑк в интернете, новоÑти, игры, развлечениÑ

So the characters are not being encoded properly. Can anyone help me solve this issue so that I can add correct data to my database?

Solution

@deceze and @Shakti, thanks for your help. +1 for the article link posted by deceze (Handling Unicode Front to Back in a Web App); Understanding encoding is also worth reading.

After reading your comments, the answer and of course those two articles, I finally solved my issue. Here are the steps I have taken so far:

1. Added header('Content-Type: text/html; charset=utf-8'); at the top of my init.php file.
2. Changed the CHARACTER SET of the database table field that stores those values to UTF-8.
3. Set the MySQL connection charset to UTF-8: mysql_set_charset('utf8', $connection_link_id);
4. Used the htmlentities() function to convert characters: $meta_title = htmlentities(trim($meta_title_raw), ENT_QUOTES, 'UTF-8');

Now the issue seems to be solved, but I still have to do the following to solve it in full:

1. Get the charset declared by the source, $source_charset.
2. Convert the string to UTF-8 if it is not already in that encoding. One PHP function for this is iconv(). Example: iconv($source_charset, "UTF-8", $meta_title_raw);

To get $source_charset I will probably need some tricks or several checks, such as looking at the headers and the meta tag.
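The header and meta tag checks could be sketched roughly as below. This is only a minimal sketch: `detect_charset()` is a hypothetical helper, not part of Simple HTML DOM, and it assumes the raw Content-Type response header is available as a string, which the custom `curl` class in the question may or may not expose.

```php
<?php
// Hypothetical helper: guess a page's charset from the HTTP Content-Type
// header first, then from a <meta> tag in the HTML. Returns null if neither
// source declares a charset.
function detect_charset(string $content_type_header, string $html): ?string
{
    // 1. HTTP header, e.g. "text/html; charset=windows-1251"
    if (preg_match('/charset\s*=\s*["\']?([\w.:-]+)/i', $content_type_header, $m)) {
        return strtoupper($m[1]);
    }
    // 2. <meta charset="..."> or
    //    <meta http-equiv="Content-Type" content="text/html; charset=...">
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)/i', $html, $m)) {
        return strtoupper($m[1]);
    }
    // 3. Nothing declared: the caller has to pick a fallback.
    return null;
}

echo detect_charset('text/html; charset=windows-1251', ''), "\n";        // WINDOWS-1251
echo detect_charset('text/html', '<meta charset="gb2312"><title>x</title>'), "\n"; // GB2312
var_dump(detect_charset('text/html', '<title>x</title>'));               // NULL
```

The header is checked first because it normally takes precedence over the meta tag. A page can still declare one charset and serve another, so the result is best treated as a hint.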
I found a good answer at Detect encoding.

Let me know if there are any improvements or any faults in my steps above.
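The remaining conversion step might look like the sketch below. `to_utf8()` is a hypothetical helper name, and it assumes `$source_charset` has already been determined by some other means (it may be null if the page declared nothing). It falls back to the raw bytes when iconv() cannot convert, which is one possible policy rather than the only reasonable one.

```php
<?php
// Hypothetical helper: convert a fetched string to UTF-8 before it is
// escaped and stored.
function to_utf8(string $raw, ?string $source_charset): string
{
    // Nothing declared, or already declared as UTF-8: leave the bytes alone.
    if ($source_charset === null || strcasecmp($source_charset, 'UTF-8') === 0) {
        return $raw;
    }
    // iconv() returns false if the charset name is unknown or the input has
    // bytes that are invalid in the source charset; keep the raw bytes then.
    $converted = @iconv($source_charset, 'UTF-8', $raw);
    return $converted === false ? $raw : $converted;
}

// "Почта" as windows-1251 bytes, the way a Russian page might serve it.
$meta_title_raw = "\xCF\xEE\xF7\xF2\xE0";
echo to_utf8($meta_title_raw, 'windows-1251'), "\n"; // prints "Почта" (UTF-8)
```

The converted string can then go through the same trim() and htmlentities(..., ENT_QUOTES, 'UTF-8') call as in step 4 of the list above.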