Question
There are a lot of questions and documentation about converting HTML entities and special characters to UTF-8 text in PHP, including the PHP documentation itself for htmlspecialchars_decode() and html_entity_decode(). However, I could not find any function or solution that clearly describes how to convert any HTML characters and special entities to UTF-8 text. All of them say something like "if you want to do this, then do that". But no solution ever states "to have pure UTF-8 text that can be read by humans, do this".
The reason I am asking is that I really don't have a test case. I am reading from a database, and it is multilingual. The only guarantee is that the characters are stored as HTML, and I need to convert them to UTF-8 in a way that can be read by people who understand those languages. How can I do that? What is the proper way to sanitize/decode the input so that it is pure text?
Thanks.
Here is an update, as it is clear from the comments that I was not asking the question properly. My database contains text. I would like to convert that text (which contains HTML entities and special characters) to UTF-8 text that I can display to the end user on a webpage. The text in the database is written in multiple languages (such as French, Arabic, English, etc.), and all of it can contain HTML entities for special characters. So how can I convert all of that to UTF-8 text that can be read by people who understand those languages? I would like to replace those entities with the characters they represent, so the result is readable by humans.
Recommended answer
This works for me for decoding entities to UTF-8:
html_entity_decode($str, ENT_QUOTES | ENT_HTML5, 'UTF-8');
The "trick" is the combination of flags in the second parameter, together with passing the encoding explicitly in the third parameter. That is, if you just call html_entity_decode($str); the result is not guaranteed to be UTF-8 (for example, PHP versions before 5.4 default to ISO-8859-1 rather than UTF-8).
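To make the call above easier to try on multilingual data, here is a minimal sketch. The helper name decode_entities_to_utf8() and the sample strings are made up for illustration and are not from the original answer. It assumes PHP 7 or later and that the stored text is entity-encoded only once.

```php
<?php
// Hypothetical helper (not a PHP built-in): wraps the call from the answer.
function decode_entities_to_utf8(string $text): string
{
    // ENT_QUOTES decodes both double- and single-quote entities;
    // ENT_HTML5 selects the HTML5 named-entity table;
    // 'UTF-8' makes the output encoding explicit.
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Illustrative inputs: a French named entity, Arabic numeric
// character references, and English text with quote entities.
$samples = [
    'caf&eacute;',
    '&#1605;&#1585;&#1581;&#1576;&#1575;',
    'Tom &amp; Jerry&apos;s',
];

foreach ($samples as $raw) {
    $plain = decode_entities_to_utf8($raw);

    // $plain is now plain UTF-8 text. If it is echoed into an HTML page,
    // escape it again so a decoded "<" or "&" is not treated as markup.
    echo htmlspecialchars($plain, ENT_QUOTES | ENT_HTML5, 'UTF-8'), PHP_EOL;
}
```

If the page itself is served as UTF-8, the browser shows the decoded characters directly. Text that was encoded twice (for example &amp;eacute;) would need a second decoding pass, which this sketch does not attempt.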