Question
This is my code:
$oDom = new DOMDocument();
$oDom->loadHTML("èàéìòù");
echo $oDom->saveHTML();
This is the output:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>èà éìòù</p></body></html>
I want this output:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>èàéìòù</p></body></html>
I tried this:
$oDom = new DomDocument('4.0', 'UTF-8');
and also with 1.0 and other values, but nothing worked.
Another thing: is there a way to get the same, untouched HTML back? For example, given the input <p>hello!</p>, I would like to get exactly <p>hello!</p> as output, using DOMDocument only to parse the DOM and make some substitutions inside the tags.
Recommended answer
Solution:
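$sString = '<p>èàéìòù</p>'; // example UTF-8 input (placeholder; use your own HTML)
$oDom = new DOMDocument();
$oDom->encoding = 'utf-8';
$oDom->loadHTML( utf8_decode( $sString ) ); // important! converts UTF-8 to ISO-8859-1 before parsing
$sHtml = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">';
$sHtml .= $oDom->saveHTML( $oDom->documentElement ); // important! serializes only the root node, so no default DOCTYPE is added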
The saveHTML() method works differently when a node is passed to it. You can pass the root node ($oDom->documentElement) and add the desired !DOCTYPE manually. The other important thing is utf8_decode(). In my case, none of the attributes or other methods of the DOMDocument class produced the desired result.
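Since utf8_decode() is deprecated as of PHP 8.2, here is a minimal sketch of one possible alternative that also touches on the "untouched HTML" part of the question. It assumes the mbstring extension is loaded and a libxml build that supports the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD flags. The helper name roundTripFragment() is hypothetical and not part of DOMDocument. The idea is to turn non-ASCII characters into numeric entities before parsing, so the parser never has to guess the input encoding, and to serialize only the root node.

```php
<?php
// Hypothetical helper (an assumption, not part of the DOM extension):
// parse a UTF-8 HTML fragment with a single root element and serialize it back
// without an implied <html>/<body> wrapper or a default DOCTYPE.
function roundTripFragment(string $sHtml): string
{
    // Replace every non-ASCII character with a numeric entity (è becomes &#232;),
    // leaving plain ASCII markup untouched.
    $sAscii = mb_encode_numericentity($sHtml, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $oDom = new DOMDocument();
    // NOIMPLIED: do not add <html><body>; NODEFDTD: do not add a default DOCTYPE.
    $oDom->loadHTML($sAscii, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    // ... substitutions on the DOM tree would go here ...

    // Passing a node serializes only that subtree.
    return $oDom->saveHTML($oDom->documentElement);
}

echo roundTripFragment('<p>èàéìòù</p>'), "\n";
echo roundTripFragment('<p>hello!</p>'), "\n";
```

Whether the serialized string contains the raw characters or entities may depend on the PHP and libxml versions, so it is worth checking the output on your own setup. The text stored inside the DOM (for example via textContent) should be UTF-8 either way.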