Problem description
I am using DOM to get the content of a div tag, but the inner HTML part is not shown. The function is:
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTMLFile("$url");
libxml_use_internal_errors(false);
$xpath = new DOMXPath($dom);
$divTag = $xpath->query('//div[@id="post"]');
foreach ($divTag as $val) {
echo $val->getAttribute('title') . ' - ' . $val->nodeValue . "<br />\n";
}
If the source of the page is (just the div):
<div id="post">Some text <img src="..." /> <table>some codes</table></div>
then the function returns only
"Some text "
But I want to get all the HTML elements too, like this:
Some text <img src="..." /> <table>some codes</table>
Is there any way to do it? Thanks in advance.
Recommended answer
If you're looking for the DOMDocument version of innerHTML in the browser DOM, the nearest is saveXML.
echo $dom->saveXML($val) . "<br />\n";
(Remember to pass the result through htmlspecialchars if you want the markup to actually appear as text on the page.)
This gives you the outerHTML, though. If you really need the innerHTML, you'd have to loop through each of the element's child nodes, pass each one to saveXML, then implode the results.
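The child-node loop described above can be sketched as follows. This is a minimal illustration, not a built-in API: innerXml is a hypothetical helper name, and the sample markup and image filename are made up for the example.

```php
<?php
// Hypothetical helper (not part of DOMDocument): serialise only the
// children of an element, which approximates the browser's innerHTML.
function innerXml(DOMNode $element): string
{
    $parts = [];
    foreach ($element->childNodes as $child) {
        // saveXML() with a node argument serialises just that node's subtree.
        $parts[] = $element->ownerDocument->saveXML($child);
    }
    return implode('', $parts);
}

// Sample markup, assumed for illustration only.
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML('<div id="post">Some text <img src="a.png" /> <table><tr><td>cell</td></tr></table></div>');
libxml_use_internal_errors(false);

$xpath = new DOMXPath($dom);
foreach ($xpath->query('//div[@id="post"]') as $val) {
    // Wrap the result in htmlspecialchars() to show the markup as literal text.
    echo innerXml($val), "\n";
}
```

Because the serialisation is XML, empty elements such as img come out in self-closing form rather than exactly as written in the source.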
Note that saveXML produces XML serialisation only. saveHTML does exist, but before PHP 5.3.6 it could only save the whole document at once; from 5.3.6 on it accepts an optional node argument and serialises just that node as HTML. If you are stuck with saveXML and it matters that you get legacy HTML, you might be able to get away with passing the LIBXML_NOEMPTYTAG option, which ensures that empty elements are written out in full, like <script src="..."></script>, rather than in a self-closing form that would break the browser.
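To compare the serialisation options discussed above, here is a short sketch. The markup and the app.js filename are assumptions for the example; the node argument to saveHTML requires PHP 5.3.6 or later.

```php
<?php
// Sample markup, assumed for illustration only.
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML('<div id="post"><script src="app.js"></script><br></div>');
libxml_use_internal_errors(false);

$div = (new DOMXPath($dom))->query('//div[@id="post"]')->item(0);

// Plain XML serialisation: empty elements are written in self-closing form.
$xml = $dom->saveXML($div);

// LIBXML_NOEMPTYTAG: empty elements get an explicit closing tag instead.
$xmlExpanded = $dom->saveXML($div, LIBXML_NOEMPTYTAG);

// HTML serialisation of a single node (PHP >= 5.3.6).
$html = $dom->saveHTML($div);

echo $xml, "\n", $xmlExpanded, "\n", $html, "\n";
```

LIBXML_NOEMPTYTAG expands every empty element, including void ones such as br, so saveHTML with a node argument is usually the better fit where it is available.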