Problem description
I'm parsing HTML with DOMDocument in PHP.
I found that I can't select all the links using an XPath query, even though the getElementsByTagName() method works fine.
Here is the code:
$xml = new DOMDocument();
$xml->load("file.html");
$xpath = new DOMXPath($xml);
$links = $xpath->query("//a");
$links2 = $xml->getElementsByTagName("a");
foreach ($links as $k => $link) {
    echo "<br>$k: " . $link->nodeValue; // this doesn't print the node value; $links is empty
}
foreach ($links2 as $k => $link) {
    echo "<br>$k: " . $link->nodeValue; // this prints the node value OK
}
I'd have thought $xpath->query("//a") would return the same nodes as getElementsByTagName("a"), but apparently it doesn't.
Could anybody tell me why they aren't the same? Or, if they are, what am I doing wrong when selecting the nodes with the XPath query?
Thanks
Recommended answer
Cannot reproduce.
If you want to use load or loadXML, your markup has to be valid X(HT)ML. HTML is SGML-based. Try loadHTML or loadHTMLFile instead.
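As a rough illustration of that difference, the sketch below feeds the same markup to both parsers. The markup string is a made-up sample (it is not the asker's file.html), chosen to be acceptable HTML but not well-formed XML; the variable names are likewise only illustrative.

```php
<?php
// Hypothetical sample markup: fine as HTML, but not well-formed XML
// (unclosed <br>, unquoted attribute value).
$html = '<html><body><p>Intro<br>'
      . '<a href=one.html>One</a> <a href="two.html">Two</a>'
      . '</p></body></html>';

// Collect parser errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// XML parser: the markup is not well-formed, so loadXML() reports failure.
$asXml = new DOMDocument();
$xmlLoaded = $asXml->loadXML($html);

// HTML parser: the same markup loads, and both lookups see the links.
$asHtml = new DOMDocument();
$asHtml->loadHTML($html);
$xpathLinks = (new DOMXPath($asHtml))->query('//a');
$tagLinks = $asHtml->getElementsByTagName('a');

libxml_clear_errors();

var_dump($xmlLoaded); // bool(false)
foreach ($xpathLinks as $k => $link) {
    echo "$k: " . $link->nodeValue . "\n";
}
echo 'XPath: ' . $xpathLinks->length . ', getElementsByTagName: ' . $tagLinks->length . "\n";
```

For a file on disk, loadHTMLFile("file.html") plays the same role that loadHTML plays for the string here.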
Note that when you use loadHTML or loadHTMLFile, DOM will try to repair any invalid HTML to the extent that it becomes workable for DOM. For instance, it will add a basic HTML skeleton around any partial HTML document, which can affect your XPath queries (though not in the case of //a).
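A minimal sketch of how the added skeleton can change XPath results, assuming loadHTML is called without extra libxml options. The fragment and the variable names are invented for illustration.

```php
<?php
// Hypothetical partial document: no <html> or <body> wrapper.
$fragment = '<p>See <a href="docs.html">the docs</a></p>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($fragment); // default options: the parser wraps the fragment
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// The root element is the added <html>, not the original <p>.
$rootName = $doc->documentElement->nodeName;

// An absolute path written against the raw fragment no longer matches...
$fromFragmentRoot = $xpath->query('/p/a');

// ...while a path that includes the skeleton, or a descendant search, does.
$throughSkeleton = $xpath->query('/html/body/p/a');
$anywhere = $xpath->query('//a');

echo "root element: $rootName\n";                          // root element: html
echo '/p/a matches: ' . $fromFragmentRoot->length . "\n";          // 0
echo '/html/body/p/a matches: ' . $throughSkeleton->length . "\n"; // 1
echo '//a matches: ' . $anywhere->length . "\n";                   // 1
```

Descendant queries such as //a are unaffected because they match at any depth, which is why the skeleton only matters for paths anchored at the document root.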