Question
I have a page that contains several hyperlinks. The ones I want to get are of the format:
<html>
<body>
<div id="diva">
<a href="/123" >text2</a>
</div>
<div id="divb">
<a href="/345" >text1</a>
<a href="/678" >text2</a>
</div>
</body>
</html>
I want to extract the three hrefs 123, 345, and 678.
I know how to get all the hyperlinks using $gm = $xpath->query("//a") and then loop through them to get the href attribute.
Is there some sort of regexp to get only the attributes with the above format (i.e. "/digits")?
Thanks.
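For reference, the loop-and-filter approach described in the question might look roughly like the PHP sketch below. It is only an illustration: the function name extractDigitHrefs and the extra /about link are hypothetical additions, and the regex ~^/(\d+)$~ captures just the digits, so the result is 123, 345 and 678 rather than the full hrefs.

```php
<?php
// Sketch of the question's approach: select every <a> with XPath, then keep
// only hrefs of the form "/digits" using preg_match in PHP.
// extractDigitHrefs is a hypothetical name chosen for this example.
function extractDigitHrefs(string $markup): array
{
    $doc = new DOMDocument();
    $doc->loadXML($markup); // the sample markup is well-formed XML

    $xpath = new DOMXPath($doc);
    $ids = [];
    foreach ($xpath->query('//a') as $a) {
        // Capture the digits after the leading slash; skip anything else.
        if (preg_match('~^/(\d+)$~', $a->getAttribute('href'), $m) === 1) {
            $ids[] = $m[1];
        }
    }
    return $ids;
}

// The question's markup plus one hypothetical non-numeric link.
$markup = <<<XML
<html>
<body>
<div id="diva">
<a href="/123" >text2</a>
</div>
<div id="divb">
<a href="/345" >text1</a>
<a href="/678" >text2</a>
<a href="/about" >not a number</a>
</div>
</body>
</html>
XML;

// Should list 123, 345 and 678; "/about" is filtered out.
print_r(extractDigitHrefs($markup));
```

This works, but the filtering happens in PHP after XPath has already returned every link, which is what the question is trying to avoid.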
Answer
XPath 1.0, which is the version supported by DOMXPath, has no regex functionality. However, if you need one, you can easily write your own PHP function that runs a regular expression and call it from DOMXPath, as mentioned in this other answer.
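As a rough sketch of that idea, assuming a hypothetical helper named href_is_digit_path, a user-defined function can be registered with DOMXPath::registerPHPFunctions() and called through php:function(). Returning a PHP bool is intended to keep the predicate a plain true/false test.

```php
<?php
// Hypothetical helper: true only for hrefs of the form "/digits".
function href_is_digit_path(string $href): bool
{
    return preg_match('~^/\d+$~', $href) === 1;
}

$doc = new DOMDocument();
$doc->loadXML('<body><a href="/123">a</a><a href="/x1">b</a><a href="/678">c</a></body>');

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('php', 'http://php.net/xpath');
// Allow only this one PHP function to be called from XPath expressions.
$xpath->registerPHPFunctions('href_is_digit_path');

$hrefs = [];
foreach ($xpath->query("//a[php:function('href_is_digit_path', string(@href))]") as $a) {
    $hrefs[] = $a->getAttribute('href');
}

// Should print /123 and /678; "/x1" does not match.
echo implode("\n", $hrefs), "\n";
```

Passing a function name to registerPHPFunctions() restricts which PHP functions the XPath expression may call, which is safer than enabling all of them.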
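There is an XPath 1.0 way to test whether a value is a number. You can apply it to the part of the href attribute value after the / character to test whether the attribute value follows the pattern /digits (strictly speaking, number() also accepts values such as /1.5 or /-7, so the test is slightly looser than /digits):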
//a[number(substring-after(@href,'/')) = substring-after(@href,'/')]
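To see how that expression behaves, here is a small PHP sketch run against the question's three hrefs plus two hypothetical extras, /1.5 and /abc. It also includes a stricter alternative that is not part of the answer: a translate()-based test that accepts only digits after the leading slash. The helper name hrefs() is illustrative.

```php
<?php
// The question's three hrefs plus two hypothetical extras ("/1.5", "/abc")
// added to show where the number() test is looser than "/digits".
$doc = new DOMDocument();
$doc->loadXML(
    '<body>'
    . '<a href="/123">a</a><a href="/345">b</a><a href="/678">c</a>'
    . '<a href="/1.5">d</a><a href="/abc">e</a>'
    . '</body>'
);
$xpath = new DOMXPath($doc);

// Runs an XPath query and returns the href of every matched element.
function hrefs(DOMXPath $xpath, string $expr): array
{
    $out = [];
    foreach ($xpath->query($expr) as $a) {
        $out[] = $a->getAttribute('href');
    }
    return $out;
}

// The answer's test: the text after "/" must convert to a number equal to itself.
$numberTest = "//a[number(substring-after(@href,'/')) = substring-after(@href,'/')]";

// Stricter variant (added for this sketch, not from the answer): after the
// leading "/", deleting every digit with translate() must leave an empty string.
$digitsOnly = "//a[starts-with(@href,'/') and string-length(@href) > 1"
    . " and translate(substring(@href, 2), '0123456789', '') = '']";

echo implode(' ', hrefs($xpath, $numberTest)), "\n";
echo implode(' ', hrefs($xpath, $digitsOnly)), "\n";
```

With this sample data, the first query should also return /1.5, while the translate() variant should return only /123, /345 and /678. Both reject /abc.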
Update:
For the sake of completeness, here is a working example of calling the PHP function preg_match from DOMXPath::query() to accomplish the same task:
<?php
$raw_data = <<<XML
<html>
<body>
<div id="diva">
<a href="/123" >text2</a>
</div>
<div id="divb">
<a href="/345" >text1</a>
<a href="/678" >text2</a>
</div>
</body>
</html>
XML;
$doc = new DOMDocument;
$doc->loadXML($raw_data);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace("php", "http://php.net/xpath");
$xpath->registerPHPFunctions("preg_match");
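// php:function's parameters below are:
// parameter 1: PHP function name
// parameter 2: PHP function's 1st parameter, the pattern
// parameter 3: PHP function's 2nd parameter, the string
// preg_match returns an int (1 or 0), not a bool, so the result is compared
// with = 1 to make the predicate an explicit "matched" test instead of relying
// on how that int is converted back into an XPath value.
$gm = $xpath->query("//a[php:function('preg_match', '~^/\d+$~', string(@href)) = 1]");
foreach ($gm as $a) {
    echo $a->getAttribute("href") . "\n";
}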