preg_match: the simplest way to match text inside HTML tags
Question
For example, I have HTML code like this:
<table width="100%" border="0" cellspacing="0" cellpadding="0" class="rowData">
<tr align="center" class="fnt-vrdana-mavi" >
<td style="font-size:11px" colspan=3><b>Text text text</b>:3</td>
</tr>
<tr class="header" align="center">
<td height="18" colspan="3">Text text text</td>
</tr>
<tr align="center" class="fnt-vrdana" bgcolor="#eff3f4" height="18">
<td width="32%" height="17"><b>1</b></td>
<td width="34%"><b>0</b></td>
<td width="34%"><b>2</b></td>
</tr>
<tr align="center" class="fnt-vrdana-mavi">
<td height="17">2.90</td>
<td>3.20</td>
<td>1.85</td>
</tr>
</table>
Which regular expression is best for matching all the data inside the <td> tags?
Recommended answer
If you need to express precisely what you are looking for in an HTML document, I normally suggest using an XPath expression. It gives you the actual value of each node, whereas a regular expression cannot parse the HTML/XML any further, and XPath expressions are much more fine-grained. For example, the output below holds the text value of each cell, without any of the tags nested inside it:
array(8) {
[0]=>
string(16) "Text text text:3"
[1]=>
string(14) "Text text text"
[2]=>
string(1) "1"
[3]=>
string(1) "0"
[4]=>
string(1) "2"
[5]=>
string(4) "2.90"
[6]=>
string(4) "3.20"
[7]=>
string(4) "1.85"
}
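For comparison, here is a minimal sketch of the regex route the question asks about. It is not part of the original answer: the shortened `$html` sample, the pattern, and the `strip_tags()` clean-up step are all assumptions made for illustration. It shows that the captured groups still carry the nested markup, which then has to be removed in a second step.

```php
<?php
// Hypothetical sample: two cells copied from the table in the question.
$html = '<tr><td style="font-size:11px" colspan=3><b>Text text text</b>:3</td>'
      . '<td height="18" colspan="3">Text text text</td></tr>';

// Lazily capture everything between <td ...> and </td>.
// "s" lets "." span newlines, "i" ignores the case of the tag name.
// Nested tables or unusual markup can easily defeat a pattern like this.
preg_match_all('#<td\b[^>]*>(.*?)</td>#is', $html, $matches);

// The captures are raw inner HTML, so the first one still contains <b>...</b>.
$raw = $matches[1];

// A second pass with strip_tags() is needed to get plain text.
$text = array_map('strip_tags', $raw);

var_dump($raw, $text);
```

The pattern works on this small, regular input, but it only returns substrings; any further structure (rows, attributes, nesting) has to be handled by hand.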
Code:
$html = <<<EOD
<table width="100%" border="0" cellspacing="0" cellpadding="0" class="rowData">
<tr align="center" class="fnt-vrdana-mavi" >
<td style="font-size:11px" colspan=3><b>Text text text</b>:3</td>
</tr>
<tr class="header" align="center">
<td height="18" colspan="3">Text text text</td>
</tr>
<tr align="center" class="fnt-vrdana" bgcolor="#eff3f4" height="18">
<td width="32%" height="17"><b>1</b></td>
<td width="34%"><b>0</b></td>
<td width="34%"><b>2</b></td>
</tr>
<tr align="center" class="fnt-vrdana-mavi">
<td height="17">2.90</td>
<td>3.20</td>
<td>1.85</td>
</tr>
</table>
EOD;
// create a DOMDocument to run XPath queries on
$doc = new DOMDocument();
$doc->loadHTML($html);
// create the DOMXPath
$xpath = new DOMXPath($doc);
// perform the XPath query
$nodes = $xpath->query('//td');
// collect the text value of each node
$values = array();
foreach ($nodes as $node) {
    $values[] = $node->nodeValue;
}
var_dump($values);
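To illustrate what "more fine-grained" can mean in practice, the following sketch narrows the query to particular rows instead of selecting every `td`. The trimmed-down table and the two queries are assumptions chosen for illustration; they do not come from the original answer.

```php
<?php
// Hypothetical sample: a shortened copy of the table from the question.
$html = <<<'EOD'
<table class="rowData">
<tr class="header" align="center">
<td height="18" colspan="3">Text text text</td>
</tr>
<tr align="center" class="fnt-vrdana" bgcolor="#eff3f4" height="18">
<td width="32%" height="17"><b>1</b></td>
<td width="34%"><b>0</b></td>
<td width="34%"><b>2</b></td>
</tr>
<tr align="center" class="fnt-vrdana-mavi">
<td height="17">2.90</td>
<td>3.20</td>
<td>1.85</td>
</tr>
</table>
EOD;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Only the cells of rows whose class attribute is exactly "fnt-vrdana".
$bold = [];
foreach ($xpath->query('//tr[@class="fnt-vrdana"]/td') as $cell) {
    $bold[] = $cell->nodeValue;
}

// Only the cells of the last row inside the "rowData" table.
$last = [];
foreach ($xpath->query('//table[@class="rowData"]//tr[last()]/td') as $cell) {
    $last[] = $cell->nodeValue;
}

var_dump($bold, $last);
```

Selecting by row class or position happens inside the query itself, so no extra filtering of the results is needed afterwards.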