Simple HTML DOM: get all attributes from a tag

Question

This is sort of a two-part question, but maybe one answer covers both. I'm trying to get a piece of information out of the following markup:

<div id="foo">
  <div class="bar"><a data1="xxxx" data2="xxxx" href="http://foo.bar">Inner text</a></div>
  <div class="bar2"><a data3="xxxx" data4="xxxx" href="http://foo.bar">more text</a></div>
</div>

Here is what I'm using now.

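include 'simple_html_dom.php'; // Simple HTML DOM library, provides file_get_html()

$articles = array();
$html = file_get_html('http://foo.bar');
// Collect [href, inner HTML] for every link inside <div class="bar">.
foreach ($html->find('div[class=bar] a') as $a) {
    $articles[] = array($a->href, $a->innertext);
}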

This works perfectly to grab the href and the inner text from the first div class. I tried adding $a->data1 to the foreach, but that didn't work.

How do I grab those data attributes at the same time as the href and innertext?

Also, is there a good way to get both classes with one statement? I assume I could base the find on the id and grab all the information inside that div.

Thanks

Solution

To grab all those attributes, you should first inspect the parsed element, like this:

foreach($html->find('div[class=bar] a') as $a){
  var_dump($a->attr);
}

...and check whether those attributes exist. Attributes such as data1 are not valid HTML (custom attributes are supposed to use the data- prefix, e.g. data-foo), so the parser may be discarding them.
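For comparison, here is a minimal sketch of the same inspection step using PHP's built-in DOM extension (DOMDocument and DOMXPath) instead of Simple HTML DOM, so it runs without the third-party library. The function name dumpLinkAttributes and the inline sample markup are illustrative assumptions. The XPath //div[@class='bar']//a plays the role of the selector div[class=bar] a. The function collects every attribute the parser kept on each matching link, so you can see whether data1 and data2 survived parsing.

```php
<?php
// Sketch using PHP's built-in DOM extension (an assumption: this is NOT Simple HTML DOM).
// Lists every attribute on the <a> elements inside <div class="bar">.

function dumpLinkAttributes(string $html): array
{
    libxml_use_internal_errors(true); // don't emit warnings for non-standard markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $result = [];
    foreach ($xpath->query("//div[@class='bar']//a") as $a) {
        $attrs = [];
        // $a->attributes holds one DOMAttr node per attribute the parser kept.
        foreach ($a->attributes as $attr) {
            $attrs[$attr->nodeName] = $attr->nodeValue;
        }
        $result[] = $attrs;
    }
    return $result;
}

// Hypothetical sample input, mirroring the markup from the question.
$html = '<div id="foo">'
    . '<div class="bar"><a data1="xxxx" data2="xxxx" href="http://foo.bar">Inner text</a></div>'
    . '<div class="bar2"><a data3="xxxx" data4="xxxx" href="http://foo.bar">more text</a></div>'
    . '</div>';

var_dump(dumpLinkAttributes($html));
```

libxml's HTML parser generally keeps unknown attributes like data1, but it is worth confirming against your real input. That is exactly what a dump like this is for.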

If they exist, you can read them like this:

foreach($html->find('div[class=bar] a') as $a){
  $article = array($a->href, $a->innertext);
  if (isset($a->attr['data1'])) {
    $article['data1'] = $a->attr['data1'];
  }
  if (isset($a->attr['data2'])) {
    $article['data2'] = $a->attr['data2'];
  }
  //...
  $articles[] = $article;
}
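A rough DOM-extension equivalent of the loop above follows. It is again a sketch, not Simple HTML DOM's API. Here hasAttribute() stands in for isset($a->attr[...]) and getAttribute() reads the value. The function name collectArticles and its $extraAttrs parameter are hypothetical. Note that textContent returns plain text, whereas Simple HTML DOM's innertext returns the inner HTML, so the two differ when a link contains markup.

```php
<?php
// Sketch with PHP's DOM extension (assumption: not Simple HTML DOM), mirroring the
// answer's loop: collect href, link text, and optional extra attributes per link.

function collectArticles(string $html, array $extraAttrs = ['data1', 'data2']): array
{
    libxml_use_internal_errors(true); // tolerate non-standard attributes quietly
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $articles = [];
    foreach ((new DOMXPath($doc))->query("//div[@class='bar']//a") as $a) {
        // Positional entries as in the answer: [0] => href, [1] => text.
        $article = [$a->getAttribute('href'), $a->textContent];
        foreach ($extraAttrs as $name) {
            // Only copy attributes that are actually present on this element.
            if ($a->hasAttribute($name)) {
                $article[$name] = $a->getAttribute($name);
            }
        }
        $articles[] = $article;
    }
    return $articles;
}

// Hypothetical sample input.
$html = '<div class="bar"><a data1="a1" data2="a2" href="http://foo.bar">Inner text</a></div>';
print_r(collectArticles($html));
```

Passing the attribute names as a parameter avoids one if block per attribute, which is what the "// ..." in the loop above would otherwise grow into.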

To get both classes, you can pass multiple selectors to a single find() call, separated by a comma:

foreach ($html->find('div[class=bar] a, div[class=bar2] a') as $a) {
    // ...
}
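In XPath, the union operator | does the job of the comma. The sketch below uses PHP's DOM extension rather than Simple HTML DOM, with a hypothetical helper name linksInClasses. It builds one query for several classes and returns the matches in document order. It assumes the class names contain no quote characters, because they are interpolated directly into the XPath string.

```php
<?php
// Sketch with PHP's DOM extension (assumption: not Simple HTML DOM).
// One XPath query for links under several div classes, joined with "|".

function linksInClasses(string $html, array $classes): array
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    // Builds e.g. //div[@class='bar']//a | //div[@class='bar2']//a
    // (assumes class names contain no quote characters).
    $parts = array_map(function ($c) {
        return "//div[@class='$c']//a";
    }, $classes);
    $query = implode(' | ', $parts);

    $links = [];
    foreach ((new DOMXPath($doc))->query($query) as $a) {
        $links[] = [$a->getAttribute('href'), $a->textContent];
    }
    return $links;
}

// Hypothetical sample input, mirroring the markup from the question.
$html = '<div id="foo">'
    . '<div class="bar"><a data1="xxxx" href="http://foo.bar">Inner text</a></div>'
    . '<div class="bar2"><a data3="xxxx" href="http://foo.bar">more text</a></div>'
    . '</div>';

print_r(linksInClasses($html, ['bar', 'bar2']));
```

The question also mentions an id-based approach. In XPath, the single query //div[@id='foo']//a would select every link inside the wrapper div, whatever the inner div's class.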

08-06 20:04