PHP function to grab all links inside a <DIV> on a remote site

Problem description

Does anyone have a PHP function that can grab all the links inside a specific DIV on a remote site? Usage might be something like:

$links = grab_links($url, $divname);

And it should return an array I can use. Grabbing links I can figure out, but I'm not sure how to restrict it to a specific div.

Thanks!
Scott

Recommended answer

I found something that seems to do what I wanted:

<?php

// Load the remote page and query it with XPath: grab the href of
// every <a> inside the div with id "news_moreTopStories".
$html = new DOMDocument();
@$html->loadHtmlFile('http://www.bbc.com');
$xpath = new DOMXPath($html);
$nodelist = $xpath->query("//div[@id='news_moreTopStories']//a/@href");
foreach ($nodelist as $n) {
    echo $n->nodeValue."\n";
}

// The same idea works for images: grab the src of every <img>
// inside the div with id "promo_area".
echo "<br><br>";
$html = new DOMDocument();
@$html->loadHtmlFile('http://www.bbc.com');
$xpath = new DOMXPath($html);
$nodelist = $xpath->query("//div[@id='promo_area']//img/@src");
foreach ($nodelist as $n) {
    echo $n->nodeValue."\n";
}

?>
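
For completeness, the XPath version above can be wrapped into the grab_links($url, $divname) helper the question asks for. This is only a minimal sketch, assuming $divname is the id attribute of the target div (as in the examples above); the function name and signature come from the question's usage example.

<?php
// Minimal sketch of the helper requested in the question.
// Assumes $divname is the id attribute of the target <div>.
function grab_links($url, $divname)
{
    $doc = new DOMDocument();
    // @ suppresses warnings triggered by malformed real-world HTML.
    @$doc->loadHtmlFile($url);

    $xpath = new DOMXPath($doc);
    // Collect the href of every <a> inside the given div.
    $nodelist = $xpath->query("//div[@id='" . $divname . "']//a/@href");

    $links = array();
    foreach ($nodelist as $node) {
        $links[] = $node->nodeValue;
    }
    return $links;
}

// Usage as in the question:
$links = grab_links('http://www.bbc.com', 'news_moreTopStories');
print_r($links);
?>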

I also tried the PHP DOM method, and it seems faster:

$html = file_get_contents('http://www.bbc.com');
//Create a new DOM document
$dom = new DOMDocument;

//Parse the HTML. The @ is used to suppress any parsing errors
//that will be thrown if the $html string isn't valid XHTML.
@$dom->loadHTML($html);

//Get all links. You could also use any other tag name here,
//like 'img' or 'table', to extract other tags.
$links = $dom->getElementById('news_moreTopStories')->getElementsByTagName('a');

//Iterate over the extracted links and display their URLs
foreach ($links as $link){
    //Extract and show the "href" attribute.
    echo $link->getAttribute('href'), '<br>';
}
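
One caveat with this version: DOMDocument::getElementById() returns null when no element in the page carries that id, and calling getElementsByTagName() on null is a fatal error. A small defensive variant, reusing the same $dom and div id as above:

// Guard against the div not being present in the fetched page.
$container = $dom->getElementById('news_moreTopStories');
if ($container !== null) {
    foreach ($container->getElementsByTagName('a') as $link) {
        echo $link->getAttribute('href'), '<br>';
    }
}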
