Problem description
I am trying to scrape some data from a website. I am relatively new to this, so I am open to any suggestions. I have looked at several Stack Overflow posts but can't find a similar problem or solution.
First, I use the DOM to find all the divs in the page (https://stackoverflow.com/ is used here as an example). From there I can easily get any information contained in class= or id=. However, this page also uses some additional, non-standard attributes that contain links, and I would like to scrape that link information. For example:
<div class="made-up-class" additional-link="https://www.google.com/">
Ideally I would get all the information from the additional link.
My code so far, which doesn't work:
<?php
require 'simple_html_dom.php';
$html = file_get_html('https://stackoverflow.com/');
foreach($html->find('div') as $element)
$element->find('additional-link');
echo $element;
?>
Thanks in advance.
Solution
If I understand your question correctly, you can read the value of additional-link with the following approach. It shows how to parse a single element; you can always write a loop to get them all.
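<?php
require 'simple_html_dom.php';

// Download and parse the page.
$htmldoc = file_get_html('https://stackoverflow.com/');

// Passing an index to find() returns one element, or null if nothing matches.
$item = $htmldoc->find('[class="made-up-class"]', 0);
if ($item) {
    // A non-standard attribute is read like any other attribute.
    echo $item->getAttribute('additional-link');
}
?>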
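The loop mentioned above could look roughly like the sketch below. It does not use simple_html_dom; it swaps in PHP's built-in DOMDocument and DOMXPath so that it runs without extra files. The sample markup, the second URL, and the function name collectAdditionalLinks are made up for illustration and are not taken from the real page.

```php
<?php
// Hypothetical sample markup standing in for a fetched page.
$markup = <<<HTML
<html><body>
<div class="made-up-class" additional-link="https://www.google.com/">A</div>
<div class="other">B</div>
<div class="made-up-class" additional-link="https://example.com/">C</div>
</body></html>
HTML;

/**
 * Return the additional-link value of every div that carries the attribute.
 * (Illustrative helper, not part of any library.)
 */
function collectAdditionalLinks(string $markup): array
{
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($markup);
    libxml_clear_errors();

    // Select only the divs that actually have the custom attribute.
    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query('//div[@additional-link]') as $div) {
        $links[] = $div->getAttribute('additional-link');
    }
    return $links;
}

foreach (collectAdditionalLinks($markup) as $link) {
    echo $link, PHP_EOL;
}
```

For a live page, the same function could be given the string returned by file_get_contents() on the URL, assuming the request succeeds and the markup is served without JavaScript rendering.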