This article covers how to scrape a page with PHP and output a specific value or number from within a given tag.

Problem description

I'm very new to PHP. But with some help, I've figured out how to scrape a site if it has a tag identifier like h1 class=____

Even better, I've figured out how to output the precise word or value I want, as long as it's separated by a blank space. For example, if a given tag <INVENTORY> has an output of "30 balls", I can echo element [0] and only 30 will be output. Which is great.

I'm running into an issue, though, where I'm trying to extract a value that is not separated by a blank space. Let's say I want "-34.89" as the output (more precisely, whatever number is in that placeholder on the site, since the numbers on the source site are likely to change over time).

But the output I'm getting is "-34.89dowjonesstockchange". There's no blank space there.

What do I do to output just the -34.89, or whatever number may be in its place on a given day? There must be some way to specify, for the output above, that only characters [0,1,2,3,4,5] should be output, for example, which would be -34.89 here.

Below is a test example for a website that outputs words and values split on " " (blank space). It is almost what I need, but it lacks this more precise way of selecting a value.

// Scraping function for the Ethereum row on coinmarketcap.com
function getEthereumchange(){
    $doc = new DOMDocument;

    // We don't want to bother with white spaces
    $doc->preserveWhiteSpace = false;

    $doc->strictErrorChecking = false;
    $doc->recover = true;

    $doc->loadHTMLFile('https://coinmarketcap.com/');

    $xpath = new DOMXPath($doc);

    $query = "//tr[@id='id-ethereum']";

    $entries = $xpath->query($query);
    foreach ($entries as $entry) {
        $result = trim($entry->textContent);
        $ret_ = explode(' ', $result);
        // make sure no element in the array starts or ends with whitespace
        foreach ($ret_ as $key=>$val){
            $ret_[$key]=trim($val);
        }
        // drop empty elements and elements that are only "\n", "\r" or "\t"
        // (deleteBlankInArray is a filter callback not shown here; it is passed by name as a string)
        $ret_ = array_values(array_filter($ret_, 'deleteBlankInArray'));

        // write the element at index 7 to the cache file
        file_put_contents(globalVars::$_cache_dir . "ethereumchange", $ret_[7]);
    }
}

Thank you very much.

Recommended answer

If you want to use a third-party library, you can use https://github.com/rajanrx/php-scrape

<?php

use Scraper\Scrape\Crawler\Types\GeneralCrawler;
use Scraper\Scrape\Extractor\Types\MultipleRowExtractor;

require_once(__DIR__ . '/../vendor/autoload.php');
date_default_timezone_set('UTC');

// Create crawler
$crawler = new GeneralCrawler('https://coinmarketcap.com/');

// Setup configuration
$configuration = new \Scraper\Structure\Configuration();
$configuration->setTargetXPath('//table[@id="currencies"]');
$configuration->setRowXPath('.//tbody/tr');
$configuration->setFields(
    [
        new \Scraper\Structure\TextField(
            [
                'name'  => 'Name',
                'xpath' => './/td[2]/a',
            ]
        ),
        new \Scraper\Structure\TextField(
            [
                'name'  => 'Market Cap',
                'xpath' => './/td[3]',
            ]
        ),
        new \Scraper\Structure\RegexField(
            [
                'name'  => '% Change',
                'xpath' => './/td[7]',
                'regex' => '/(.*)%/'
            ]
        ),
    ]
);

// Extract  data
$extractor = new MultipleRowExtractor($crawler, $configuration);
$data = $extractor->extract();
print_r($data);

This will print the following:

Array
(
    [0] => Array
        (
            [Name] => Bitcoin
            [Market Cap] => $42,495,710,233
            [% Change] => -1.09
            [hash] => 76faae07da1d2f8c1209d86301d198b3
        )

    [1] => Array
        (
            [Name] => Ethereum
            [Market Cap] => $28,063,517,955
            [% Change] => -8.10
            [hash] => 18ade4435c69b5116acf0909e174b497
        )

    [2] => Array
        (
            [Name] => Ripple
            [Market Cap] => $11,483,663,781
            [% Change] => -2.73
            [hash] => 5bf61e4bb969c04d00944536e02d1e70
        )

    [3] => Array
        (
            [Name] => Litecoin
            [Market Cap] => $2,263,545,508
            [% Change] => -3.36
            [hash] => ea205770c30ddc9cbf267aa5c003933e
        )
   and so on ...
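
For the exact case in the question (a number glued to trailing text, as in "-34.89dowjonesstockchange"), the same regex idea can be applied without a third-party library. The following is only a minimal sketch under stated assumptions: the helper name extractLeadingNumber, the sample HTML, and the row id "dow" are all made up for illustration and do not come from the original site or from php-scrape.

```php
<?php

// Hypothetical helper (not part of the original code): returns the signed
// decimal number at the start of $text, or null if the text does not
// begin with a number.
function extractLeadingNumber(string $text): ?string
{
    // ^\s*         optional leading whitespace
    // -?           optional minus sign
    // \d+          integer part
    // (?:\.\d+)?   optional fractional part
    if (preg_match('/^\s*(-?\d+(?:\.\d+)?)/', $text, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

// Made-up HTML standing in for the scraped page (an assumption).
$html = '<table><tr id="dow"><td>Dow Jones</td>'
      . '<td>-34.89dowjonesstockchange</td></tr></table>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Select one cell of the row rather than the text of the whole row.
$cell = $xpath->query("//tr[@id='dow']/td[2]")->item(0);

$number = $cell !== null ? extractLeadingNumber($cell->textContent) : null;

echo $number, "\n";                            // -34.89
echo extractLeadingNumber('30 balls'), "\n";   // 30
```

Matching a number pattern rather than taking fixed character positions [0..5] keeps working when the value changes length, for example from -34.89 to 7.5 or -120.04.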

Hope this helps :)


08-23 07:53