How to send custom headers with PHP Goutte

Problem description




I am trying to scrape a site that blocks bots.

I have this PHP cURL code that gets around the blocking.

// $user_agents (a list of User-Agent strings) and $url are assumed to be defined elsewhere.
$headers = array(
    'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Encoding: zip, deflate, sdch',
    'Accept-Language: en-US,en;q=0.8',
    'Cache-Control: max-age=0',
    'User-Agent: ' . $user_agents[array_rand($user_agents)]
);
$curl_init = curl_init(); // create the handle used below
curl_setopt($curl_init, CURLOPT_URL, $url);
curl_setopt($curl_init, CURLOPT_HTTPHEADER, $headers);
$output = curl_exec($curl_init);

It works well.

However, I am using PHP Goutte and want to send the same request with that library:

use Goutte\Client;

$headers2 = array(
    'Accept'          => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Encoding' => 'zip, deflate, sdch',
    'Accept-Language' => 'en-US,en;q=0.8',
    'Cache-Control'   => 'max-age=0',
    'User-Agent'      => $user_agents[array_rand($user_agents)]
);
$client = new Client();

// Apply each header to the Goutte client before sending the request.
foreach ($headers2 as $key => $v) {
    $client->setHeader($key, $v);
}
$resp = $client->request('GET', $url);
echo $resp->html();

But with this code, the site I am scraping blocks me.

How can I set the headers properly with Goutte?
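The two header arrays in the question differ only in shape: cURL takes "Name: value" strings, while setHeader() takes the name and the value separately. Below is a minimal sketch of a conversion between the two, so that both clients can share one list. The function name headerLinesToMap is hypothetical and is not part of cURL or Goutte.

```php
<?php
// Hypothetical helper (not part of cURL or Goutte): turns cURL-style
// "Name: value" lines into a name => value map.
function headerLinesToMap(array $lines): array
{
    $map = [];
    foreach ($lines as $line) {
        // Split on the first colon only, so values containing ':' stay intact.
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2) {
            continue; // skip malformed lines that have no colon
        }
        $map[trim($parts[0])] = trim($parts[1]);
    }
    return $map;
}

// Two of the header lines from the question, with and without a space after the colon.
$lines = [
    'Accept-Language:en-US,en;q=0.8',
    'Cache-Control: max-age=0',
];
$map = headerLinesToMap($lines);
print_r($map);
```

Each resulting pair can then be passed to $client->setHeader($name, $value), as in the foreach loop above.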

Solution

Can you check the status code of the Goutte response?

$status_code = $client->getResponse()->getStatus();
echo $status_code;

This is the source code that worked for me with Guzzle.

In index.php:

<?php
    ini_set('display_errors', 1);
?>
<html>
<head><meta charset="utf-8" /></head>
<?php
    $begin = microtime(true);
    require 'vendor/autoload.php';
    require 'helpers/helper.php';
    $client = new GuzzleHttp\Client([
        'base_uri' => 'http://www.yellowpages.com.au',
        'cookies' => true,
        'headers' =>  [
            'Accept'          => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
            'Accept-Encoding' => 'zip, deflate, sdch',
            'Accept-Language' => 'en-US,en;q=0.8',
            'Cache-Control'   => 'max-age=0',
            'User-Agent'      => 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0'
        ]
    ]);
    $helper = new Helper($client);
    $mostViewed = $helper->getPageTest();
?>
</html>

In helper.php:

<?php
use GuzzleHttp\ClientInterface;
use Symfony\Component\DomCrawler\Crawler;
class Helper{
    protected $client;
    protected $totalPages;
    public function __construct(ClientInterface $client){
        $this->client       = $client;
        $this->totalPages   = 3;
    }
    public function query()
    {
        $queries = array(
            'clue'  => 'Builders',
            'locationClue'  => 'Sydney, 2000', // raw value; Guzzle percent-encodes query arrays itself
            'mappable' => 'true',
            'selectedViewMode' => 'list'
        );
        // print_r($queries);
        return $this->client->get('search/listings', array('query' => $queries));
    }
    public function getPageTest()
    {
        $responses = $this->query();
        $html = $responses->getBody()->getContents();
        echo $html;
        exit();
    }
}
?>
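A note on the query() method: when the query option is given as an array, Guzzle builds the query string itself. To my knowledge it does so with http_build_query and PHP_QUERY_RFC3986, which is an assumption about Guzzle's internals, not something shown in the code above. A value that is already percent-encoded, such as 'Sydney%2C+2000', would then be encoded a second time. The sketch below only uses PHP's built-in http_build_query to show the difference; buildQuery is a hypothetical wrapper.

```php
<?php
// Hypothetical wrapper; assumes Guzzle encodes an array 'query' option
// in roughly this way (RFC 3986 percent-encoding).
function buildQuery(array $params): string
{
    return http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

// Raw value: encoded exactly once.
$raw = buildQuery(['locationClue' => 'Sydney, 2000']);

// Already-encoded value: '%' and '+' are themselves encoded again.
$preEncoded = buildQuery(['locationClue' => 'Sydney%2C+2000']);

echo $raw, "\n";        // locationClue=Sydney%2C%202000
echo $preEncoded, "\n"; // locationClue=Sydney%252C%2B2000
```

If the site expects the literal text "Sydney, 2000", passing the raw value in the array is the safer choice.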

And this is the result I got.

Hope this helps!


07-31 11:17