Can I add middleware to the default Guzzle 6 HandlerStack, rather than creating a new stack?

Problem description

I am using the Spatie\Crawler crawler software in a fairly standard way, like so:

use GuzzleHttp\Client;
use GuzzleHttp\RequestOptions;
use Spatie\Crawler\Crawler;

$client = new Client([
    RequestOptions::COOKIES => true,
    RequestOptions::CONNECT_TIMEOUT => 10,
    RequestOptions::TIMEOUT => 10,
    RequestOptions::ALLOW_REDIRECTS => true,
]);
$crawler = new Crawler($client, 1);
// MyCrawlProfile and MyCrawlObserver are the asker's own classes (not shown).
$crawler
    ->setCrawlProfile(new MyCrawlProfile($startUrl, $pathRegex))
    ->setCrawlObserver(new MyCrawlObserver())
    ->startCrawling($url);

I've omitted the definitions of the classes MyCrawlProfile and MyCrawlObserver for brevity, but anyway, this works as it stands.

I want to add some middleware in order to change some requests before they are made, so I added this demo code:

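use GuzzleHttp\Client;
use GuzzleHttp\Handler\CurlHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;
use GuzzleHttp\RequestOptions;
use Psr\Http\Message\RequestInterface;

// A brand-new stack: only the cURL handler, no middleware yet.
$stack = new HandlerStack();
$stack->setHandler(new CurlHandler());
$stack->push(
    Middleware::mapRequest(function (RequestInterface $request) {
        echo "Middleware running\n";

        return $request;
    })
);
$client = new Client([
    RequestOptions::COOKIES => true,
    RequestOptions::CONNECT_TIMEOUT => 10,
    RequestOptions::TIMEOUT => 10,
    RequestOptions::ALLOW_REDIRECTS => true,
    'handler' => $stack,
]);
// ... rest of crawler code here ...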

However, it falls at the first hurdle: it scrapes the root of the site (/), which is actually a Location redirect, and then stops. It turns out that I am now missing the RedirectMiddleware, even though I never removed it deliberately.

So, my problem is fixed by also adding this:

$stack->push(Middleware::redirect());

I wonder now what other things are set up by default in Guzzle that I have accidentally removed by creating a fresh HandlerStack. Cookies? Retry mechanisms? Other stuff? I don't need those things right now, but I'd be a bit more confident about my system's long-term reliability if my code merely modified the existing stack.
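One way to check what a fresh stack leaves out is to compare it with the default one. HandlerStack implements __toString() for debugging, which lists the handler and each named middleware. The sketch below assumes Guzzle 6 installed via Composer; MockHandler is used only so that neither stack needs a real network handler.

```php
<?php
// Assumes Guzzle 6 installed via Composer (guzzlehttp/guzzle ^6.0).
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;

// A bare stack, as in the question: a handler and no middleware at all.
$fresh = new HandlerStack(new MockHandler());

// The stack Guzzle's Client builds for itself when no 'handler' option is given.
$default = HandlerStack::create(new MockHandler());

// __toString() lists the handler plus every middleware by name (debugging aid).
echo "Fresh stack:\n", $fresh, "\n";
echo "Default stack:\n", $default, "\n";
```

Comparing the two printouts shows which named middleware the bare stack lacks.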

Is there a way to do that? As far as I can tell, I'm doing things as per the manual.

Recommended answer

Use

$stack = HandlerStack::create();

instead of

$stack = new HandlerStack();
$stack->setHandler(new CurlHandler());

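This matters because create() does more than attach a handler: in Guzzle 6 it chooses a handler for you and pushes the default middleware, namely http_errors, allow_redirects (the redirect handling that went missing), cookies and prepare_body. There is no retry middleware by default; that has to be added explicitly with Middleware::retry(). With a bare stack, the COOKIES and ALLOW_REDIRECTS request options therefore have no effect, because the middleware that reads them is absent.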
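Putting the answer together, here is a minimal sketch assuming Guzzle 6 installed via Composer. To stay runnable without a network it swaps in Guzzle's MockHandler for the real handler and uses a placeholder base URI (http://example.com); the 'log_uri' middleware name and the /home redirect target are illustrative only. The custom middleware is pushed onto the stack returned by HandlerStack::create(), and the redirect from / is still followed.

```php
<?php
// Assumes Guzzle 6 installed via Composer (guzzlehttp/guzzle ^6.0).
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;
use GuzzleHttp\Psr7\Response;
use GuzzleHttp\RequestOptions;
use Psr\Http\Message\RequestInterface;

// Canned responses instead of real HTTP: "/" redirects to "/home" (placeholder).
$mock = new MockHandler([
    new Response(302, ['Location' => '/home']),
    new Response(200, [], 'home page'),
]);

// Start from the default stack so redirects, cookies etc. keep working.
$stack = HandlerStack::create($mock);

// push() appends, so this middleware ends up innermost (next to the handler)
// and also sees the follow-up request issued by the redirect middleware.
$seen = [];
$stack->push(Middleware::mapRequest(function (RequestInterface $request) use (&$seen) {
    $seen[] = (string) $request->getUri();

    return $request;
}), 'log_uri');

$client = new Client([
    'base_uri' => 'http://example.com',
    RequestOptions::ALLOW_REDIRECTS => true,
    'handler' => $stack,
]);

$response = $client->get('/');
echo $response->getStatusCode(), ' ', $response->getBody(), "\n";
echo implode("\n", $seen), "\n";
```

In the crawler, this $client would be passed to new Crawler(...) exactly as before. If you specifically want CurlHandler, as in the question, HandlerStack::create(new CurlHandler()) keeps the default middleware while using that handler.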

08-05 04:51