How to avoid being detected as a bot on Puppeteer and PhantomJS?

Question

Puppeteer and PhantomJS are similar. The issue I'm having occurs with both, and the code is also similar.

I'd like to retrieve some information from a website that requires authentication to view it. I can't even access the home page, because the request is flagged as "suspicious activity", as in this screenshot: https://i.imgur.com/p69OIjO.png

I discovered that the problem doesn't happen when I test in Postman using a header named Cookie whose value is the cookie captured from the browser, but this cookie expires after some time. So I guess neither Puppeteer nor PhantomJS is picking up the cookies, because the site is denying access to headless browsers.

What can I do to bypass this?

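// Simple JavaScript example (PhantomJS)
var page = require('webpage').create();
var url = 'https://www.expertflyer.com';

page.open(url, function (status) {
    if (status === 'success') {
        // Save a screenshot of the loaded page
        page.render('home.png');
    } else {
        console.log('Failed to load ' + url);
    }
    // Always exit, otherwise PhantomJS keeps running when the load fails
    phantom.exit();
});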

Recommended answer

Things that can help in general:

  • Headers should be similar to those of common browsers, including:
    • User-Agent: use a recent one (see https://developers.whatismybrowser.com/useragents/explore/) or, better, a random recent one if you make multiple requests (see https://github.com/skratchdot/random-useragent)
    • Accept-Language: something like "en,en-US;q=0.5" (adapt for your language)
    • Accept: a standard one would be "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
  • Check that "navigator.plugins" and "navigator.language" are set in the client JavaScript page context
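For Puppeteer, the header and navigator suggestions above could be sketched roughly as follows. This is only an illustration, not a guaranteed way past any particular site's detection. The helper names (buildBrowserLikeHeaders, applyBrowserLikeSettings, inspectNavigator) are made up for this example, and the user agent string is assumed to be one you copied from a real, recent browser. The Puppeteer calls used are page.setUserAgent, page.setExtraHTTPHeaders and page.evaluate.

```javascript
// Sketch only: the helper names below are illustrative, not part of Puppeteer.

// Build request headers resembling those sent by a common desktop browser.
// `language` is a tag such as 'en-US'; the base language is added as a fallback.
function buildBrowserLikeHeaders(language) {
  const lang = language || 'en-US';
  const base = lang.split('-')[0];
  return {
    'Accept-Language': base === lang ? lang : `${lang},${base};q=0.5`,
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  };
}

// Apply a user agent and the headers above to a Puppeteer page.
// Call this before page.goto() so the first request already carries them.
async function applyBrowserLikeSettings(page, userAgent, language) {
  await page.setUserAgent(userAgent);
  await page.setExtraHTTPHeaders(buildBrowserLikeHeaders(language));
}

// Report what scripts running inside the page see for the two
// navigator properties mentioned in the answer.
async function inspectNavigator(page) {
  return page.evaluate(() => ({
    pluginCount: navigator.plugins.length,
    language: navigator.language,
  }));
}
```

With Puppeteer installed, these would typically be called on the page returned by browser.newPage(): applyBrowserLikeSettings before page.goto(url), and inspectNavigator afterwards. An empty plugin list or a missing language in the result is one of the signals a site may use to spot a headless browser.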


08-23 01:12