Problem description
Puppeteer and PhantomJS are similar. I'm running into the same issue with both, and the code is similar too.
I'd like to scrape some information from a website that requires authentication to view it. I can't even load the home page, because the request is flagged as "suspicious activity"; see the screenshot: https://i.imgur.com/p69OIjO.png
I found that the problem doesn't occur when I test in Postman with a Cookie header whose value I copied from a real browser session, but that cookie expires after some time. So my guess is that neither Puppeteer nor PhantomJS ever receives the cookies, because the site is denying access to headless browsers.
What can I do to get around this?
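// Simple PhantomJS example
var page = require('webpage').create();
var url = 'https://www.expertflyer.com';
page.open(url, function (status) {
  // status is either "success" or "fail"
  if (status === 'success') {
    page.render('home.png'); // save a screenshot of the home page
  } else {
    console.log('Failed to load ' + url);
  }
  phantom.exit(); // always exit, otherwise PhantomJS hangs when the load fails
});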
Recommended answer
Things that generally help:
- Headers should resemble those sent by common browsers, including:
  - User-Agent: use a recent one (see https://developers.whatismybrowser.com/useragents/explore/) or, better, a random recent one if you make multiple requests (see https://github.com/skratchdot/random-useragent)
  - Accept-Language: something like "en-US,en;q=0.5" (adapt to your language)
  - Accept: a standard value would be "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
- Check that "navigator.plugins" and "navigator.language" are set in the page's client-side JavaScript context
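The points above can be sketched in Puppeteer roughly as follows. This is a minimal sketch, not a guaranteed bypass: the helper names (`buildBrowserLikeHeaders`, `applyBrowserLikeSettings`, `readNavigatorInfo`, `captureHome`) are made up for illustration, the User-Agent string is only an example value, and the `navigator` overrides assume the site merely checks that those properties are non-empty. In PhantomJS the rough equivalents would be `page.settings.userAgent` and `page.customHeaders`.

```javascript
// Example value only; swap in a current User-Agent string for real use.
const EXAMPLE_USER_AGENT =
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';

// Hypothetical helper: headers resembling what a desktop browser sends.
function buildBrowserLikeHeaders() {
  return {
    'Accept-Language': 'en-US,en;q=0.5',
    Accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  };
}

// Hypothetical helper: applies the settings to a Puppeteer Page object.
async function applyBrowserLikeSettings(page, userAgent = EXAMPLE_USER_AGENT) {
  await page.setUserAgent(userAgent);
  // Extra headers are sent with every request the page makes.
  await page.setExtraHTTPHeaders(buildBrowserLikeHeaders());
  // Runs before any page script, so the site's checks see these values.
  await page.evaluateOnNewDocument(() => {
    Object.defineProperty(navigator, 'language', { get: () => 'en-US' });
    Object.defineProperty(navigator, 'languages', { get: () => ['en-US', 'en'] });
    // Placeholder non-empty list (assumption: the site only checks for emptiness).
    Object.defineProperty(navigator, 'plugins', { get: () => [1, 2, 3] });
  });
}

// Reads back what page scripts will see, to check that the values are set.
async function readNavigatorInfo(page) {
  return page.evaluate(() => ({
    language: navigator.language,
    pluginCount: navigator.plugins.length,
  }));
}

// Loads a URL with the settings applied and saves a screenshot.
async function captureHome(url, outPath) {
  const puppeteer = require('puppeteer'); // assumes puppeteer is installed
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await applyBrowserLikeSettings(page);
    await page.goto(url, { waitUntil: 'networkidle2' });
    console.log(await readNavigatorInfo(page));
    await page.screenshot({ path: outPath });
  } finally {
    await browser.close();
  }
}
```

Calling `captureHome('https://www.expertflyer.com', 'home.png')` would mirror the PhantomJS script from the question. The settings are applied before `page.goto` so that the very first request already carries the browser-like headers.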