Problem description
I am using puppeteer on Google App Engine with Node.js.
Whenever I run puppeteer on App Engine, I get this error:
Navigation failed because browser has disconnected!
The same code works fine in my local environment, so I am guessing the problem is with App Engine.
const browser = await puppeteer.launch({
  ignoreHTTPSErrors: true,
  headless: true,
  args: ["--disable-setuid-sandbox", "--no-sandbox"],
});
This is my App Engine app.yaml file:
runtime: nodejs12
env: standard
handlers:
- url: /.*
  secure: always
  script: auto
-Edit-
It works when I add the --disable-dev-shm-usage argument, but then it always times out. Here is my code.
const browser = await puppeteer.launch({
  ignoreHTTPSErrors: true,
  headless: true,
  args: [
    "--disable-gpu",
    "--disable-dev-shm-usage",
    "--no-sandbox",
    "--disable-setuid-sandbox",
    "--no-first-run",
    "--no-zygote",
    "--single-process",
  ],
});
const page = await browser.newPage();
try {
  const url = "https://seekingalpha.com/market-news/1";
  const pageOption = {
    waitUntil: "networkidle2",
    timeout: 20000,
  };
  await page.goto(url, pageOption);
} catch (e) {
  console.log(e);
  await page.close();
  await browser.close();
  return resolve("error at 1");
}
try {
  const ulSelector = "#latest-news-list";
  await page.waitForSelector(ulSelector, { timeout: 30000 });
} catch (e) {
  // ALWAYS TIMEOUTS HERE!
  console.log(e);
  await page.close();
  await browser.close();
  return resolve("error at 2");
}
...
Recommended answer
It seems the problem was App Engine's memory capacity.
When there is not enough memory to handle the puppeteer crawl, App Engine automatically starts another instance.
However, the newly created instance has a different puppeteer browser.
The result is the error: Navigation failed because browser has disconnected.
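One way to check whether the browser really is going away mid-crawl is to log the moment it disconnects. The following is only a minimal sketch: puppeteer's Browser emits a "disconnected" event, but watchDisconnect is a hypothetical helper name, and the demo uses a plain Node EventEmitter as a stand-in for the real browser so that it runs without puppeteer installed.

```javascript
const { EventEmitter } = require("events");

// Hypothetical helper (not part of puppeteer): records when a browser-like
// emitter fires "disconnected" and logs this Node process's memory use.
function watchDisconnect(browser, log = console.error) {
  const state = { disconnectedAt: null };
  browser.on("disconnected", () => {
    state.disconnectedAt = Date.now();
    const rssMb = Math.round(process.memoryUsage().rss / (1024 * 1024));
    log(`browser disconnected; node rss=${rssMb} MB`);
  });
  return state;
}

// Demo with a stand-in emitter; in the real app you would pass the object
// returned by puppeteer.launch() instead.
const fakeBrowser = new EventEmitter();
const messages = [];
const state = watchDisconnect(fakeBrowser, (msg) => messages.push(msg));
fakeBrowser.emit("disconnected");
```

If the log line shows up in the App Engine logs just before the navigation error, that is consistent with the browser process being lost rather than with a problem on the target page.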
The solution is simply to upgrade the App Engine instance class so that a single instance can handle the crawling job.
The default instance class is F1, which has 256 MB of memory, so I upgraded to F4, which has 1 GB of memory. After that, the error message no longer appeared.
runtime: nodejs12
instance_class: F4
handlers:
- url: /.*
  secure: always
  script: auto
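To get a rough sense of how close the app is to the instance's memory limit, a small snapshot helper can be logged around the crawl. This is a sketch under stated assumptions: memorySnapshot and INSTANCE_MEMORY_MB are hypothetical names, the limits are the figures quoted in this answer, and process.memoryUsage().rss covers only the Node process, not the Chromium process that puppeteer launches, so the real footprint is likely higher.

```javascript
// Memory figures as quoted in the answer (F1 = 256 MB, F4 = 1 GB).
const INSTANCE_MEMORY_MB = { F1: 256, F4: 1024 };

// Hypothetical helper: compares this Node process's resident set size
// with the quoted limit for the given instance class.
function memorySnapshot(instanceClass) {
  const limitMb = INSTANCE_MEMORY_MB[instanceClass];
  if (limitMb === undefined) {
    throw new Error(`unknown instance class: ${instanceClass}`);
  }
  const rssMb = process.memoryUsage().rss / (1024 * 1024);
  return {
    instanceClass,
    limitMb,
    rssMb: Math.round(rssMb),
    usedFraction: rssMb / limitMb,
  };
}

const snap = memorySnapshot("F4");
console.log(
  `${snap.instanceClass}: ${snap.rssMb} MB of ${snap.limitMb} MB ` +
    `(${(snap.usedFraction * 100).toFixed(1)}%)`
);
```

Calling memorySnapshot before puppeteer.launch() and again after page.goto() gives a before/after comparison for the Node side only.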