Unable to get the fully loaded HTML of a page using puppeteer
Problem description
I'm trying to get the full HTML for this page. It has a spreadsheet that loads slowly. The spreadsheet shows up when I take a screenshot of the page, but I can't get its HTML: document.body.outerHTML excludes the HTML for the spreadsheet. It's as if puppeteer is still seeing the page before the spreadsheet loads.
How do I get the fully loaded HTML including the HTML for the spreadsheet?
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto("http://www.electproject.org/2016g", {
    timeout: 11000,
    waitUntil: "networkidle0",
  });
  await page.setViewport({
    width: 640,
    height: 880,
    deviceScaleFactor: 1,
  });
  await page.screenshot({ path: "buddy-screenshot.png" }); // this screenshot displays the spreadsheet
  let html = await page.evaluate(() => document.body.outerHTML); // this returns the html excluding the spreadsheet
  await browser.close();
})();
Solution
The spreadsheet is in an iframe, which is a separate document and therefore not part of the main page's document.body. You need to get the iframe first:
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto("http://www.electproject.org/2016g", {
    timeout: 11000,
    waitUntil: "networkidle0",
  });
  await page.setViewport({
    width: 640,
    height: 880,
    deviceScaleFactor: 1,
  });

  // The spreadsheet lives in its own frame, so look that frame up by URL.
  const spreadsheetFrame = page.frames().find(
    frame => frame.url().startsWith('https://docs.google.com/spreadsheets/')
  );

  // Evaluate inside the iframe's document rather than the main page.
  const spreadsheetHead = await spreadsheetFrame.evaluate(
    () => document.body.querySelector('#top-bar').innerText
  );
  console.log(spreadsheetHead); // 2016 November General Election : Turnout Rates

  await browser.close();
})();
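The code above reads only the text of #top-bar. If you want the spreadsheet's full HTML, which is what the question asks for, one possible approach is to serialize the frame itself. The following is a minimal sketch, not a tested solution for this particular site. The helper names findFrameByUrlPrefix, getFrameHtml and getPageAndFrameHtml are hypothetical; page.frames(), page.content() and frame.content() are existing puppeteer methods. It assumes the iframe has already loaded by the time it is called.

```javascript
// Hypothetical helpers; the names are illustrative and not part of puppeteer.

// Return the first frame whose URL starts with `prefix`, or undefined if none matches.
function findFrameByUrlPrefix(frames, prefix) {
  return frames.find(frame => frame.url().startsWith(prefix));
}

// Return the serialized HTML of a frame, failing loudly if the frame is missing.
async function getFrameHtml(frame) {
  if (!frame) {
    throw new Error('Frame not found; the iframe may not have loaded yet');
  }
  // frame.content() resolves to the frame's full HTML, including the doctype.
  return frame.content();
}

// Collect the outer page's HTML and the HTML of the matching iframe.
async function getPageAndFrameHtml(page, prefix) {
  const frame = findFrameByUrlPrefix(page.frames(), prefix);
  return {
    pageHtml: await page.content(),
    frameHtml: await getFrameHtml(frame),
  };
}
```

Inside the async function, after page.goto() has resolved, this could be called as: const { pageHtml, frameHtml } = await getPageAndFrameHtml(page, 'https://docs.google.com/spreadsheets/'); The explicit check for a missing frame is a design choice: without it, a frame that has not loaded yet surfaces as a less obvious "cannot read properties of undefined" error.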