javascript - How do you iterate over HTML tags in Puppeteer to get innerText using a wildcard?

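For educational purposes, I am trying to get the reviews from this page: https://www.tripadvisor.es/Restaurant_Review-g294308-d4754017-Reviews-or10-TAC_ROLL-Quito_Pichincha_Province.html. Each page has 10 reviews, and they use selectors like the following. My code used to get all 10 reviews from each page, but the page has since been updated: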

#review_593124597 > div:nth-child(1) > div:nth-child(2) > div:nth-child(5) > div:nth-child(1) > p:nth-child(1)
#review_583146930 > div:nth-child(1) > div:nth-child(2) > div:nth-child(4) > div:nth-child(1) > p:nth-child(1)
#review_577877496 > div:nth-child(1) > div:nth-child(2) > div:nth-child(4) > div:nth-child(1) > p:nth-child(1)
#review_572957932 > div:nth-child(1) > div:nth-child(2) > div:nth-child(4) > div:nth-child(1) > p:nth-child(1)
#review_571417105 > div:nth-child(1) > div:nth-child(2) > div:nth-child(5) > div:nth-child(1) > p:nth-child(1)
#review_565883882 > div:nth-child(1) > div:nth-child(2) > div:nth-child(5) > div:nth-child(1) > p:nth-child(1)
#review_564612180 > div:nth-child(1) > div:nth-child(2) > div:nth-child(4) > div:nth-child(1) > p:nth-child(1)
#review_554301618 > div:nth-child(1) > div:nth-child(2) > div:nth-child(4) > div:nth-child(1) > p:nth-child(1)

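Only two things change between selectors: the review ID, and the fourth div, which switches between nth-child(4) and nth-child(5) (I don't know whether that also affects the innerText result). I am trying to get the innerText of these elements, but without luck. The code I am currently using is: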

const comentarios = 'div[id^=review_] > div:nth-child(1) > div:nth-child(2) > div:nth-child(5) > div:nth-child(1) > p:nth-child(1)'
const comnetarioLength = 'partial_entry';

let listLength = await page.evaluate((sel) => {
    window.scrollBy(0, window.innerHeight);
    return document.getElementsByClassName(sel).length;
}, comnetarioLength);

console.log(listLength);
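Since only the review ID and one nth-child index vary, one way to cover both cases with a single query is a CSS selector list: the attribute-prefix selector `[id^="review_"]` acts as the wildcard for the ID, and a comma-separated list covers both the nth-child(4) and nth-child(5) paths. The sketch below only builds the selector string; the helper name `buildReviewSelector` is hypothetical, and it assumes (from the selectors listed above) that 4 and 5 are the only indices that occur.

```javascript
// Hypothetical helper: builds one CSS selector list that matches the review
// paragraph whichever nth-child index the varying div has.
// Assumption: the only variants are the indices observed above (4 and 5).
function buildReviewSelector(variableIndices = [4, 5]) {
  return variableIndices
    .map(
      (n) =>
        // [id^="review_"] matches any element whose id starts with "review_"
        `[id^="review_"] > div:nth-child(1) > div:nth-child(2) > div:nth-child(${n}) > div:nth-child(1) > p:nth-child(1)`
    )
    .join(', ');
}

console.log(buildReviewSelector());
```

In the page, `document.querySelectorAll` with a selector list returns every element that matches any entry, once each and in document order. Because the string is built in Node, it would be passed as an argument to `page.evaluate` rather than referenced from inside the callback.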

Below is the old code I used before the page was updated. I don't know exactly what to change, because now I only get the first innerText on each page:

for (let i = 1; i <= listLength; i++) {

    let selectorComentarios = comentarios.replace("Index", i); // <-- I know this is supposed to be different
    let comentario = await page.evaluate((sel) => { // Let's create variables and store values...

        try {
            let comentarioText = document.querySelector(sel).innerText;
            return comentarioText;
        }
        catch (e) { }

    }, selectorComentarios);
    console.log(comentario);
}
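A likely reason the old loop returns only the first review: `comentarios` contains no `"Index"` placeholder, so `String.prototype.replace("Index", i)` returns the selector unchanged, and `document.querySelector` always returns the first match. The sketch below demonstrates that, then shows one alternative (not the original code): collect all matches once and index into the list. The function `textsByIndex` and the stand-in objects are hypothetical; inside `page.evaluate` the same loop would have to be written inline over `document.querySelectorAll(sel)`, since the callback cannot see Node-side functions.

```javascript
// The selector from the question: note that it contains no "Index" placeholder.
const comentarios =
  'div[id^=review_] > div:nth-child(1) > div:nth-child(2) > div:nth-child(5) > div:nth-child(1) > p:nth-child(1)';

// replace() only substitutes when the search string occurs, so this is a no-op.
const unchanged = comentarios.replace('Index', 3);
console.log(unchanged === comentarios); // true: every iteration queries the same selector

// Alternative sketch: take every match at once and read innerText by index.
// `elements` stands in for what document.querySelectorAll(sel) would return.
function textsByIndex(elements) {
  const texts = [];
  for (let i = 0; i < elements.length; i++) {
    texts.push(elements[i].innerText);
  }
  return texts;
}

// Hypothetical stand-in data for three matched <p> elements.
const fakeElements = [{ innerText: 'a' }, { innerText: 'b' }, { innerText: 'c' }];
console.log(textsByIndex(fakeElements)); // [ 'a', 'b', 'c' ]
```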

Best answer

Something like this? The script outputs an array with the 10 reviews on that page.

'use strict';

const puppeteer = require('puppeteer');

(async function main() {
  try {
    const browser = await puppeteer.launch();
    const [page] = await browser.pages();

    await page.goto('https://www.tripadvisor.es/Restaurant_Review-g294308-d4754017-Reviews-or10-TAC_ROLL-Quito_Pichincha_Province.html');

    const reviews = await page.evaluate(
      () => [...document.querySelectorAll('p.partial_entry')]
              .map( ({ innerText }) => innerText )
    )

    console.log(reviews);

    await browser.close();
  } catch (err) {
    console.error(err);
  }
})();
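For reference, Puppeteer also provides `page.$$eval(selector, pageFunction)`, which runs `document.querySelectorAll(selector)` in the page and passes the matched elements to `pageFunction` as an array. The sketch below factors the page function out so it can be checked in Node with stand-in objects. The name `extractReviewTexts` is hypothetical, and trimming whitespace and dropping empty strings are assumptions added for illustration, not part of the answer above. The function must not reference anything from the Node scope, because Puppeteer serializes it and runs it inside the page.

```javascript
// Page function for page.$$eval('p.partial_entry', extractReviewTexts).
// It receives an array of elements and must be self-contained, because
// Puppeteer serializes it and runs it in the browser.
function extractReviewTexts(elements) {
  return elements
    .map((el) => (el.innerText || '').trim()) // assumption: trim surrounding whitespace
    .filter((text) => text.length > 0); // assumption: skip empty paragraphs
}

// Usage inside the async main() above (requires a launched page):
//   const reviews = await page.$$eval('p.partial_entry', extractReviewTexts);

// Offline check with stand-in objects that only have an innerText property.
console.log(extractReviewTexts([{ innerText: '  Great rolls  ' }, { innerText: '' }]));
```

Compared with `page.evaluate` plus a spread of `querySelectorAll`, `$$eval` moves the query into Puppeteer's call, which keeps the selector visible at the call site.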

A similar question on Stack Overflow: https://stackoverflow.com/questions/54495034/
