Splinter or Selenium: can we get the current HTML page after clicking a button?

Question

I'm trying to crawl the website "http://everydayhealth.com". However, I found that the page is rendered dynamically: when I click the "More" button, additional news items are shown. Clicking the button with splinter does not make "browser.html" update to the current HTML content. Is there a way to get the latest HTML source, using either splinter or selenium? My splinter code is as follows:

import requests
from bs4 import BeautifulSoup
from splinter import Browser

browser = Browser()
browser.visit('http://everydayhealth.com')
browser.click_link_by_text("More")

print(browser.html)

Based on @Louis's answer, I rewrote the program as follows:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Firefox()
driver.get("http://www.everydayhealth.com")
more_xpath = '//a[@class="btn-more"]'
more_btn = WebDriverWait(driver, 10).until(lambda driver: driver.find_element_by_xpath(more_xpath))
more_btn.click()
more_news_xpath = '(//a[@href="http://www.everydayhealth.com/recipe-rehab/5-herbs-and-spices-to-intensify-flavor.aspx"])[2]'
WebDriverWait(driver, 5).until(lambda driver: driver.find_element_by_xpath(more_news_xpath))

print(driver.execute_script("return document.documentElement.outerHTML;"))
driver.quit()

However, in the output I still can't find the text from the updated page. For example, when I search for "Is Milk Your Friend or Foe?", nothing is found. What's the problem?

Recommended answer

With Selenium, assuming that driver is your initialized WebDriver object, this will give you the HTML that corresponds to the state of the DOM at the time you make the call:

driver.execute_script("return document.documentElement.outerHTML;")
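Since the snapshot reflects the DOM at the moment of the call, it helps to take it only after the content you expect has appeared. Here is a minimal sketch of that idea. The helper name snapshot_when and the is_ready callback are hypothetical, and the polling loop is a plain-Python stand-in for Selenium's WebDriverWait, not a replacement for it. The only driver method used is execute_script, as in the call above.

```python
import time

# Same script as in the answer: serialize the live DOM to a string.
SNAPSHOT_JS = "return document.documentElement.outerHTML;"


def snapshot_when(driver, is_ready, timeout=10.0, poll=0.5):
    """Return the DOM as an HTML string once is_ready(driver) is truthy.

    driver   -- any object with execute_script(), e.g. a Selenium WebDriver
    is_ready -- hypothetical caller-supplied check, e.g. "the new link exists"
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_ready(driver):
            # The snapshot is taken now, so it reflects the updated DOM,
            # not the HTML that was originally loaded.
            return driver.execute_script(SNAPSHOT_JS)
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)
```

With a real driver, a call might look like snapshot_when(driver, lambda d: d.find_elements("xpath", more_news_xpath)), assuming more_news_xpath matches an element that only exists after the click.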

The return value is a string, so you could do:

print(driver.execute_script("return document.documentElement.outerHTML;"))
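Because the result is an ordinary string, it can also be searched programmatically. A raw substring search can miss a headline when inline tags, entities, or line breaks split it up. That is one possible reason for a failed search, not something the question confirms. The sketch below is an illustration under that assumption: snapshot_contains is a hypothetical helper, the sample HTML is made up, and the regex-based tag stripping is deliberately rough.

```python
import html
import re


def snapshot_contains(page_html, phrase):
    """Return True if phrase occurs in the text of page_html.

    Tags are replaced by spaces (a rough approximation, not a real HTML
    parser), entities are decoded, and whitespace and case are normalized
    before comparing.
    """
    text = re.sub(r"<[^>]+>", " ", page_html)  # strip tags, crudely
    text = html.unescape(text)                 # e.g. &#63; -> ?

    def normalize(s):
        return " ".join(s.split()).casefold()

    return normalize(phrase) in normalize(text)


# Made-up snapshot: the headline is split by a tag, a newline, and an entity.
sample = (
    "<html><body><a href='#'>Is Milk Your <em>Friend</em>\n"
    "   or Foe&#63;</a></body></html>"
)

print("Is Milk Your Friend or Foe?" in sample)                    # raw search
print(snapshot_contains(sample, "Is Milk Your Friend or Foe?"))   # normalized
```

In practice the first argument would be the string returned by the execute_script call above.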

08-26 18:40