Problem description
I want to scrape some data from a page; the data is in a table, so the table is all I care about. Earlier I was using Mechanize, but I found that some of the data was sometimes missing, especially at the bottom of the table. After some googling, I found that this may be because Mechanize does not handle jQuery/Ajax.
So I switched to Selenium today. How do I wait for one table, and only that table, to load completely, and then extract all links from it using Selenium and Python? Waiting for the complete page to load takes some time; I want to wait only for the data in the table. My current code:
driver = webdriver.Firefox()
for page in range(1, 2):
    driver.get("http://somesite.com/page/" + str(page))
    table = driver.find_element_by_css_selector('div.datatable')
    links = table.find_elements_by_tag_name('a')
    for link in links:
        print(link.text)
Use WebDriverWait to wait until the table is located:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
...
wait = WebDriverWait(driver, 10)
table = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'div.datatable')))
This would be an explicit wait.
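Keep in mind that presence_of_element_located only guarantees that the table element exists in the DOM, not that all of its rows have finished loading. If rows stream in via Ajax, one option is a custom wait condition that WebDriverWait can poll. This is a sketch, not part of Selenium's API: the class name and the threshold of 10 links are made up for illustration.

```python
class table_has_min_links(object):
    """Custom wait condition (hypothetical): truthy once the table
    matched by `locator` contains at least `min_links` anchors."""

    def __init__(self, locator, min_links):
        self.locator = locator
        self.min_links = min_links

    def __call__(self, driver):
        table = driver.find_element(*self.locator)
        links = table.find_elements_by_tag_name('a')
        # Return the links (truthy) once enough are present;
        # otherwise return False so WebDriverWait keeps polling.
        return links if len(links) >= self.min_links else False


# Usage with the wait object from above (10 is an assumed row count):
# links = wait.until(table_has_min_links((By.CSS_SELECTOR, 'div.datatable'), 10))
```

WebDriverWait accepts any callable that takes the driver and returns a truthy value, so this plugs in exactly like the built-in expected conditions.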
Alternatively, you can make the driver wait implicitly:
from selenium import webdriver
driver = webdriver.Firefox()
driver.implicitly_wait(10) # wait up to 10 seconds while trying to locate elements
for page in range(1, 2):
    driver.get("http://somesite.com/page/" + str(page))
    table = driver.find_element_by_css_selector('div.datatable')
    links = table.find_elements_by_tag_name('a')
    for link in links:
        print(link.text)
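An implicit wait only applies while locating elements; it will not help when div.datatable appears immediately but its rows keep arriving afterwards. A plain polling helper can cover that case by waiting until the number of links stops changing between two consecutive checks. This is a hypothetical sketch: the function name, interval, and timeout are arbitrary choices, not Selenium API.

```python
import time


def wait_for_stable_count(get_count, interval=0.5, timeout=10):
    """Poll get_count() until two consecutive calls return the same
    value, then return it; raise RuntimeError after `timeout` seconds."""
    deadline = time.time() + timeout
    last = None
    while time.time() < deadline:
        current = get_count()
        if current == last:
            return current
        last = current
        time.sleep(interval)
    raise RuntimeError("count did not stabilise within %s seconds" % timeout)


# Usage (table found as in the snippet above):
# n = wait_for_stable_count(lambda: len(table.find_elements_by_tag_name('a')))
```

Two identical consecutive reads are only a heuristic for "fully loaded"; if the site exposes a loading spinner or a known row count, waiting on that is more reliable.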