Problem description
I want to scrape some data from a page; the data is in a table, so the table is all I care about. Earlier I was using Mechanize, but I found that some of the data was sometimes missing, especially at the bottom of the table. After some googling, I found that this may be because Mechanize does not handle jQuery/Ajax.
So I switched to Selenium today. How do I wait for one table, and only that table, to load completely, and then extract all links from it using Selenium and Python? Waiting for the complete page to load takes some time; I want to wait only for the data in the table. My current code:
driver = webdriver.Firefox()
for page in range(1, 2):
    driver.get("http://somesite.com/page/" + str(page))
    table = driver.find_element_by_css_selector('div.datatable')
    links = table.find_elements_by_tag_name('a')
    for link in links:
        print(link.text)
Use WebDriverWait to wait until the table is located:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
...
wait = WebDriverWait(driver, 10)
table = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'div.datatable')))
This would be an explicit wait.
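Keep in mind that presence_of_element_located only guarantees that the table element exists in the DOM, not that all of its rows have finished loading. If rows stream in via Ajax, one option is a custom wait condition that WebDriverWait can poll. This is a sketch, not part of Selenium's API: the class name and the threshold of 10 links are made up for illustration.

```python
class table_has_min_links(object):
    """Custom wait condition (hypothetical): truthy once the table
    matched by `locator` contains at least `min_links` anchors."""

    def __init__(self, locator, min_links):
        self.locator = locator
        self.min_links = min_links

    def __call__(self, driver):
        table = driver.find_element(*self.locator)
        links = table.find_elements_by_tag_name('a')
        # Return the links (truthy) once enough are present;
        # otherwise return False so WebDriverWait keeps polling.
        return links if len(links) >= self.min_links else False


# Usage with the wait object from above (10 is an assumed row count):
# links = wait.until(table_has_min_links((By.CSS_SELECTOR, 'div.datatable'), 10))
```

WebDriverWait accepts any callable that takes the driver and returns a truthy value, so this plugs in exactly like the built-in expected conditions.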
Alternatively, you can make the driver wait implicitly:
from selenium import webdriver
driver = webdriver.Firefox()
driver.implicitly_wait(10) # wait up to 10 seconds while trying to locate elements
for page in range(1, 2):
    driver.get("http://somesite.com/page/" + str(page))
    table = driver.find_element_by_css_selector('div.datatable')
    links = table.find_elements_by_tag_name('a')
    for link in links:
        print(link.text)
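An implicit wait only applies while locating elements; it will not help when div.datatable appears immediately but its rows keep arriving afterwards. A plain polling helper can cover that case by waiting until the number of links stops changing between two consecutive checks. This is a hypothetical sketch: the function name, interval, and timeout are arbitrary choices, not Selenium API.

```python
import time


def wait_for_stable_count(get_count, interval=0.5, timeout=10):
    """Poll get_count() until two consecutive calls return the same
    value, then return it; raise RuntimeError after `timeout` seconds."""
    deadline = time.time() + timeout
    last = None
    while time.time() < deadline:
        current = get_count()
        if current == last:
            return current
        last = current
        time.sleep(interval)
    raise RuntimeError("count did not stabilise within %s seconds" % timeout)


# Usage (table found as in the snippet above):
# n = wait_for_stable_count(lambda: len(table.find_elements_by_tag_name('a')))
```

Two identical consecutive reads are only a heuristic for "fully loaded"; if the site exposes a loading spinner or a known row count, waiting on that is more reliable.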