Question
I'm trying to scrape all the links available on an infinite-scroll page by repeatedly scrolling down and collecting the newly loaded links. However, a fixed time.sleep() does not let me pause the driver for a reasonable amount of time before each new scroll.
Is there a way to adjust the code below so that it sleeps less during the first iterations (when the page still loads new content quickly) and waits as long as necessary in later iterations (when the page loads new content slowly)?
Using a simple loop such as

for i in range(1, 20):
    time.sleep(i)

would not save time during the first iterations, and would not adjust time.sleep() efficiently after many iterations.
Here is my code:
import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# driver_path, url and links_filepath are defined elsewhere in my script
scroll_pause_time = 5
scraped_links = []
driver = webdriver.Chrome(service=Service(driver_path))
driver.get(url)

links = driver.find_elements(By.XPATH, links_filepath)
for link in links:
    if link not in scraped_links:
        scraped_links.append(link)
        print(link)

last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll to the bottom and give the page a fixed time to load more content
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(scroll_pause_time)
    new_height = driver.execute_script("return document.body.scrollHeight")
    # An unchanged height is taken to mean the end of the page was reached
    if new_height == last_height:
        break
    last_height = new_height
    links = driver.find_elements(By.XPATH, links_filepath)
    for link in links:
        if link not in scraped_links:
            scraped_links.append(link)
            print(link)
After 20-30 iterations the loop stops too early: the page now needs longer than the fixed 5-second pause to load new content, so new_height still equals last_height when it is checked and the loop breaks.
Recommended answer
If you don't want to guess how long the page takes to load each time and sleep for an arbitrary number of seconds, you can use Explicit Waits. Example:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("http://somedomain/url_that_delays_loading")
try:
    # Poll for up to 10 seconds until the element is present in the DOM;
    # until() returns as soon as the condition is met.
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "myDynamicElement"))
    )
    # do what you want after necessary elements are loaded
except TimeoutException:
    print('TimeoutException')
finally:
    driver.quit()
# do what you want after necessary elements are loaded
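This solves the problem of time.sleep() being too short for the page's loading speed: WebDriverWait re-checks the condition (every 0.5 seconds by default) and returns as soon as it is met, so fast loads cost little time while slow loads get up to the full timeout.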