Problem description

I have been developing web crawlers for a while, and my most common problem is waiting for a page to load completely, including its requests, frames, and scripts. I mean completely done.

I have tried several methods to fix it, but whenever I use more than one thread to crawl websites I run into this problem: the driver opens, goes to a URL, does not wait, and moves on to the next URL.

My attempts so far:

JavascriptExecutor js = (JavascriptExecutor) driver.getWebDriver();
String result = js.executeScript("return document.readyState").toString();
if (!result.equals("complete")) {
    Thread.sleep(1000);
}

wait.until(ExpectedConditions.visibilityOfElementLocated(By.xpath));

When I run the code single-threaded, I have no problem with pages, but with multiple threads it becomes a nightmare. The network cannot serve pages as quickly as it does for a single thread, which is why I need to wait during that time. I am looking for an exact solution. Is there a progress listener or something similar?

I am waiting for your advice.

Similar question:

Selenium -- How to wait until page is completely loaded

Answer

Waiting for document.readyState to become complete is not a foolproof way to ensure the presence, visibility, or interactability of an element.

Hence, the function:

JavascriptExecutor js = (JavascriptExecutor) driver.getWebDriver();
String result = js.executeScript("return document.readyState").toString();
// Sleeps once if the document is not yet complete; it does not re-check afterwards.
if (!result.equals("complete")) {
    Thread.sleep(1000);
}

And even waiting for jQuery.active == 0:

public void WaitForAjax2Complete() throws InterruptedException
{
    while (true)
    {
        if ((Boolean) ((JavascriptExecutor) driver).executeScript("return jQuery.active == 0")) {
            break;
        }
        Thread.sleep(100);
    }
}

would both be pure overhead, since neither check confirms that the element you actually need is ready.

Solution

The effective approach is to use WebDriverWait in conjunction with ExpectedConditions, waiting for one of the following:

  • presence of element
  • visibility of element
  • interactability of element
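
To make the mechanism behind such an explicit wait concrete, here is a minimal, Selenium-free sketch of polling a condition until it holds or a timeout expires. The class `ExplicitWaitSketch`, its `until` method, and the fake "element appears on the third poll" condition are all hypothetical stand-ins, not Selenium's API. With the real bindings you would typically write something like `new WebDriverWait(driver, Duration.ofSeconds(10)).until(ExpectedConditions.elementToBeClickable(locator))`, choosing `presenceOfElementLocated`, `visibilityOfElementLocated`, or `elementToBeClickable` to match the three cases above.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for an explicit wait: poll a condition until it
// produces a value or the timeout expires. This is NOT Selenium's class.
final class ExplicitWaitSketch {

    // Returns the first non-null value from `condition`; throws if the deadline passes first.
    static <T> T until(Supplier<T> condition, Duration timeout, Duration pollInterval) {
        Instant deadline = Instant.now().plus(timeout);
        while (true) {
            T value = condition.get();
            if (value != null) {
                return value; // condition met, stop waiting immediately
            }
            if (Instant.now().isAfter(deadline)) {
                throw new IllegalStateException("Condition not met within " + timeout);
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) {
        // Assumed condition for illustration: the "element" only shows up on the third poll.
        AtomicInteger polls = new AtomicInteger();
        String element = until(
                () -> polls.incrementAndGet() >= 3 ? "submit-button" : null,
                Duration.ofSeconds(2),
                Duration.ofMillis(50));
        System.out.println(element + " found after " + polls.get() + " polls");
    }
}
```

Unlike a fixed Thread.sleep, this returns as soon as the specific condition holds and fails loudly when it never does, which is the behavior you want per element rather than per page.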

More than one thread to crawl

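WebDriver is not thread-safe. That said, if you can serialize access to the underlying driver instance, you can share one reference across several threads, although this is not advisable. Instead, you can always instantiate one WebDriver instance per thread.

The thread-safety issue lies not in your code but in the actual browser bindings, which all assume that only one command is issued at a time (as a real user would). Instantiating one WebDriver instance per thread, each launching its own browser tabs/windows, avoids the problem. Up to this point your program seems fine.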
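
The one-instance-per-thread idea can be sketched with a `ThreadLocal`. The example below is an illustration under stated assumptions: `FakeDriver` is a hypothetical stand-in for a real WebDriver so that the code runs without a browser, and `PerThreadDriverSketch.crawl` is an invented helper name. With Selenium you would hold a `ThreadLocal<WebDriver>` instead and call `quit()` on each driver when its thread is done.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class PerThreadDriverSketch {

    // Hypothetical stand-in for a WebDriver: not thread-safe, remembers its current URL.
    static final class FakeDriver {
        private static final AtomicInteger COUNTER = new AtomicInteger();
        final int id = COUNTER.incrementAndGet();
        private String currentUrl;

        void get(String url) { currentUrl = url; }
        String getCurrentUrl() { return currentUrl; }
    }

    // Each worker thread lazily creates, then reuses, its own driver.
    private static final ThreadLocal<FakeDriver> DRIVER = ThreadLocal.withInitial(FakeDriver::new);

    // Visits every URL on a fixed pool and records which driver handled it.
    static Map<String, Integer> crawl(List<String> urls, int threads) {
        Map<String, Integer> driverIdByUrl = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String url : urls) {
            pool.execute(() -> {
                FakeDriver driver = DRIVER.get(); // this thread's private instance
                driver.get(url);
                // No other thread can navigate this driver between the two calls.
                if (!url.equals(driver.getCurrentUrl())) {
                    throw new IllegalStateException("driver was shared across threads");
                }
                driverIdByUrl.put(url, driver.id);
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("crawl did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while crawling", e);
        }
        return driverIdByUrl;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("page-1", "page-2", "page-3", "page-4", "page-5", "page-6");
        Map<String, Integer> result = crawl(urls, 3);
        int distinctDrivers = new HashSet<>(result.values()).size();
        System.out.println(result.size() + " pages crawled using " + distinctDrivers + " driver(s)");
    }
}
```

Because no driver is ever touched by two threads, no locking around navigation or waits is needed, and each thread's explicit waits apply only to its own browser.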

Different threads can run against the same WebDriver, but the results will not be what you expect. When you use multithreading to run different tests in different tabs/windows, some thread-safety code is required; otherwise actions such as click() or sendKeys() go to whichever tab/window currently has focus, regardless of which thread issued them. In effect, all the tests run simultaneously in the focused tab/window rather than in the intended one.

07-31 12:48