HtmlUnit: returning a completely loaded page

Question

I am using the HtmlUnit library for Java to manipulate websites programmatically. I can't find a working solution to my problem: how can I determine that all AJAX calls have finished and return a completely loaded web page? Here's what I have tried:

First, I create a WebClient instance and call my method processWebPage(String url, WebClient webClient):

WebClient webClient = null;
try {
    webClient = new WebClient(BrowserVersion.FIREFOX_3_6);
    webClient.setThrowExceptionOnScriptError(false);
    webClient.setThrowExceptionOnFailingStatusCode(false);
    webClient.setJavaScriptEnabled(true);
    webClient.setAjaxController(new NicelyResynchronizingAjaxController());
} catch (Exception e) {
    System.out.println("Error");
}
HtmlPage currentPage = processWebPage("http://www.example.com", webClient);

Here is my method, which should return a completely loaded web page:

private static HtmlPage processWebPage(String url, WebClient webClient) {
    HtmlPage page = null;
    try {
        page = webClient.getPage(url);
    } catch (Exception e) {
        System.out.println("Get page error");
    }
    int z = webClient.waitForBackgroundJavaScript(1000);
    int counter = 1000;
    while (z > 0) {
        counter += 1000;
        z = webClient.waitForBackgroundJavaScript(counter);
        if (z == 0) {
            break;
        }
        synchronized (page) {
            System.out.println("wait");
            try {
                page.wait(500);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
    System.out.println(page.asXml());
    return page;
}

The variable z should be 0 once there are no background JavaScript jobs left to run.

Any thoughts? Thanks in advance.

Edit: I found a partially working solution to my problem, but it requires knowing what the response page looks like. For example, if a completely loaded page contains the text "complete", my solution would be:

HtmlPage page = null;
int PAGE_RETRY = 10;
try {
    page = webClient.getPage("http://www.example.com");
} catch (Exception e) {
    e.printStackTrace();
}
for (int i = 0; !page.asXml().contains("complete") && i < PAGE_RETRY; i++) {
    try {
        Thread.sleep(1000 * (i + 1));
        page = webClient.getPage("http://www.example.com");
    } catch (Exception e) {
        e.printStackTrace();
    }
}

But what would the solution be if I don't know what a completely loaded page looks like?

Recommended answer

Try this:

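HtmlPage page = null;
try {
    page = webClient.getPage(url);
} catch (Exception e) {
    System.out.println("Get page error");
    return null; // nothing to wait for if the page failed to load
}
// Poll the window's job manager until no background JavaScript jobs remain.
JavaScriptJobManager manager = page.getEnclosingWindow().getJobManager();
while (manager.getJobCount() > 0) {
    try {
        Thread.sleep(1000);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // keep the interrupt flag and stop waiting
        break;
    }
}
System.out.println(page.asXml());
return page;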
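
One caveat with a polling loop like this is that it has no upper bound: a page that keeps scheduling background work (for example through recurring timers) may never report zero jobs. The following is a minimal sketch of a bounded wait, not part of the original answer. The class BackgroundJobWaiter, the method waitUntilIdle and its parameters are hypothetical names, and the job count is passed in as a plain IntSupplier so the helper does not depend on HtmlUnit itself.

```java
import java.util.function.IntSupplier;

/** Hypothetical helper: polls a pending-job counter until it hits zero or a deadline passes. */
final class BackgroundJobWaiter {

    private BackgroundJobWaiter() {}

    /**
     * @param pendingJobs   supplies the current number of pending jobs (assumed source)
     * @param timeoutMillis upper bound on the total waiting time
     * @param pollMillis    pause between two checks
     * @return true if the count reached zero in time, false on timeout or interrupt
     */
    static boolean waitUntilIdle(IntSupplier pendingJobs, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (pendingJobs.getAsInt() > 0) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false; // jobs still pending: give up instead of looping forever
            }
            try {
                // Sleep for one poll interval, but never past the deadline.
                long remainingMillis = Math.max(1L, remainingNanos / 1_000_000L);
                Thread.sleep(Math.min(pollMillis, remainingMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }
}
```

With the manager from the answer above, this could be called as BackgroundJobWaiter.waitUntilIdle(manager::getJobCount, 30000, 500). The 30-second limit and the 500 ms interval are arbitrary values chosen for illustration. A false return means the page may still be changing, so the caller has to decide whether the partially loaded page is good enough.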

