Question
Blacklisting URLs in PhantomJS and GhostDriver is pretty straightforward. First initialize the driver with a handler:
PhantomJSDriver driver = new PhantomJSDriver();
driver.executePhantomJS(loadFile("/phantomjs/handlers.js"));
And configure the handler:
this.onResourceRequested = function (requestData, networkRequest) {
    // A request must match at least one allowed pattern...
    var allowedUrls = [
        /https?:\/\/localhost.*/,
        /https?:\/\/.*\.example\.com\/?.*/
    ];
    // ...and must not match any disallowed pattern.
    var disallowedUrls = [
        /https?:\/\/nonono\.com.*/
    ];

    function isUrlAllowed(url) {
        // Returns a predicate that tests one regex against the given URL.
        function matches(url) {
            return function (re) {
                return re.test(url);
            };
        }
        return allowedUrls.some(matches(url)) && !disallowedUrls.some(matches(url));
    }

    if (!isUrlAllowed(requestData.url)) {
        console.log("Aborting disallowed request (# " + requestData.id + ") to url: '" + requestData.url + "'");
        networkRequest.abort();
    }
};
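The `loadFile` call in the driver setup is not defined in the question. Since the path looks like a classpath resource, one plausible reading is a small helper that returns the script as a string. The sketch below is an assumption along those lines; the class name `ResourceLoader` and the `readAll` helper are hypothetical and not part of PhantomJSDriver or GhostDriver.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: the question does not show how loadFile is implemented.
final class ResourceLoader {
    private ResourceLoader() {}

    /** Reads a classpath resource such as "/phantomjs/handlers.js" as UTF-8 text. */
    static String loadFile(String resourcePath) throws IOException {
        try (InputStream in = ResourceLoader.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new IOException("Resource not found on classpath: " + resourcePath);
            }
            return readAll(in);
        }
    }

    /** Drains the stream into a String, decoding the bytes as UTF-8. */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With a helper like this, the script text passed to `executePhantomJS` would be the handler file's contents, so the handler is registered once when the driver is created.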
I haven't found a good way to do this with HtmlUnitDriver. There's the ScriptPreProcessor mentioned in "How to filter javascript from specific urls in HtmlUnit", but that answer uses WebClient, not HtmlUnitDriver. Any ideas?
Recommended answer
Extend HtmlUnitDriver and, in its modifyWebClient hook, install a ScriptPreProcessor (for editing content) and an HttpWebConnection (for allowing/blocking URLs):
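// Imports were not shown in the original answer; the package names below
// assume HtmlUnit 2.x, Selenium's HtmlUnitDriver, and Apache HttpClient 4.x.
import static org.apache.http.HttpStatus.SC_OK;

import java.io.IOException;
import java.util.ArrayList;

import com.gargoylesoftware.htmlunit.HttpWebConnection;
import com.gargoylesoftware.htmlunit.ScriptPreProcessor;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebConnection;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.WebResponse;
import com.gargoylesoftware.htmlunit.WebResponseData;
import org.openqa.selenium.htmlunit.HtmlUnitDriver;
import org.openqa.selenium.remote.DesiredCapabilities;

public class FilteringHtmlUnitDriver extends HtmlUnitDriver {

    // String.matches() must match the whole URL, hence the trailing ".*".
    private static final String[] ALLOWED_URLS = {
        "https?://localhost.*",
        "https?://.*\\.yes\\.yes/?.*",
    };

    private static final String[] DISALLOWED_URLS = {
        "https?://spam\\.nono.*"
    };

    public FilteringHtmlUnitDriver(DesiredCapabilities capabilities) {
        super(capabilities);
    }

    // Called by HtmlUnitDriver when it creates its WebClient; install both filters here.
    @Override
    protected WebClient modifyWebClient(WebClient client) {
        WebConnection connection = filteringWebConnection(client);
        ScriptPreProcessor preProcessor = filteringPreProcessor();
        client.setWebConnection(connection);
        client.setScriptPreProcessor(preProcessor);
        return client;
    }

    private ScriptPreProcessor filteringPreProcessor() {
        return (htmlPage, sourceCode, sourceName, lineNumber, htmlElement) -> editContent(sourceCode);
    }

    // Placeholder edit: rewrites every "foo" in the script source to "bar".
    private String editContent(String sourceCode) {
        return sourceCode.replaceAll("foo", "bar");
    }

    private WebConnection filteringWebConnection(WebClient client) {
        return new HttpWebConnection(client) {
            @Override
            public WebResponse getResponse(WebRequest request) throws IOException {
                String url = request.getUrl().toString();
                // Blocked requests get an empty 200 response instead of hitting the network.
                WebResponse emptyResponse = new WebResponse(
                    new WebResponseData(new byte[0], SC_OK, "", new ArrayList<>()), request, 0);

                // The blacklist takes precedence over the whitelist.
                for (String disallowed : DISALLOWED_URLS) {
                    if (url.matches(disallowed)) {
                        return emptyResponse;
                    }
                }
                for (String allowed : ALLOWED_URLS) {
                    if (url.matches(allowed)) {
                        return super.getResponse(request);
                    }
                }
                // Anything not explicitly allowed is blocked.
                return emptyResponse;
            }
        };
    }
}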
This enables both editing of content and blocking of URLs.
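The allow/block decision can also be pulled out of the connection and checked on its own, without starting a driver. The following is a minimal sketch under a few assumptions: the class name `UrlFilter` and the sample URLs are hypothetical, the patterns are precompiled rather than passed to `String.matches` on every request, and the literal dots in the host names are escaped. Note that `Matcher.matches()` (like `String.matches`) must match the entire URL, whereas the JavaScript `re.test(url)` in the PhantomJS handler succeeds on any substring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical standalone version of the allow/block decision.
final class UrlFilter {
    private static final List<Pattern> ALLOWED =
        compile("https?://localhost.*", "https?://.*\\.yes\\.yes/?.*");
    private static final List<Pattern> DISALLOWED =
        compile("https?://spam\\.nono.*");

    private UrlFilter() {}

    private static List<Pattern> compile(String... regexes) {
        List<Pattern> patterns = new ArrayList<>();
        for (String regex : regexes) {
            patterns.add(Pattern.compile(regex));
        }
        return patterns;
    }

    /** A URL passes only if it is not blacklisted and is explicitly whitelisted. */
    static boolean isAllowed(String url) {
        return !matchesAny(DISALLOWED, url) && matchesAny(ALLOWED, url);
    }

    private static boolean matchesAny(List<Pattern> patterns, String url) {
        for (Pattern pattern : patterns) {
            // matches() requires the pattern to cover the whole URL.
            if (pattern.matcher(url).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://localhost:8080/app",
            "https://www.yes.yes/page",
            "http://spam.nono/ad.js",
            "http://evil.test/?next=http://localhost/"
        };
        for (String url : samples) {
            System.out.println(url + " allowed: " + isAllowed(url));
        }
    }
}
```

Keeping the decision in a plain static method like this makes the whitelist and blacklist easy to unit test; `getResponse` could then delegate to it and return the empty response whenever `isAllowed` is false.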