Problem description
Digikey has changed their website, and it now runs a JavaScript that is called on load via POST. This broke my former simple Java HTML retriever. I am trying to use PhantomJS so that the JavaScript gets executed before I save the HTML/text.
var page = new WebPage(),
    address;
var fs = require('fs');

if (phantom.args.length === 0) {
    console.log('Usage: save.js <some URL>');
    phantom.exit();
} else {
    address = encodeURI(phantom.args[0]);
    page.open(address, function (status) {
        if (status !== 'success') {
            console.log('FAIL to load the address');
        } else {
            var f = null;
            // Grab the markup as soon as the load callback fires.
            var markup = page.content;
            console.log(markup);
            try {
                f = fs.open('htmlcode.txt', "w");
                f.write(markup);
                f.close();
            } catch (e) {
                console.log(e);
            }
        }
        phantom.exit();
    });
}
This code works with most webpages but fails on:
http://search.digikey.com/scripts/dksearch/dksus.dll?keywords=S7072-ND
That URL is my test case. PhantomJS fails to open it and then crashes. I am using the win32 static build, version 1.3.
Any hints?
Basically, what I am after is a wget that completes the page rendering, including the scripts that modify the document, before saving the file.
Recommended answer
A quick and dirty solution, though one that is posted on the PhantomJS site, is to use a timeout. I have modified your code to include a 2-second wait, which lets the page load for 2 seconds before the contents are dumped to a file. If you need exact timing, or if the load time varies greatly, this solution probably won't work for you.
var page = new WebPage(),
    address;
var fs = require('fs');

if (phantom.args.length === 0) {
    console.log('Usage: save.js <some URL>');
    phantom.exit();
} else {
    address = encodeURI(phantom.args[0]);
    page.open(address, function (status) {
        if (status !== 'success') {
            console.log('FAIL to load the address');
            phantom.exit();
        } else {
            // Give the page's scripts 2 seconds to run before dumping the markup.
            window.setTimeout(function () {
                var f = null;
                var markup = page.content;
                console.log(markup);
                try {
                    f = fs.open('htmlcode.txt', "w");
                    f.write(markup);
                    f.close();
                } catch (e) {
                    console.log(e);
                }
                // Exit only after the file has been written.
                phantom.exit();
            }, 2000);
        }
    });
}
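If the load time varies too much for a fixed 2-second wait, one possible alternative is to poll for a condition instead. The sketch below is only an illustration: `waitFor` and its parameters are made-up names, not part of the PhantomJS API. The commented usage assumes that `document.readyState` is a good enough signal for this page, which may not hold if the on-load script finishes later. In that case, checking for an element the script is expected to create would be a better condition.

```javascript
// Poll `condition` every `intervalMs` milliseconds until it returns a truthy
// value, then call `onReady`. If `timeoutMs` elapses first, call `onTimeout`.
// Illustrative helper only; the name and signature are assumptions.
function waitFor(condition, onReady, onTimeout, timeoutMs, intervalMs) {
    var start = new Date().getTime();
    var timer = setInterval(function () {
        var ready = false;
        try {
            ready = !!condition();
        } catch (e) {
            // Treat an exception in the condition as "not ready yet".
            ready = false;
        }
        if (ready) {
            clearInterval(timer);
            onReady();
        } else if (new Date().getTime() - start >= timeoutMs) {
            clearInterval(timer);
            onTimeout();
        }
    }, intervalMs);
}

// Hypothetical use in place of window.setTimeout inside the page.open
// callback (PhantomJS only, so it is left commented out here):
//
// waitFor(function () {
//     return page.evaluate(function () {
//         return document.readyState === 'complete';
//     });
// }, function () {
//     // write page.content to the file as above, then:
//     phantom.exit();
// }, function () {
//     console.log('Timed out waiting for the page');
//     phantom.exit();
// }, 10000, 250);
```

The trade-off is that the script exits as soon as the condition holds rather than always waiting the full delay, but it is only as reliable as the condition you choose.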