Problem description
I'm using PhantomJS to get the page content for a given URL.
The problem is that on some pages PhantomJS cannot load some resources (js, css...), and the error I'm getting is:
The web page on which I can reproduce this problem is www.lifehacker.com
The resources I cannot get are:
- http://x.kinja-static.com/assets/stylesheets/tiger-4ee27d6612a71ee3c68440f8e9c0025c.css
- http://c.amazon-adsystem.com/aax2/amzn_ads.js
- and some others too...
The command I'm running is:
phantomjs --debug=true --cookies-file=cookies.txt --ignore-ssl-errors=true --ssl-protocol=tlsv1 fetchpage.js http://www.lifehacker.com
Even if I remove options such as cookies-file, ignore-ssl-errors and ssl-protocol, the result is still the same.
The fetchpage.js script is:
var webPage = require('webpage');
var system = require('system');
var page = webPage.create();

if (system.args.length === 1) {
    console.log('Usage: fetchpage.js <some URL>');
    phantom.exit(1);
}

var url = system.args[1];

page.open(url, function (status) {
    console.log("STATUS: " + status);
    if (status !== 'success') {
        console.log(
            "Error opening url \"" + page.reason_url
            + "\": " + page.reason
            + "\": " + page
        );
        phantom.exit(1);
    } else {
        var content = page.content;
        console.log(content);
        phantom.exit(1);
    }
});
If I open the same page in Chrome, it loads just fine. Also, if I copy the resource URLs that PhantomJS cannot load and paste them into Chrome, they load just fine.
I have tried to google for similar problems, but I only found some suggestions about setting a timeout, which did not work for me.
I have tried the same thing with phantomjs v1.9.0, 1.9.8 and 2.0.1-development.
What's even more interesting is that the phantomjs script sometimes manages to get a full response from all resources, so I suspect caching, but I couldn't force the server to avoid the cache. I have tried to send custom headers through phantomjs like this:
...
var page = webPage.create();
page.customHeaders = {
    "Cache-Control": "no-cache",
    "Pragma": "no-cache"
};
page.open(url, function (status) {
...
but nothing changed.
I am running out of ideas.
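One way to check whether the custom headers above actually go out with each request is to log them from page.onResourceRequested, which receives the request's header list. The following is only a sketch: findHeader is a hypothetical helper name, not part of PhantomJS, and it assumes the custom headers show up in the header list passed to the callback. It is written in ES5 because PhantomJS does not support newer syntax.

```javascript
// Hypothetical helper (not a PhantomJS API): look up a header by name,
// case-insensitively, in a list shaped like [{ name: ..., value: ... }].
function findHeader(headers, name) {
    var wanted = name.toLowerCase();
    for (var i = 0; i < headers.length; i++) {
        if (headers[i].name.toLowerCase() === wanted) {
            return headers[i].value;
        }
    }
    return null; // header not present
}

// The wiring below only runs inside PhantomJS.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    page.customHeaders = {
        'Cache-Control': 'no-cache',
        'Pragma': 'no-cache'
    };
    // Log the Cache-Control value attached to every outgoing request.
    page.onResourceRequested = function (requestData) {
        console.log(requestData.url + ' Cache-Control=' +
            findHeader(requestData.headers, 'Cache-Control'));
    };
    page.open(require('system').args[1], function (status) {
        phantom.exit(status === 'success' ? 0 : 1);
    });
}
```

If the log shows the header on the failing resources as well, the headers are being sent and the cause is more likely elsewhere than in caching.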
For coders who come across this page while looking for a solution to resources not loading completely in PhantomJS: I had a project where the script would stall/hang on a few resources. It was 50/50 whether it would execute or not.
After some digging I found the following page: https://github.com/ariya/phantomjs/issues/10652
There, the solution of setting a timeout for resources worked for me:
page.settings.resourceTimeout = 10000;
I am not sure this fully applies to the question above, but at least the information is easier to find now, and it may be part of a solution for some.
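To see which resources actually stall or fail once the timeout is set, the resourceTimeout setting can be combined with the page.onResourceTimeout and page.onResourceError callbacks. This is a minimal sketch rather than a definitive fix: installResourceDiagnostics is a hypothetical helper name, and the 10000 ms value simply mirrors the setting above. It is written in ES5 because PhantomJS does not support newer syntax.

```javascript
// Hypothetical helper (not a PhantomJS API): sets a per-resource timeout
// and records every resource that times out or fails to load.
function installResourceDiagnostics(page, timeoutMs, log) {
    var failures = [];
    log = log || function (msg) { console.log(msg); };

    // Stop waiting for any single resource after timeoutMs milliseconds.
    page.settings.resourceTimeout = timeoutMs;

    // Called when a resource exceeds resourceTimeout.
    page.onResourceTimeout = function (request) {
        failures.push({ kind: 'timeout', url: request.url });
        log('TIMEOUT: ' + request.url);
    };

    // Called when a resource fails to load for any other reason.
    page.onResourceError = function (resourceError) {
        failures.push({
            kind: 'error',
            url: resourceError.url,
            reason: resourceError.errorString
        });
        log('ERROR: ' + resourceError.url + ' (' + resourceError.errorString + ')');
    };

    return failures;
}

// The wiring below only runs inside PhantomJS.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    var url = require('system').args[1];
    var failures = installResourceDiagnostics(page, 10000);
    page.open(url, function (status) {
        console.log('STATUS: ' + status + ', failed resources: ' + failures.length);
        phantom.exit(status === 'success' ? 0 : 1);
    });
}
```

The returned list makes it possible to decide after page.open whether the missing resources matter for the content being fetched, instead of treating every stalled request as fatal.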