This article covers how to handle PhantomJS being unable to load page resources; it may be a useful reference for anyone facing the same problem.

Problem description

I'm using PhantomJS to get the page content for a given URL. The problem is that on some pages PhantomJS cannot load some resources (js, css...), and the error I'm getting is:

The web page on which I can reproduce this problem is www.lifehacker.com.
The resources I cannot get are:

The command I'm running is:

phantomjs --debug=true --cookies-file=cookies.txt --ignore-ssl-errors=true --ssl-protocol=tlsv1 fetchpage.js http://www.lifehacker.com

and even if I remove options like cookies-file, ignore-ssl-errors, ssl-protocol the result is still the same.

The fetchpage.js script is:

var webPage = require('webpage');
var system = require('system');
var page = webPage.create();

if (system.args.length === 1) {
  console.log('Usage: fetchpage.js <some URL>');
  phantom.exit(1);
}

var url = system.args[1];

page.open(url, function (status) {

  console.log("STATUS: " + status);

  if (status !== 'success') {
    console.log(
      "Error opening url \"" + page.reason_url
      + "\": " + page.reason
      + "\": " + page
    );
    phantom.exit(1);
  } else {
    var content = page.content;
    console.log(content);
    phantom.exit(0);
  }
});

If I open the same page in Chrome, it loads just fine. Also, if I copy the resource URLs that PhantomJS cannot load and paste them into Chrome, they load just fine.

I have tried to google for similar problems, but I only found suggestions about setting a timeout, which did not work for me.

I have tried the same thing with phantomjs v1.9.0, 1.9.8 and 2.0.1-development.

What's even more interesting, sometimes the PhantomJS script manages to get a full response from all resources, so I suspect caching, but I couldn't force the server to avoid the cache. I have tried to send custom headers through PhantomJS like this:

...
var page = webPage.create();
page.customHeaders = {
    "Cache-Control":"no-cache",
    "Pragma":"no-cache"
};
page.open(url, function (status) {
  ...

but nothing changed.

I am running out of ideas.

Solution

This is for coders who come across this page while looking for a solution to resources not completely loading in PhantomJS. I had a project where the script would stall or hang on a few resources; it was 50/50 whether it would finish or not.

After some digging I found the following page: https://github.com/ariya/phantomjs/issues/10652

The solution suggested there, setting a timeout for resources, worked for me:

page.settings.resourceTimeout = 10000;
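To see which resources actually hit that timeout, the setting can be combined with PhantomJS's `onResourceTimeout` and `onResourceError` callbacks. The following is only a rough sketch: `trackResourceFailures` is a hypothetical helper name, not part of PhantomJS, and the 10000 ms value is simply the one used above.

```javascript
// Sketch: record which resources time out or fail while a PhantomJS page loads.
// trackResourceFailures is a hypothetical helper, not part of PhantomJS.
function trackResourceFailures(page, timeoutMs) {
  var failures = [];

  // Give up on any single resource after timeoutMs milliseconds.
  page.settings.resourceTimeout = timeoutMs;

  // Called when a resource exceeds resourceTimeout.
  page.onResourceTimeout = function (request) {
    failures.push({ kind: 'timeout', url: request.url, detail: request.errorString });
    console.log('TIMEOUT: ' + request.url);
  };

  // Called when a resource fails to load
  // (a timed-out request may be reported here as well).
  page.onResourceError = function (resourceError) {
    failures.push({ kind: 'error', url: resourceError.url, detail: resourceError.errorString });
    console.log('ERROR ' + resourceError.errorCode + ': ' + resourceError.url);
  };

  return failures;
}

// Only runs inside PhantomJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  var page = require('webpage').create();
  var failures = trackResourceFailures(page, 10000);

  page.open(system.args[1], function (status) {
    console.log('STATUS: ' + status + ', failed resources: ' + failures.length);
    phantom.exit(status === 'success' ? 0 : 1);
  });
}
```

Saved under an assumed name such as `fetchpage-debug.js`, it would be run the same way as the original script: `phantomjs fetchpage-debug.js http://www.lifehacker.com`.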

I am not sure whether this fully applies to the question above, but at least the information is easier to find now, and for some it may be part of a solution.
