How to download a CSV file using PhantomJS

Question

When I browse website A in a regular browser (Chrome) and click a link on it, Chrome immediately downloads a report as a CSV file.

When I check the server response headers, I get the following:

Cache-Control:private,max-age=31536000
Connection:Keep-Alive
Content-Disposition:attachment; filename="report.csv"
Content-Encoding:gzip
Content-Language:de-DE
Content-Type:text/csv; charset=UTF-8
Date:Wed, 22 Jul 2015 12:44:30 GMT
Expires:Thu, 21 Jul 2016 12:44:30 GMT
Keep-Alive:timeout=15, max=75
Pragma:cache
Server:Apache
Transfer-Encoding:chunked
Vary:Accept-Encoding

Now I want to download and parse this file using PhantomJS. I set a page onResourceReceived listener to see whether Phantom receives/downloads the file.

clientRequests.phantomPage.onResourceReceived = function(response) {
    console.log('Response (#' + response.id + ', stage "' + response.stage + '"): ' + JSON.stringify(response));
};

When I make Phantom request the file (that is, page.open('URL OF THE FILE')), I can see in the Phantom log that the file is downloaded. Here are the logs:

"contentType": "text/csv; charset=UTF-8",
    "headers": {
        "name": "Date",
        "value": "Wed, 22 Jul 2015 12:57:41 GMT"
    },
    "name": "Content-Disposition",
    "value": "attachment; filename=\"report.csv\"",
    "status":200,"statusText":"OK"

I received the file and its content, but how do I access the file data? When I print the current PhantomJS page object, I get the HTML of page A, which I don't want. I want the CSV file, which I need to parse using JavaScript.

Recommended answer

After days and days of investigation, I have to say that there are some solutions:


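  • Inside the evaluate function you can make an AJAX call to download and encode the file, then return its content to the Phantom script.

  • You can use one of the custom Phantom libraries available on GitHub.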
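
The first option can be sketched roughly as follows. This is only a minimal sketch under some assumptions: the CSV is served from the same origin as the page that is currently open (otherwise the in-page request may be blocked), and the file is plain text, so no extra encoding step is needed. The helper name fetchTextSync, its optional second parameter, and the example.com URLs are hypothetical and not part of PhantomJS.

```javascript
// Hypothetical helper meant to run inside the page context via page.evaluate().
// It performs a synchronous GET and returns the response body as text, or null
// on a non-200 status. The optional XHR argument is an assumption added only
// so the function can be exercised outside a browser; page.evaluate() leaves
// it undefined and the page's own XMLHttpRequest is used.
function fetchTextSync(url, XHR) {
    var xhr = new (XHR || XMLHttpRequest)();
    xhr.open('GET', url, false); // false = synchronous, so the value can be returned
    xhr.send(null);
    return xhr.status === 200 ? xhr.responseText : null;
}

// PhantomJS-only part: open page A first, then fetch the CSV from inside it.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    page.open('http://example.com/pageA', function (status) {
        if (status === 'success') {
            // Placeholder URL; replace with the real report link.
            var csv = page.evaluate(fetchTextSync, 'http://example.com/report.csv');
            console.log(csv);
        }
        phantom.exit();
    });
}
```

Because the request is made from inside the page, it reuses the page's cookies and session, which is often what makes the report link work in the first place.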

If you need to download a file using PhantomJS, then run away from PhantomJS and use CasperJS. CasperJS is based on PhantomJS, but it has a much better and more intuitive syntax and program flow.

Here is a good post explaining "Why CasperJS is better than PhantomJS". In that post you can find a section about file download.

How to download a CSV file using CasperJS (this works even when the server sends the header Content-Disposition: attachment; filename="file.csv")

Here you can find a sample CSV file available for download: http://captaincoffee.com.au/dump/items.csv

To download this file using CasperJS, execute the following code:

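var casper = require('casper').create();

// Open the containing page first so later requests share its session.
casper.start("http://captaincoffee.com.au/dump/", function() {
    this.echo(this.getTitle());
});
casper.then(function(){
    var url = 'http://captaincoffee.com.au/dump/csv.csv';
    // Fetch the file with a GET request and print its content as base64.
    require('utils').dump(this.base64encode(url, 'get'));
});

casper.run();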

The code above downloads the CSV file at http://captaincoffee.com.au/dump/csv.csv and prints the result as a base64 string. This way you don't even have to save the data to a file; you already have it as a base64 string.
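
Since the goal in the question is to parse the CSV with JavaScript, the base64 string still has to be decoded and split into rows. The following is a minimal sketch, not part of CasperJS: decodeBase64 and parseSimpleCsv are hypothetical helpers, the decoder assumes that either a global atob or a Node.js-style Buffer is available, and the parser is deliberately naive (it does not handle quoted fields or embedded commas).

```javascript
// Hypothetical helper: decode a base64 string into UTF-8 text.
// Assumes a global atob (browser-like environments) or a Node.js Buffer.
function decodeBase64(b64) {
    if (typeof atob === 'function') {
        // atob yields a byte string; reinterpret those bytes as UTF-8.
        return decodeURIComponent(escape(atob(b64)));
    }
    return Buffer.from(b64, 'base64').toString('utf8');
}

// Hypothetical helper: naive CSV parsing into an array of rows, each row an
// array of strings. Quoted fields and embedded commas are NOT handled.
function parseSimpleCsv(text) {
    return text.split(/\r?\n/)
        .filter(function (line) { return line.length > 0; })
        .map(function (line) { return line.split(','); });
}

// Example use inside a casper.then() step (shown as a comment because it
// needs the CasperJS runtime):
//   var rows = parseSimpleCsv(decodeBase64(this.base64encode(url, 'get')));
```

For real-world CSV with quoting, a dedicated parser would be a safer choice than the split-based helper shown here.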

If you explicitly want to save the file to the file system, you can use the download function available in CasperJS.
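
A minimal sketch of that approach, assuming the same URLs as in the code above. The fileNameFromUrl helper and the 'download.csv' fallback name are hypothetical and only illustrate one way to pick a target path; casper.download(url, target) itself is the CasperJS call.

```javascript
// Hypothetical helper: use the last path segment of the URL as the local
// file name, falling back to 'download.csv' when the URL ends with a slash.
function fileNameFromUrl(url) {
    var path = url.split('?')[0].split('#')[0];
    var name = path.substring(path.lastIndexOf('/') + 1);
    return name || 'download.csv';
}

// CasperJS-only part (the phantom global exists under PhantomJS/CasperJS).
if (typeof phantom !== 'undefined') {
    var casper = require('casper').create();
    var url = 'http://captaincoffee.com.au/dump/csv.csv';

    casper.start('http://captaincoffee.com.au/dump/');
    casper.then(function () {
        // Saves the resource at url to a file in the current working directory.
        this.download(url, fileNameFromUrl(url));
    });
    casper.run();
}
```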


08-20 16:54