Converting JSON to CSV in node.js

Question

I am trying to convert a very large JSON array to CSV in node.js, but it takes far too long and drives the CPU to 100% during the conversion.

  jsonToCsv: function (data) {
    var keys = Object.keys(data[0]);
    var csv = [keys.join(",")];
    console.time("CSVGeneration");
    data.forEach(function (row) {
      var line = '';
      keys.forEach(function (key) {
        if (typeof row[key] === 'string') {
          row[key] = "" + file_utils.escapeCsv(row[key]) + "";
        }
        line += row[key] + ",";
      });
      csv.push(line);
    });
    console.timeEnd("CSVGeneration");
    csv = csv.join("\n");
    return csv;
  },
  escapeCsv: function (x) {
    if (x)
      return ('' + x.replace(/"/g, '').replace(/,/g, ' ').replace(/\n/g, " ").replace(/\r/g, " ") + '');
    else
      return ('');
  },

On a typical run with 1 lakh (100,000) rows, it never even got as far as logging the elapsed time. I had to kill the process manually.

Can someone suggest a better alternative to this?

Recommended answer

Before answering this: assuming your code works, this question belongs on https://codereview.stackexchange.com/.

Regarding your question:

  • The newer Array iteration functions such as forEach(), while comfortable to write, are usually not very performant. A simple for loop is the better choice in performance-critical situations.
  • In escapeCsv() you apply 4 separate regex replacements, each for just one character. Combine them into one.
  • Assuming your data is already structured in a way that allows CSV conversion (data is an Array of objects, each with the same properties), it is not necessary to retrieve the keys individually for each object.

Applying this yields the following code:

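// Strips every character that would break a naive CSV line, in a single pass.
// Note: unlike the question's version, commas and line breaks are removed
// rather than replaced by spaces.
function escapeCsv(x) {
    if (x) {
        return ('' + x).replace(/[",\n\r]/g, '');
    } else {
        return '';
    }
}

function jsonToCsv(data) {
    // All rows are assumed to share the keys of the first row.
    var keys = Object.keys(data[0]),
        csv = [keys.join(",")];

    // Reused for every line; join() copies the values into a new string.
    var row = new Array(keys.length);
    for (var i = 0; i < data.length; i++) {
        for (var j = 0; j < keys.length; j++) {
            var value = data[i][keys[j]];
            if (typeof value === 'string') {
                row[j] = '"' + escapeCsv(value) + '"';
            } else {
                // Only null/undefined become empty; 0 and false are kept.
                row[j] = value == null ? '' : value;
            }
        }
        csv.push(row.join(','));
    }

    return csv.join("\n");
}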

According to jsPerf, this alone improves performance by roughly a factor of 3 to 5.

If the CSV you are generating can be streamed directly to a file or to a client, you could improve things further and reduce the memory load, because the complete CSV no longer has to be held in memory.
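One way to sketch the streaming idea is with a generator that produces one CSV line at a time and Node's built-in stream module to write the lines out. The names csvLines, escapeCsvValue and writeCsvFile are hypothetical helpers introduced for illustration, not part of the original code. The sketch assumes a Node version that provides Readable.from and that every row has the same keys as the first one.

```javascript
'use strict';
const fs = require('fs');
const { Readable, pipeline } = require('stream');

// Hypothetical helper: strips characters that would break a naive CSV line.
function escapeCsvValue(x) {
    return String(x).replace(/[",\n\r]/g, '');
}

// Hypothetical helper: lazily yields the header and then one line per row,
// so the complete CSV text never has to exist in memory at once.
function* csvLines(data) {
    if (data.length === 0) {
        return;
    }
    const keys = Object.keys(data[0]);
    yield keys.join(',') + '\n';

    const row = new Array(keys.length);
    for (let i = 0; i < data.length; i++) {
        for (let j = 0; j < keys.length; j++) {
            const value = data[i][keys[j]];
            if (typeof value === 'string') {
                row[j] = '"' + escapeCsvValue(value) + '"';
            } else {
                // Assumption: null/undefined become empty cells, 0 and false are kept.
                row[j] = value == null ? '' : value;
            }
        }
        yield row.join(',') + '\n';
    }
}

// Hypothetical helper: pipes the lines into a file. pipeline() takes care of
// backpressure and calls callback(err) once writing has finished or failed.
function writeCsvFile(data, path, callback) {
    pipeline(Readable.from(csvLines(data)), fs.createWriteStream(path), callback);
}
```

A call could look like writeCsvFile(rows, 'out.csv', function (err) { if (err) throw err; }). The same generator could feed an HTTP response in place of the file stream. Note that the input array still has to fit in memory; only the generated CSV is streamed.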

> Fiddle to play around with: the function names there carry the suffix 2.

jsPerf.com comparison
