Problem description
I am trying to convert a very large JSON array to CSV in Node.js, but it takes too long and drives the CPU to 100% during the conversion.
jsonToCsv: function (data) {
    var keys = Object.keys(data[0]);
    var csv = [keys.join(",")];
    console.time("CSVGeneration");
    data.forEach(function (row) {
        var line = '';
        keys.forEach(function (key) {
            if (typeof row[key] === 'string') {
                row[key] = "" + file_utils.escapeCsv(row[key]) + "";
            }
            line += row[key] + ",";
        });
        csv.push(line);
    });
    console.timeEnd("CSVGeneration");
    csv = csv.join("\n");
    return csv;
},
escapeCsv: function (x) {
    if (x)
        return ('' + x.replace(/"/g, '').replace(/,/g, ' ').replace(/\n/g, " ").replace(/\r/g, " ") + '');
    else
        return ('');
},
On a typical run with 1 lakh (100,000) rows, it never even got as far as logging the elapsed time. I had to kill the process manually.
Can someone suggest a better alternative to this?
Recommended answer
Before answering this: assuming your code is working, this question belongs on https://codereview.stackexchange.com/.
Regarding your question:
- The newer Array access functions like forEach(), while rather comfortable to write, are usually not very performant. A simple for loop is the better choice in performance-critical situations.
- In escapeCsv() you apply 4 different regex replacements, each for just one character. Combine those into one.
- Assuming your data is already structured in a way that allows for CSV conversion (data is an Array of objects, each having the same properties), it is not necessary to retrieve the keys individually for each object.
Applying this yields the following code:
function escapeCsv(x) {
    if (x) {
        // One pass: strip double quotes, commas, and line breaks.
        return ('' + x).replace(/[",\n\r]/g, '');
    } else {
        return '';
    }
}

function jsonToCsv(data) {
    var keys = Object.keys(data[0]),
        csv = [keys.join(",")];
    // Reuse one row buffer; every slot is overwritten on each iteration.
    var row = new Array(keys.length);
    for (var i = 0; i < data.length; i++) {
        for (var j = 0; j < keys.length; j++) {
            var value = data[i][keys[j]];
            if (typeof value === 'string') {
                row[j] = '"' + escapeCsv(value) + '"';
            } else {
                // Only null/undefined become empty; 0 and false are kept.
                row[j] = value == null ? '' : value;
            }
        }
        csv.push(row.join(','));
    }
    return csv.join("\n");
}
According to jsPerf, this alone improves performance by a factor of about 3-5.
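One detail to be aware of: the combined regex in escapeCsv() above removes commas and line breaks outright, whereas the version in the question replaced them with spaces. If that spacing matters, a single pass can still be kept by giving replace() a function. The following is only a sketch; the name escapeCsvKeepSpacing is made up for illustration and is not part of the original answer.

```javascript
// Hypothetical variant of escapeCsv(): still one regex pass,
// but with the same substitutions as the code in the question.
function escapeCsvKeepSpacing(x) {
    if (!x) {
        return '';
    }
    return String(x).replace(/[",\n\r]/g, function (ch) {
        // Drop double quotes; turn commas and line breaks into one space each.
        return ch === '"' ? '' : ' ';
    });
}
```

Whether this is measurably slower than the plain string replacement would have to be benchmarked; a replacer function adds a call per matched character.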
If the CSV you are generating can be streamed directly to a file or to a client, you could improve things even further and reduce the memory load, since the whole CSV would not have to be held in memory.
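A minimal sketch of that streaming idea, assuming a Node.js version with async/await and events.once. The names writeCsv and formatCell are hypothetical and not from the original answer; out can be any Writable stream, such as one from fs.createWriteStream() or an HTTP response.

```javascript
'use strict';
const { once } = require('events');

// Hypothetical helper: quote strings and strip the same characters
// that escapeCsv() in the answer strips; null/undefined become empty.
function formatCell(value) {
    if (value === null || value === undefined) {
        return '';
    }
    if (typeof value === 'string') {
        return '"' + value.replace(/[",\n\r]/g, '') + '"';
    }
    return String(value);
}

// Hypothetical function: writes one CSV line per object to `out`
// instead of collecting all lines in an array first.
async function writeCsv(data, out) {
    if (data.length === 0) {
        return;
    }
    const keys = Object.keys(data[0]);
    out.write(keys.join(',') + '\n');
    for (let i = 0; i < data.length; i++) {
        const row = new Array(keys.length);
        for (let j = 0; j < keys.length; j++) {
            row[j] = formatCell(data[i][keys[j]]);
        }
        // write() returns false once the stream's buffer is full;
        // waiting for 'drain' keeps memory use bounded.
        if (!out.write(row.join(',') + '\n')) {
            await once(out, 'drain');
        }
    }
}
```

Because the loop awaits 'drain', it also yields to the event loop whenever the destination is slower than the producer, so the process is not pinned for the whole conversion. The caller is still responsible for ending the stream, for example by calling out.end() after the returned promise resolves.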