Problem description
I am trying to do XML -> JSON -> MongoDB on my server. I have a NodeJS application which streams the XML, converts it to JSON, then adds it to the MongoDB server in chunks of 1,000 documents. However, after about 75,000 records, my MacBook's fans start spinning faster and the processing gets REALLY slow. After a few minutes, I get this error:
[30517:0x102801600]   698057 ms: Mark-sweep 1408.2 (1702.9) -> 1408.1 (1667.4) MB, 800.3 / 0.0 ms  (+ 0.0 ms in 0 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 803 ms) last resort
[30517:0x102801600]   698940 ms: Mark-sweep 1408.1 (1667.4) -> 1408.1 (1667.4) MB, 882.2 / 0.0 ms  last resort
and finally in the JS stacktrace:
I have a feeling my memory is running out, but increasing the allowed memory with --max-old-space-size (or whatever) doesn't work when the file is 70+ gigabytes and I only have 16 GB of RAM.
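(For reference, this is how I pass the flag when starting Node; the script name here is just a placeholder:)

node --max-old-space-size=8192 app.js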
Here's the code of what I am trying to do:
var fs = require('fs'),
    path = require('path'),
    XmlStream = require('xml-stream'),
    MongoClient = require('mongodb').MongoClient,
    url = 'mongodb://username:[email protected]:27017/mydatabase',
    amount = 0;

MongoClient.connect(url, function(err, db) {
  var stream = fs.createReadStream(path.join(__dirname, 'motor.xml'));
  var xml = new XmlStream(stream);
  var docs = [];

  xml.collect('ns:Statistik');

  // This is your event for the element matches
  xml.on('endElement: ns:Statistik', function(item) {
    docs.push(item); // collect to array for insertMany
    amount++;

    if (amount % 1000 === 0) {
      xml.pause(); // pause the stream events
      db.collection('vehicles').insertMany(docs, function(err, result) {
        if (err) throw err;
        docs = []; // clear the array
        xml.resume(); // resume the stream events
      });
    }
  });

  // End stream handler - insert remaining and close connection
  xml.on("end", function() {
    if (amount % 1000 !== 0) {
      db.collection('vehicles').insertMany(docs, function(err, result) {
        if (err) throw err;
        db.close();
      });
    } else {
      db.close();
    }
  });
});
My question is something like: Do I have a memory leak? Why does Node allow the code to build up the memory like that? Is there a fix besides buying 70+ GB of RAM for my PC?
Recommended answer
Posting my comment as an answer, since it solved the issue and might be useful to others having difficulty using the xml-stream package in this way.
In the question's code, the collect method is causing the issue, because it forces the parser to keep every instance of the processed node in an array as they are parsed. collect should only be used to collect child items of a certain type from each node that is being parsed. The default behaviour is not to do that (and it is the streaming nature of the parser that lets you process multi-gigabyte files with ease).
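As a rough sketch of what collect is actually meant for (the ns:Detail element name below is made up; the point is that repeated children of each parsed node get gathered into an array on that node, while the nodes themselves still stream through one at a time):

var fs = require('fs'),
    XmlStream = require('xml-stream');

var stream = fs.createReadStream('motor.xml');
var xml = new XmlStream(stream);

// Hypothetical child element: collect every <ns:Detail> inside a <ns:Statistik>,
// so item['ns:Detail'] is an array when the parent's endElement event fires.
xml.collect('ns:Detail');

xml.on('endElement: ns:Statistik', function(item) {
  // Only the current <ns:Statistik> node is held in memory here,
  // not every node seen so far.
  console.log(item['ns:Detail']);
});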
So the solution was to remove that line of code and just use the endElement event.