Question
I want to create a write stream and write to it as my data comes in. The file gets created, but nothing is ever written to it. Eventually, the process runs out of memory.
The problem, I've discovered, is that I'm calling write() inside a loop.
Here is a simple example:
'use strict'
var fs = require('fs');
var wstream = fs.createWriteStream('myOutput.txt');
for (var i = 0; i < 10000000000; i++) {
  wstream.write(i+'\n');
}
console.log('End!')
wstream.end();
Nothing ever gets written, not even hello. But why? How can I write to the file within a loop?
Recommended answer
To supplement @MikeC's excellent answer, here are some relevant details from the current docs (v8.4.0) for writable.write():
While a stream is not draining, calls to write() will buffer chunk, and return false. Once all currently buffered chunks are drained (accepted for delivery by the operating system), the 'drain' event will be emitted. It is recommended that once write() returns false, no more chunks be written until the 'drain' event is emitted. While calling write() on a stream that is not draining is allowed, Node.js will buffer all written chunks until maximum memory usage occurs, at which point it will abort unconditionally. Even before it aborts, high memory usage will cause poor garbage collector performance and high RSS (which is not typically released back to the system, even after the memory is no longer required).
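The recommendation above (stop writing once write() returns false and carry on after 'drain') could be sketched roughly as follows. This is only an illustration, not @MikeC's code: the function name writeNumbers, the smaller loop count and the temporary output path are assumptions made for the example.

```javascript
'use strict';
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: writes the integers 0..total-1, one per line.
// It stops as soon as write() returns false and resumes on 'drain'.
function writeNumbers(filePath, total, done) {
  const wstream = fs.createWriteStream(filePath);
  let i = 0;

  function writeSome() {
    let ok = true;
    while (i < total && ok) {
      ok = wstream.write(i + '\n'); // false means the internal buffer is full
      i++;
    }
    if (i < total) {
      // Wait for the buffer to empty before writing any more chunks.
      wstream.once('drain', writeSome);
    } else {
      wstream.end();
    }
  }

  wstream.on('error', done);                // report write errors
  wstream.on('finish', () => done(null));   // all data has been flushed
  writeSome();
}

// Assumed output location and a much smaller count than in the question.
const outFile = path.join(os.tmpdir(), 'myOutput.txt');
writeNumbers(outFile, 1000000, (err) => {
  if (err) throw err;
  console.log('End!');
});
```

Because the loop hands control back to the event loop whenever the buffer is full, the stream gets a chance to flush to disk, and memory use should stay roughly bounded by the stream's highWaterMark rather than growing with i.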
And from Backpressuring in Streams:
When a false value is returned, the backpressure system kicks in.
Once the data buffer is emptied, a .drain() event will be emitted and resume the incoming data flow.
Once the queue is finished, backpressure will allow data to be sent again. The space in memory that was being used will free itself up and prepare for the next batch of data.
      +-------------------+         +=================+
      |  Writable Stream  +--------->  .write(chunk)  |
      +-------------------+         +=======+=========+
                                            |
                         +------------------v---------+
+-> if (!chunk)          |    Is this chunk too big?  |
|     emit .end();       |    Is the queue busy?      |
+-> else                 +-------+----------------+---+
|     emit .write();             |                |
^                             +--v---+        +---v---+
^-----------------------------<  No  |        |  Yes  |
                              +------+        +---v---+
                                                  |
         emit .pause();   +=================+     |
^-------------------------+  return false;  <-----+---+
                          +=================+         |
                                                      |
when queue is empty +============+                    |
^-------------------<  Buffering |                    |
|                   |============|                    |
+> emit .drain();   |  ^Buffer^  |                    |
+> emit .resume();  +------------+                    |
                    |  ^Buffer^  |                    |
                    +------------+ add chunk to queue |
                    |            <---^----------------<
                    +============+
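To watch the false return value and the 'drain' event in isolation, here is a small sketch that uses an in-memory Writable instead of a file. The slowSink name, the 10 ms delay and the 16-byte highWaterMark are all arbitrary choices for the demonstration, and writableLength needs a Node.js version newer than the v8.4.0 docs quoted above.

```javascript
'use strict';
const { Writable } = require('stream');

// A deliberately slow sink: every chunk takes about 10 ms to be "delivered".
const slowSink = new Writable({
  highWaterMark: 16, // bytes; tiny on purpose so the buffer fills quickly
  write(chunk, encoding, callback) {
    setTimeout(callback, 10);
  }
});

// Issue five 8-byte writes back to back, without waiting for 'drain'.
const results = [];
for (let i = 0; i < 5; i++) {
  results.push(slowSink.write('12345678'));
}
console.log(results); // [ true, false, false, false, false ]

// 'drain' fires once every buffered chunk has been handed to the sink.
slowSink.once('drain', () => {
  console.log('drain, bytes still buffered:', slowSink.writableLength);
});
```

Only the first write fits under the 16-byte threshold. Every later call still queues its chunk but returns false, which is the signal the loop in the question ignores.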
Here are some visualisations (running the script with a V8 heap memory size of 512MB by using --max-old-space-size=512).
This visualisation shows the heap memory usage (red) and delta time (purple) for every 10,000 steps of i (the X axis shows i):
'use strict'
var fs = require('fs');
var wstream = fs.createWriteStream('myOutput.txt');
var latestTime = (new Date()).getTime();
var currentTime;
for (var i = 0; i < 10000000000; i++) {
  wstream.write(i+'\n');
  if (i % 10000 === 0) {
    currentTime = (new Date()).getTime();
    console.log([ // Output CSV data for visualisation
      i,
      (currentTime - latestTime) / 5,
      process.memoryUsage().heapUsed / (1024 * 1024)
    ].join(','));
    latestTime = currentTime;
  }
}
console.log('End!')
wstream.end();
The script runs slower and slower as the memory usage approaches the maximum limit of 512MB, until it finally crashes when the limit is reached.
This visualisation uses v8.setFlagsFromString() with --trace_gc to show the current memory usage (red) and execution time (purple) of each garbage collection (the X axis shows total elapsed time in seconds):
'use strict'
var fs = require('fs');
var v8 = require('v8');
var wstream = fs.createWriteStream('myOutput.txt');
v8.setFlagsFromString('--trace_gc');
for (var i = 0; i < 10000000000; i++) {
  wstream.write(i+'\n');
}
console.log('End!')
wstream.end();
Memory usage reaches 80% after about 4 seconds, and the garbage collector gives up trying to Scavenge and is forced to use Mark-sweep (more than 10 times slower); see this article for more details.
For comparison, here are the same visualisations for @MikeC's code, which waits for drain when the write buffer becomes full:
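@MikeC's code is not reproduced here. As a rough stand-in, a loop that waits for drain could look like the sketch below, written with async/await and events.once(). It assumes a Node.js version that provides events.once(), which is newer than the v8.4.0 docs quoted above. The writeNumbersAsync name, the loop count and the output path are illustrative assumptions.

```javascript
'use strict';
const fs = require('fs');
const os = require('os');
const path = require('path');
const { once } = require('events');

// Hypothetical helper: same output as the question's loop, but it pauses
// on 'drain' whenever write() reports that the buffer is full.
async function writeNumbersAsync(filePath, total) {
  const wstream = fs.createWriteStream(filePath);
  for (let i = 0; i < total; i++) {
    if (!wstream.write(i + '\n')) {
      // once() resolves on 'drain' and rejects if the stream emits 'error'.
      await once(wstream, 'drain');
    }
  }
  wstream.end();
  await once(wstream, 'finish'); // resolves after everything is flushed
}

// Assumed output location and a much smaller count than in the question.
const asyncOutFile = path.join(os.tmpdir(), 'myOutputAsync.txt');
writeNumbersAsync(asyncOutFile, 1000000)
  .then(() => console.log('End!'))
  .catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
```

The await gives the event loop a turn each time the buffer fills, so the stream can flush before more chunks are queued.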