Question
I'm scraping data from the web, and I have several processes of my scraper running in parallel.
I want the output of each of these processes to end up in the same file. As long as lines of text remain intact and don't get mixed up with each other, the order of the lines does not matter. In UNIX, can I just send the output of each process to the same file using the >> operator?
Recommended answer
No. It is not guaranteed that lines will remain intact. They can become intermingled.
From searching based on liori's answer I found this:
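"Write requests of {PIPE_BUF} bytes or less shall not be interleaved with data from other processes doing writes on the same pipe. Writes of greater than {PIPE_BUF} bytes may have data interleaved, on arbitrary boundaries, with writes by other processes, whether or not the O_NONBLOCK flag of the file status flags is set."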
So lines longer than {PIPE_BUF} bytes are not guaranteed to remain intact.
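As a rough illustration of that limit, here is a minimal Python sketch that refuses lines longer than PIPE_BUF and emits each accepted line with a single write call. The helper names line_fits_pipe_buf and append_line are hypothetical, not part of any library. Note that the quoted guarantee is worded for pipes, so treating PIPE_BUF as a safe line length for an append-mode regular file is an assumption of this sketch, not something the quoted text promises.

```python
import os
import select


def line_fits_pipe_buf(line: bytes) -> bool:
    """Return True if the encoded line is no longer than PIPE_BUF bytes."""
    # select.PIPE_BUF is the platform's PIPE_BUF value (Unix only).
    return len(line) <= select.PIPE_BUF


def append_line(path: str, text: str) -> int:
    """Append one line to `path` using a single write() call.

    Hypothetical helper: opens the file in append mode, as the shell's
    >> operator does, and returns the number of bytes written.
    """
    data = text.rstrip("\n").encode("utf-8") + b"\n"
    if not line_fits_pipe_buf(data):
        # Assumption of this sketch: reject long lines rather than risk
        # them being split and interleaved with other writers' output.
        raise ValueError(f"line is {len(data)} bytes, PIPE_BUF is {select.PIPE_BUF}")
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        # One write() call per line; the caller should check the return
        # value, since a short write is possible in principle.
        return os.write(fd, data)
    finally:
        os.close(fd)
```

Each scraper process could call append_line("results.txt", record) once per record, where results.txt is a placeholder path. If records can exceed PIPE_BUF, a lock or one output file per process merged afterwards is the more conservative choice.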