Question
In my application I'm trying to merge sorted files (keeping them sorted, of course), so I have to iterate through each element in both files and write the smaller one to a third file. This is quite slow on big files. Since I don't see any alternative (the iteration has to be done), I'm trying to optimize file loading. I have some RAM available that I can use for buffering: instead of reading 4 bytes from each file every time, I could read something like 100 MB at once and work from that buffer until it is exhausted, then refill it. But I guess ifstream already does that. Would my own buffering give me more performance, and if so, why? If fstream does buffer, can I change the size of that buffer?
Added
My current code looks like this (pseudocode):
// this is done in a loop
int i1 = input1.read_integer();
int i2 = input2.read_integer();
if (!input1.eof() && !input2.eof())
{
    if (i1 < i2)
    {
        output.write(i1);
        input2.seek_back(sizeof(int));  // i2 was not consumed, re-read it next time
    } else {
        input1.seek_back(sizeof(int));  // i1 was not consumed, re-read it next time
        output.write(i2);
    }
} else {
    if (input1.eof())
        output.write(i2);
    else if (input2.eof())
        output.write(i1);
}
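What I don't like here is:
- seek_back: I have to seek back to the previous position, because there is no way to peek 4 bytes.
- If one of the streams is at EOF, the loop still keeps checking that stream instead of copying the rest of the other stream directly to the output. This is not a big issue, though, because the chunk sizes are almost always equal.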
Can you suggest improvements?
Thanks.
Recommended answer
Without getting into the discussion on stream buffers, you can get rid of the seek_back and generally make the code much simpler by doing:
using namespace std;
merge(istream_iterator<int>(file1), istream_iterator<int>(),
      istream_iterator<int>(file2), istream_iterator<int>(),
      ostream_iterator<int>(cout));
Edit:
Added binary capability
#include <algorithm>
#include <iterator>
#include <fstream>
#include <iostream>

// Wraps an int so that operator>> reads it as raw bytes instead of text.
struct BinInt
{
    int value;
    operator int() const { return value; }
    friend std::istream& operator>>(std::istream& stream, BinInt& data)
    {
        return stream.read(reinterpret_cast<char*>(&data.value), sizeof(int));
    }
};

int main()
{
    // Open in binary mode so the raw bytes are not altered by text translation.
    std::ifstream file1("f1.txt", std::ios::binary);
    std::ifstream file2("f2.txt", std::ios::binary);
    // Note: the merged values are still written to std::cout as text.
    std::merge(std::istream_iterator<BinInt>(file1), std::istream_iterator<BinInt>(),
               std::istream_iterator<BinInt>(file2), std::istream_iterator<BinInt>(),
               std::ostream_iterator<int>(std::cout));
}