Problem description
I need to process a list of files. The processing action should not be repeated for the same file. The code I am using for this is:
#include <map>
#include <string>
#include <vector>
using namespace std;

vector<File*> gInputFileList;           // Can contain duplicates; File has a member sFilename
map<string, File*> gProcessedFileList;  // Using a map to avoid linear search costs

void processFile(File* pFile)
{
    File* pProcessedFile = gProcessedFileList[pFile->sFilename];
    if (pProcessedFile != NULL)
        return;  // Already processed
    foo(pFile);  // foo() is the action to do for each file
    gProcessedFileList[pFile->sFilename] = pFile;
}

int main()
{
    size_t n = gInputFileList.size();  // Using array syntax (iterator syntax also gives identical performance)
    for (size_t i = 0; i < n; i++) {
        processFile(gInputFileList[i]);
    }
}
The code works correctly, but...
My problem is that when the input size is 1000, it takes 30 minutes - HALF AN HOUR - on Windows/Visual Studio 2008 Express. For the same input, it takes only 40 seconds to run on Linux/gcc!
What could be the problem? The action foo() takes only a very short time to execute, when used separately. Should I be using something like vector::reserve for the map?
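On the reserve question: std::map has no reserve() member, because it allocates one node per element rather than a contiguous buffer. What can be trimmed is the double lookup that operator[] causes in processFile. The following is only a sketch under assumptions: the File struct and the uniqueFiles helper are hypothetical stand-ins, and the change is not expected to explain a 30 minute versus 40 second gap on its own.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the File type used in the question.
struct File {
    std::string sFilename;
};

// Returns pointers to the files that would be processed, in input order,
// skipping any file whose name has already been seen.
// std::set::insert returns a pair whose .second is true only when the key
// was newly inserted, so each file costs one tree lookup instead of two.
std::vector<const File*> uniqueFiles(const std::vector<File>& input)
{
    std::set<std::string> seen;
    std::vector<const File*> result;
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (seen.insert(input[i].sFilename).second) {
            result.push_back(&input[i]);  // the question would call foo() here
        }
    }
    return result;
}
```

A side effect of the original operator[] lookup is that it inserts a null entry for every unseen name before foo() runs; the insert-based check avoids that extra step.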
EDIT, EXTRA INFORMATION
What foo() does is:
1. It opens the file.
2. It reads it into memory.
3. It closes the file.
4. The contents of the file in memory are parsed.
5. It builds a list of tokens; I'm using a vector for that.
Whenever I break into the program while it is running on the 1000+ file input set, the call stack shows that it is in the middle of a std::vector add.
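Since the time appears to go into adding elements to the token vector, one cheap experiment is to reserve capacity before filling it. The sketch below is an assumption-laden illustration: the tokenize function, the whitespace splitting, and the one-token-per-8-characters estimate are all hypothetical, because the real parser inside foo() is not shown.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical tokenizer that splits on whitespace.
// The real parser inside foo() is not shown in the question.
std::vector<std::string> tokenize(const std::string& contents)
{
    std::vector<std::string> tokens;
    // Rough guess of one token per 8 characters (an assumption for
    // illustration only). Reserving up front avoids repeated reallocation
    // and copying while the vector grows.
    tokens.reserve(contents.size() / 8 + 1);

    std::istringstream in(contents);
    std::string tok;
    while (in >> tok) {
        tokens.push_back(tok);
    }
    return tokens;
}
```

reserve() only cuts down on reallocations. If the cost comes from per-call checking in a debug runtime, each push_back still pays it, so this may help less than switching the build configuration.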
In Microsoft Visual Studio, Debug builds take a global lock when accessing the Standard C++ Library, to protect against multithreading issues. This can cause big performance hits. For instance, our full test code runs in 50 minutes on Linux/gcc, whereas it needs 5 hours on Windows with VC++ 2008. Note that this performance hit does not exist when compiling in Release mode, using the non-debug Visual C++ runtime.
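One way to check whether this applies to a given build is to report which runtime the program was compiled against and time a simple vector workload in both configurations. This is a minimal sketch: the function names isMsvcDebugRuntime and timePushBacks are hypothetical, and the only compiler-specific fact it relies on is that MSVC defines _DEBUG when the debug runtime is selected (/MDd or /MTd).

```cpp
#include <cstddef>
#include <ctime>
#include <vector>

// True when compiled with MSVC against the debug runtime.
// MSVC defines _DEBUG when /MDd or /MTd is in effect.
bool isMsvcDebugRuntime()
{
#if defined(_MSC_VER) && defined(_DEBUG)
    return true;
#else
    return false;
#endif
}

// Appends `count` integers to `v` and returns the elapsed time in seconds
// as reported by std::clock. The vector is passed in by reference so the
// work stays observable and is not optimized away.
double timePushBacks(std::vector<int>& v, std::size_t count)
{
    const std::clock_t start = std::clock();
    for (std::size_t i = 0; i < count; ++i) {
        v.push_back(static_cast<int>(i));
    }
    const std::clock_t stop = std::clock();
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}
```

Calling timePushBacks with a few million elements in a Debug build and again in a Release build should show whether the library's debug machinery accounts for the slowdown on a particular machine.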