Problem description
I have to read an 8192x8192 matrix into memory, and I want to do it as fast as possible.
Right now I have this structure:
char inputFile[8192][8192*4]; // I know the numbers are at max 3 digits
int8_t matrix[8192][8192]; // Matrix to be populated
// Read entire file line by line using fgets
while (fgets (inputFile[lineNum++], MAXCOLS, fp));
// Populate the matrix in parallel
for (t = 0; t < NUM_THREADS; t++) {
    pthread_create(&threads[t], NULL, ParallelRead, (void *)t);
}
In the function ParallelRead, I parse each line, call atoi, and populate the matrix. The parallelism is line-wise: thread t parses lines t, t + NUM_THREADS, t + 2 * NUM_THREADS, and so on.
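The line-wise split described above can be sketched as follows. This is only an illustration, not the asker's actual ParallelRead: the helper names parse_line and parse_stride are hypothetical, values are assumed to be separated by whitespace, strtol is swapped in for atoi so the parser can advance along the line, and int16_t is used as the element type on the assumption that three-digit values may exceed the range of int8_t.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse whitespace-separated integers from one line
   into out[0..max_cols-1]. Returns the number of values stored. */
static size_t parse_line(const char *line, int16_t *out, size_t max_cols)
{
    size_t n = 0;
    char *end;
    while (n < max_cols) {
        long v = strtol(line, &end, 10);
        if (end == line)            /* no further number on this line */
            break;
        out[n++] = (int16_t)v;
        line = end;                 /* continue after the parsed number */
    }
    return n;
}

/* Line-wise striding: worker t handles rows t, t + nthreads,
   t + 2 * nthreads, ... so no two workers touch the same row. */
static void parse_stride(char **lines, size_t nrows, size_t ncols,
                         int16_t *matrix, size_t t, size_t nthreads)
{
    assert(nthreads > 0 && t < nthreads);
    for (size_t r = t; r < nrows; r += nthreads)
        parse_line(lines[r], matrix + r * ncols, ncols);
}
```

Each thread would call parse_stride with its own index t. Because the rows are disjoint, the workers need no locking while they fill the matrix.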
On a two-core system with 2 threads, this takes:
Loading big file (fgets) : 5.79126
Preprocessing data (Parallel Read) : 4.44083
Is there a way to optimize this any further?
Recommended answer
It's a bad idea to do it this way. Threads can get you more CPU cycles if you have enough cores, but you still have only one hard disk. So threads inevitably cannot improve the speed of reading the file data.
They actually make it much worse. Reading data from a file is fastest when you access the file sequentially. That minimizes the number of read head seeks, by far the most expensive operation on a disk drive. By splitting the reading across multiple threads, each reading a different part of the file, you make the read head constantly jump back and forth. That is very, very bad for throughput.
Use only one thread to read the file data. You might be able to overlap the reading with some computation on the data by starting a worker thread once a chunk of the file has been loaded.
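One way to arrange that overlap is sketched below, under stated assumptions: a single reader calls fgets sequentially while one worker thread parses each row as soon as it is available. The names pipeline_t, parse_row, parser_thread and load_matrix are hypothetical, values are assumed to be whitespace-separated, every line is assumed to fit in line_cap bytes, and int16_t is an assumed element type.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    char *buf;               /* nrows * line_cap bytes of raw text */
    size_t line_cap, ncols;
    int16_t *matrix;         /* nrows * ncols parsed values */
    size_t loaded;           /* rows read so far (guarded by mu) */
    int done;                /* reader has finished (guarded by mu) */
    pthread_mutex_t mu;
    pthread_cond_t cv;
} pipeline_t;

static void parse_row(const char *s, int16_t *out, size_t ncols)
{
    char *end;
    for (size_t c = 0; c < ncols; c++) {
        long v = strtol(s, &end, 10);
        if (end == s)
            break;           /* fewer values than expected on this row */
        out[c] = (int16_t)v;
        s = end;
    }
}

/* Worker: parses rows as the reader publishes them. */
static void *parser_thread(void *arg)
{
    pipeline_t *p = arg;
    size_t next = 0;
    for (;;) {
        pthread_mutex_lock(&p->mu);
        while (next >= p->loaded && !p->done)
            pthread_cond_wait(&p->cv, &p->mu);
        size_t avail = p->loaded;
        pthread_mutex_unlock(&p->mu);
        if (next >= avail)
            break;           /* reader is done and nothing is left */
        for (; next < avail; next++)
            parse_row(p->buf + next * p->line_cap,
                      p->matrix + next * p->ncols, p->ncols);
    }
    return NULL;
}

/* Reader: the only thread that touches the file. Returns rows read. */
static size_t load_matrix(FILE *fp, size_t nrows, size_t ncols,
                          size_t line_cap, int16_t *matrix)
{
    assert(fp && matrix && line_cap > 1);
    pipeline_t p = { .line_cap = line_cap, .ncols = ncols, .matrix = matrix };
    p.buf = malloc(nrows * line_cap);
    if (!p.buf)
        return 0;
    pthread_mutex_init(&p.mu, NULL);
    pthread_cond_init(&p.cv, NULL);
    pthread_t worker;
    if (pthread_create(&worker, NULL, parser_thread, &p) != 0) {
        free(p.buf);
        return 0;
    }
    for (size_t r = 0; r < nrows; r++) {
        if (!fgets(p.buf + r * line_cap, (int)line_cap, fp))
            break;
        pthread_mutex_lock(&p.mu);
        p.loaded = r + 1;    /* publish the new row to the worker */
        pthread_cond_signal(&p.cv);
        pthread_mutex_unlock(&p.mu);
    }
    pthread_mutex_lock(&p.mu);
    p.done = 1;
    pthread_cond_signal(&p.cv);
    pthread_mutex_unlock(&p.mu);
    pthread_join(worker, NULL);
    size_t rows = p.loaded;
    pthread_mutex_destroy(&p.mu);
    pthread_cond_destroy(&p.cv);
    free(p.buf);
    return rows;
}
```

Signalling after every row keeps the sketch short. Publishing rows in larger batches would probably reduce locking overhead, but that is a tuning choice to measure rather than assume.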
Do watch out for the test effect. When you re-run your program, typically after tweaking the code somewhat, the file data is likely still in the file system cache, so it does not have to be read from the disk at all. That is very fast: a memory-to-memory copy at memory bus speed. It is quite likely to happen with your data set, since it isn't very big and easily fits in the amount of RAM a modern machine has. This does not (typically) happen on a production machine. So be sure to clear the cache to get realistic numbers, whatever that takes on your OS.
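How to clear the cache depends on the OS. As one possible approach, assuming Linux or another system that provides posix_fadvise, a benchmark could ask the kernel to drop the cached pages of the input file before each timed run. The function name evict_from_page_cache is hypothetical, and the call is only advisory, so the kernel is not required to honor it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Asks the kernel to drop cached pages of one file.
   Returns 0 on success, -1 if the file cannot be opened,
   otherwise the error number reported by posix_fadvise. */
static int evict_from_page_cache(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    fsync(fd);  /* dirty pages cannot be dropped, so flush them first */
    /* offset 0 and length 0 mean "the whole file" */
    int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return rc;
}
```

Calling this on the matrix file before each timed run should bring the measurement closer to a cold read. Comparing against a run made after a reboot is a reasonable sanity check that the pages were really dropped.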