How do I read a .gz file four lines at a time in parallel with OpenMP?

Problem description

The test.fa.gz file contains many four-line records like the one below:

    @HWI-ST298:420:B08APABXX:3:1101:1244:2212 1:N:0:TCATTC
    GGCAAGGCACTTACTTTACAGCTAAAGAAGTGCAGC
    +
    @@@FDFFDFHCFDACGHC<<CCFEHHFCCFCEE:C?

What I want to do is read the *.fq.gz file four lines at a time in parallel with OpenMP. The code below compiles successfully, but it sometimes produces incorrect results.
In each loop iteration I call getline() four times to read the file. I'm not sure how OpenMP handles the multiple reads in each iteration, or how the .gz file handle advances between iterations of the OpenMP loop. I've searched the internet and the OpenMP documentation for help, but I still don't quite get it, so any help would be appreciated.

Thanks,

    #include <iostream>
    #include <string>
    #include <cstdlib>
    #include <gzstream.h>
    #include <omp.h>

    using namespace std;

    string reverseStrand (string seq);

    int main (int argc, char ** argv) {
        const char* gzFqFile;
        unsigned int nReads;
        if (argc == 3) {
            gzFqFile = argv[1];
            nReads = atoi(argv[2]);
        } else {
            printf("\n%s <*.fq.gz> <number_of_reads>\n", argv[0]);
            return 1;
        }

        igzstream gz(gzFqFile);
        string li, bp36, strand, revBp36;
        unsigned int i;

        #pragma omp parallel shared(gz) private(i,li,bp36,strand,revBp36)
        {
            #pragma omp for schedule(dynamic)
            for(i = 0;i < nReads;++i) {
                li = "";
                bp36 = "";
                strand = "";
                revBp36 = "";
                getline(gz,li,'\n');
                getline(gz,li,'\n');
                bp36 = li;
                getline(gz,li,'\n');
                strand = li;
                getline(gz,li,'\n');
                if(strand.compare("-") == 0) {
                    revBp36 = reverseStrand(bp36);
                }
                cout << bp36 << " " << strand << " " << revBp36 << "\n";
            }
        }
        gz.close();
    }

Solution

More of an extended comment than an answer perhaps, but here goes anyway...

Even if getline were thread safe, it's probably not a good idea to have multiple threads in an OpenMP program all trying to read the same file simultaneously. Unless you have a parallel file system (since you don't mention it, I assume you don't), you run the risk of writing a program in which the threads fight each other for the single I/O channel. Consider the case of 4 threads, each reading different parts of a file, all using 1 read/write head on a disk.
Quasi-random reading of small bits of a file is probably the slowest approach you could think of.

Haatschi's suggestion of wrapping the file access in a critical section will simply mean that, instead of fighting for I/O access, the threads play nicely together, each waiting politely for its turn. But, as Haatschi suggests, this is not likely to lead to any speedup in file reading; it is more likely (in my experience) to lead to a slowdown. If I/O time is not critical, this might be a way to go.

If you are concerned with I/O time, then either read the file in one thread and parallelise the processing of the data, or have each of the threads read all of its data in one gulp from the file, using critical sections to avoid contention for I/O resources.