Where should I put ANNOTATE_ITERATION_TASK?

Problem Description

I'm using Intel Advisor to analyze my parallel application. I have this code, which is the main loop of my program and where most of the time is spent:

```cpp
for(size_t i=0; i<wrapperIndexes.size(); i++){
    const int r = wrapperIndexes[i].r;
    const int c = wrapperIndexes[i].c;
    const float val = localWrappers[wrapperIndexes[i].i].cur.at<float>(wrapperIndexes[i].r,wrapperIndexes[i].c);
    if ( (val > positiveThreshold &&
          (isMax(val, localWrappers[wrapperIndexes[i].i].cur, r, c) &&
           isMax(val, localWrappers[wrapperIndexes[i].i].low, r, c) &&
           isMax(val, localWrappers[wrapperIndexes[i].i].high, r, c))) ||
         (val < negativeThreshold &&
          (isMin(val, localWrappers[wrapperIndexes[i].i].cur, r, c) &&
           isMin(val, localWrappers[wrapperIndexes[i].i].low, r, c) &&
           isMin(val, localWrappers[wrapperIndexes[i].i].high, r, c))) )
        // either positive -> local max. or negative -> local min.
        ANNOTATE_ITERATION_TASK(localizeKeypoint);
        localizeKeypoint(r, c, localCurSigma[wrapperIndexes[i].i], localPixelDistances[wrapperIndexes[i].i], localWrappers[wrapperIndexes[i].i]);
}
```

As you can see, localizeKeypoint is where the loop spends most of its time (if you don't count the if clause). I want to run a Suitability Report to estimate the gain from parallelizing the loop above.
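As the loop is written above, the `if` has no braces, so it guards only the statement immediately after it (`ANNOTATE_ITERATION_TASK(localizeKeypoint);`), while `localizeKeypoint(...)` runs on every iteration. The small sketch below illustrates that C++ rule with counters. `Tally`, `unbracedLayout` and `bracedLayout` are hypothetical names, and the two counters merely stand in for the annotation and for `localizeKeypoint`; nothing here calls the real Advisor API.

```cpp
#include <cstddef>
#include <vector>

// Counts how often each of the two statements after the `if` executes.
struct Tally {
    int taskMarks = 0;  // stands in for ANNOTATE_ITERATION_TASK(localizeKeypoint)
    int workCalls = 0;  // stands in for localizeKeypoint(...)
};

// Same shape as the loop above: no braces, so the `if` guards only
// the first statement that follows it.
Tally unbracedLayout(const std::vector<bool>& isExtremum) {
    Tally t;
    for (std::size_t i = 0; i < isExtremum.size(); i++) {
        if (isExtremum[i])
            ++t.taskMarks;
        ++t.workCalls;  // not part of the if: runs on every iteration
    }
    return t;
}

// Braced version: both statements run only for candidates that pass the test.
Tally bracedLayout(const std::vector<bool>& isExtremum) {
    Tally t;
    for (std::size_t i = 0; i < isExtremum.size(); i++) {
        if (isExtremum[i]) {
            ++t.taskMarks;
            ++t.workCalls;
        }
    }
    return t;
}
```

For the input `{true, false, false, true}`, `unbracedLayout` records 2 task marks but 4 work calls, whereas `bracedLayout` records 2 of each.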
So I've written this:

```cpp
ANNOTATE_SITE_BEGIN(solve);
for(size_t i=0; i<wrapperIndexes.size(); i++){
    const int r = wrapperIndexes[i].r;
    const int c = wrapperIndexes[i].c;
    const float val = localWrappers[wrapperIndexes[i].i].cur.at<float>(wrapperIndexes[i].r,wrapperIndexes[i].c);
    if ( (val > positiveThreshold &&
          (isMax(val, localWrappers[wrapperIndexes[i].i].cur, r, c) &&
           isMax(val, localWrappers[wrapperIndexes[i].i].low, r, c) &&
           isMax(val, localWrappers[wrapperIndexes[i].i].high, r, c))) ||
         (val < negativeThreshold &&
          (isMin(val, localWrappers[wrapperIndexes[i].i].cur, r, c) &&
           isMin(val, localWrappers[wrapperIndexes[i].i].low, r, c) &&
           isMin(val, localWrappers[wrapperIndexes[i].i].high, r, c))) )
        // either positive -> local max. or negative -> local min.
        ANNOTATE_ITERATION_TASK(localizeKeypoint);
        localizeKeypoint(r, c, localCurSigma[wrapperIndexes[i].i], localPixelDistances[wrapperIndexes[i].i], localWrappers[wrapperIndexes[i].i]);
}
ANNOTATE_SITE_END();
```

The Suitability Report projected an excellent 6.69x gain. However, when I launched the Dependencies check, I got a problem report; in particular, note the "Missing start task" message.

In addition, if I place ANNOTATE_ITERATION_TASK at the beginning of the loop, like this:

```cpp
ANNOTATE_SITE_BEGIN(solve);
for(size_t i=0; i<wrapperIndexes.size(); i++){
    ANNOTATE_ITERATION_TASK(localizeKeypoint);
    const int r = wrapperIndexes[i].r;
    const int c = wrapperIndexes[i].c;
    const float val = localWrappers[wrapperIndexes[i].i].cur.at<float>(wrapperIndexes[i].r,wrapperIndexes[i].c);
    if ( (val > positiveThreshold &&
          (isMax(val, localWrappers[wrapperIndexes[i].i].cur, r, c) &&
           isMax(val, localWrappers[wrapperIndexes[i].i].low, r, c) &&
           isMax(val, localWrappers[wrapperIndexes[i].i].high, r, c))) ||
         (val < negativeThreshold &&
          (isMin(val, localWrappers[wrapperIndexes[i].i].cur, r, c) &&
           isMin(val, localWrappers[wrapperIndexes[i].i].low, r, c) &&
           isMin(val, localWrappers[wrapperIndexes[i].i].high, r, c))) )
        // either positive -> local max. or negative -> local min.
        localizeKeypoint(r, c, localCurSigma[wrapperIndexes[i].i], localPixelDistances[wrapperIndexes[i].i], localWrappers[wrapperIndexes[i].i]);
}
ANNOTATE_SITE_END();
```

the projected gain is horrible. Am I doing something wrong?

Build flags:

```
INTEL_OPT=-O3 -simd -xCORE-AVX2 -parallel -qopenmp -fargument-noalias -ansi-alias -no-prec-div -fp-model fast=2
INTEL_PROFILE=-g -qopt-report=5 -Bdynamic -shared-intel -debug inline-debug-info -qopenmp-link dynamic -parallel-source-info=2 -ldl
```

Recommended Answer

You have to use the second approach, where you put ANNOTATE_ITERATION_TASK at the very beginning of the loop body. Otherwise you get (a) a wrong performance projection in Suitability and (b) "Missing start task" in Correctness.

If you run Correctness for the second variant (with the iteration task at the very beginning of the loop body), Correctness should be OK.

Your second Suitability chart is not horrible. It just says that you have to take care of task chunking (click the "chunking" link in the tool to learn more about it). Fortunately, with recent OpenMP implementations the default chunking is "good enough"; see https://software.intel.com/en-us/articles/openmp-loop-scheduling . So to see the Advisor projection with chunking on, you just need to enable the corresponding check box, and the result will not look that bad.
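The answer's two points (make each whole iteration one task by putting the iteration annotation first, and rely on chunking to amortize per-task overhead) map fairly directly onto an OpenMP worksharing loop. The sketch below is an illustration under assumptions, not the asker's code: `passesThreshold` and `localizeStub` are hypothetical stand-ins for the `isMax`/`isMin` test and for `localizeKeypoint`, and `schedule(dynamic, chunk)` with a caller-chosen `chunk` is only one option. Omitting the `schedule` clause leaves the choice to the implementation's default.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's extremum test
// ((val > positiveThreshold && isMax(...)) || (val < negativeThreshold && isMin(...))).
static bool passesThreshold(float val, float positiveThreshold, float negativeThreshold) {
    return val > positiveThreshold || val < negativeThreshold;
}

// Hypothetical stand-in for localizeKeypoint: returns a result instead of
// mutating shared state, so iterations stay independent.
static float localizeStub(float val) {
    return 2.0f * val;
}

// Each iteration, including the cheap threshold test, is one unit of work,
// which corresponds to placing ANNOTATE_ITERATION_TASK first in the loop body.
// `chunk` hands that many consecutive iterations to a thread at a time,
// spreading scheduling overhead over several cheap iterations.
std::vector<float> localizeAll(const std::vector<float>& vals,
                               float positiveThreshold, float negativeThreshold,
                               int chunk) {
    assert(chunk > 0);  // OpenMP requires a positive chunk size
    const long n = static_cast<long>(vals.size());
    // One slot per iteration, so no two threads write the same element.
    std::vector<float> out(vals.size(), 0.0f);
    #pragma omp parallel for schedule(dynamic, chunk)
    for (long i = 0; i < n; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        const float val = vals[k];
        if (passesThreshold(val, positiveThreshold, negativeThreshold)) {
            out[k] = localizeStub(val);
        }
    }
    return out;
}
```

If the code is built without OpenMP enabled (`-qopenmp` with the Intel compiler, `-fopenmp` with GCC or Clang), the pragma is ignored and the loop runs serially with the same result. The output is a `std::vector<float>` rather than `std::vector<bool>` because the latter packs elements into shared bits, so concurrent writes to neighboring elements would race.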