Induction with OpenMP: getting range values for a parallelized for loop in OpenMP

Question

I would like to know a way to get the range of values assigned to a given thread in a parallelized for loop in OpenMP with C++. For example, in the following code I would like to know the first value of i that each thread uses in the loop.

#pragma omp parallel for schedule(static)
for(int i=0; i<n; i++)

Let me give you an example of why I might want these values. Let's assume I want to fill an array with the sums of the counting numbers. The closed form solution for the sum of the counting numbers up to n is n*(n+1)/2. To do this with OpenMP I could do this:

#pragma omp parallel for schedule(static)
for(int i=0; i<n; i++) {
    a[i] = i*(i+1)/2;
}

However, I suspect a faster way to get the sums of the counting numbers is to avoid evaluating the closed form solution on each iteration (it needs a multiplication) and instead carry a running sum from one iteration to the next, like this:

int cnt = 0;
for(int i=0; i<n; i++) {
    cnt += i;
    a[i] = cnt;
}

But the only way I can think of to do this with OpenMP is to explicitly define the range values, like this:

#pragma omp parallel
{
    const int ithread = omp_get_thread_num();
    const int nthreads = omp_get_num_threads();
    const int start = ithread*n/nthreads;
    const int finish = (ithread+1)*n/nthreads;

    int cnt = 0;
    int offset = (start-1)*(start)/2;
    for(int i=start; i<finish; i++) {
        cnt += i;
        a[i] = cnt + offset;
    }
}

If I could get the start value from #pragma omp parallel for schedule(static) then I would not have to define start, finish, ithread, and nthreads.
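
To make the manual partition above easier to check, here is a minimal sketch that factors the start/finish formula into a helper. The names chunk_range and chunks_tile_range are hypothetical and are not part of OpenMP. The split reuses the formula from the code above, which is not guaranteed to match the ranges an OpenMP runtime picks for schedule(static).

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper (not part of OpenMP): half-open range [start, finish)
// handled by thread `ithread` out of `nthreads` when n iterations are split
// into contiguous blocks with the same formula as in the question.
std::pair<int, int> chunk_range(int n, int ithread, int nthreads) {
    // 64-bit intermediate so that ithread * n cannot overflow int for large n.
    const long long ln = n;
    const int start  = static_cast<int>(ithread * ln / nthreads);
    const int finish = static_cast<int>((ithread + 1) * ln / nthreads);
    return {start, finish};
}

// Checks that the chunks of all threads cover [0, n) with no gaps or overlaps.
bool chunks_tile_range(int n, int nthreads) {
    int expected_start = 0;
    for (int t = 0; t < nthreads; t++) {
        const std::pair<int, int> r = chunk_range(n, t, nthreads);
        if (r.first != expected_start || r.second < r.first) return false;
        expected_start = r.second;
    }
    return expected_start == n;
}
```

Inside a parallel region, chunk_range(n, omp_get_thread_num(), omp_get_num_threads()) would give the same start and finish as the code above. When n is smaller than the number of threads, some chunks are simply empty.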

After reading Agner Fog's Optimizing C++ manual, I realized that what I am doing is called induction. He gives an example of using induction to calculate the values of a polynomial more efficiently. Here are some examples from his manual:

Without induction:

// Example 8.23a. Loop to make table of polynomial
const double A = 1.1, B = 2.2, C = 3.3; // Polynomial coefficients
double Table[100]; // Table
int x; // Loop counter
for (x = 0; x < 100; x++) {
    Table[x] = A*x*x + B*x + C; // Calculate polynomial
}

With induction:

// Example 8.23b. Calculate polynomial with induction variables
const double A = 1.1, B = 2.2, C = 3.3; // Polynomial coefficients
double Table[100]; // Table
int x; // Loop counter
const double A2 = A + A; // = 2*A
double Y = C; // = A*x*x + B*x + C
double Z = A + B; // = Delta Y
for (x = 0; x < 100; x++) {
    Table[x] = Y; // Store result
    Y += Z; // Update induction variable Y
    Z += A2; // Update induction variable Z
}

To do this with OpenMP I need to get the start value for each chunk. The only way to do this with OpenMP is to define the chunks manually.
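
As a rough sketch of what that could look like for the polynomial table, the function below splits the range by hand and seeds the induction variables Y and Z from the closed form at the start of each chunk. The function name polynomial_table and the serial fallback for builds without OpenMP are my own assumptions and do not come from the manual.

```cpp
#include <cassert>
#include <cmath>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Fills table[x] = A*x*x + B*x + C for x in [0, n) using induction variables
// that are re-seeded from the closed form at the start of each thread's chunk.
std::vector<double> polynomial_table(int n, double A, double B, double C) {
    std::vector<double> table(n);
    #pragma omp parallel
    {
#ifdef _OPENMP
        const int ithread = omp_get_thread_num();
        const int nthreads = omp_get_num_threads();
#else
        const int ithread = 0;   // serial fallback when built without OpenMP
        const int nthreads = 1;
#endif
        // Manually defined chunk [start, finish) for this thread.
        const int start  = static_cast<int>(1LL * ithread * n / nthreads);
        const int finish = static_cast<int>(1LL * (ithread + 1) * n / nthreads);

        const double A2 = A + A;                        // = 2*A
        double Y = A * start * start + B * start + C;   // polynomial at x = start
        double Z = A * (2.0 * start + 1.0) + B;         // Y(start+1) - Y(start)
        for (int x = start; x < finish; x++) {
            table[x] = Y;   // Store result
            Y += Z;         // Update induction variable Y
            Z += A2;        // Update induction variable Z
        }
    }
    return table;
}
```

Because Y is accumulated by repeated addition, the stored values can differ from a direct evaluation of the polynomial by a small rounding error, and that error can depend on where the chunk boundaries fall.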

Answer

This is an extended comment rather than an answer ...

There is no OpenMP routine or pre-defined variable for getting the range of values for i (in your case) that each thread will execute. You'll have to write something along the lines that you have outlined to get those numbers yourself.
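
One possible variation, offered only as a sketch and not as something OpenMP provides: instead of computing start and finish, each thread can notice when it begins a new contiguous run of i values and re-seed its running sum from the closed form at that point. The function name counting_sums and the sentinel value of prev are illustrative choices.

```cpp
#include <cassert>
#include <vector>

// Fills a[i] = 0 + 1 + ... + i with a per-thread running sum inside an
// ordinary OpenMP worksharing loop, without computing chunk bounds by hand.
std::vector<long long> counting_sums(int n) {
    std::vector<long long> a(n);
    #pragma omp parallel
    {
        long long cnt = 0;   // running sum, private to each thread
        int prev = -2;       // last index this thread handled; -2 means "none yet"
        #pragma omp for schedule(static)
        for (int i = 0; i < n; i++) {
            if (i != prev + 1) {
                // Start of a new contiguous run: seed with the sum 0 + ... + (i-1).
                cnt = static_cast<long long>(i) * (i - 1) / 2;
            }
            cnt += i;
            a[i] = cnt;
            prev = i;
        }
    }
    return a;
}
```

Since the seed is recomputed whenever the index is not the successor of the previous one, the result does not depend on how iterations are handed out. The price is an extra comparison in every iteration, which may cancel whatever the saved multiplication gains.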

But before you do, stop and think a bit. All that extra code, and the effort to write and maintain it, just to avoid one multiplication per iteration! Even when you get your code working, I doubt that any speedup you see will be worth the effort. Worse, as soon as you want to use a schedule other than static you will have to redo the index calculations; for many of the other scheduling options the iterations executed by one thread won't be a simple range anyway.

You are programming against the grain, not only of OpenMP, but probably of parallel programming in general. Programs which can be handed out to threads without consideration of the number available at run time or how the run-time system will divide up the work and which do not have dependencies between tasks are ideal for parallelisation. They generally provide good scalability to large numbers of threads without a great deal of programmer effort.

The closed form solution you already have is all you need. Go with the flow. Programming against the grain will (inevitably, I would argue) produce more complicated code which is difficult to maintain and which will rarely produce parallel speedups large enough to compensate for its costs.
