Problem description
This code:
    #include <mpi.h>

    int main(int argc, char* argv[])
    {
        MPI_Init(&argc, &argv);
        for (unsigned int iter = 0; iter < 1000; iter++)
            MPI_Barrier(MPI_COMM_WORLD);
        MPI_Finalize();
        return 0;
    }
takes a very long time to run with MPICH 3.1.4. Here are the wall-clock times (in seconds) for different MPI implementations.
On a laptop with 4 processors of 2 CPU cores:
| MPI size | MPICH 1.4.1p1 | openmpi 1.8.4 | MPICH 3.1.4 |
|----------|---------------|---------------|-------------|
| 2 | 0.01 | 0.39 | 0.01 |
| 4 | 0.02 | 0.39 | 0.01 |
| 8 | 0.14 | 0.45 | 27.28 |
| 16 | 0.34 | 0.53 | 71.56 |
On a desktop with 8 processors of 4 CPU cores:
| MPI size | MPICH 1.4.1p1 | openmpi 1.8.4 | MPICH 3.1.4 |
|----------|---------------|---------------|-------------|
| 2 | 0.00 | 0.41 | 0.00 |
| 4 | 0.01 | 0.41 | 0.01 |
| 8 | 0.07 | 0.45 | 2.57 |
| 16 | 0.36 | 0.54 | 61.76 |
What explains such a difference, and how can it be controlled?
Recommended answer
You are running with an MPI size greater than the number of processors available. MPI programs are launched so that each process is handled by a single processor, so when you run with MPI size == 16 on your 8-core machine, for example, each processor is responsible for two processes. This will not make the program any faster and, as you have seen, will in fact make it slower. To get around it, either use a machine with more processors available, or make sure you run your code with MPI size <= number of processors available.
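The check suggested above can be sketched in C. This is only an illustration under a few assumptions: the helper names `ranks_per_cpu` and `is_oversubscribed` are hypothetical, the rank count is passed in as a plain number (in an MPI program it would come from `MPI_Comm_size`), and the CPU count is read with `sysconf(_SC_NPROCESSORS_ONLN)`, which is available on Linux and macOS but is not guaranteed everywhere.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: how many ranks share the busiest CPU,
   i.e. ceil(ranks / cpus). Returns 0 for non-positive input. */
long ranks_per_cpu(long ranks, long cpus)
{
    if (ranks <= 0 || cpus <= 0)
        return 0;
    return (ranks + cpus - 1) / cpus;
}

/* Hypothetical helper: nonzero when more ranks are requested
   than there are CPUs to run them on. */
int is_oversubscribed(long ranks, long cpus)
{
    return cpus > 0 && ranks > cpus;
}

/* Standalone demo, built only when OVERSUBSCRIPTION_DEMO is defined. */
#ifdef OVERSUBSCRIPTION_DEMO
int main(int argc, char *argv[])
{
    /* Requested MPI size from the command line, defaulting to 1. */
    long ranks = (argc > 1) ? strtol(argv[1], NULL, 10) : 1;

    /* Assumption: the platform supports _SC_NPROCESSORS_ONLN. */
    long cpus = sysconf(_SC_NPROCESSORS_ONLN);
    if (cpus < 1) {
        fprintf(stderr, "could not determine the CPU count\n");
        return 1;
    }

    if (is_oversubscribed(ranks, cpus))
        printf("%ld ranks on %ld CPUs: up to %ld ranks share one CPU\n",
               ranks, cpus, ranks_per_cpu(ranks, cpus));
    else
        printf("%ld ranks fit on %ld CPUs\n", ranks, cpus);
    return 0;
}
#endif
```

Compiling with `-DOVERSUBSCRIPTION_DEMO` and passing the intended MPI size as the first argument reports whether that size exceeds the CPUs the operating system sees. For the 8-core case in the answer, a size of 16 gives two ranks per CPU.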