Problem description
I have a Python script that uses MPI for parallel calculations. The scheme of the calculation is as follows: data processing round 1 - data exchange between processes - data processing round 2. I have a machine with 16 logical cores (2 x Intel Xeon E5520 2.27GHz). For some reason, round 1 cannot be run in parallel, so 15 cores stay idle. Despite this, the calculations experience a more than 2-fold slowdown.
The problem is illustrated by this script (saved as test.py):
from mpi4py import MPI
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

comm.barrier()
stime = time.time()

if rank == 0:
    print('begin calculations at {:.3f}'.format(time.time() - stime))
    for i in range(1000000000):
        a = 2 * 2
    print('end calculations at {:.3f}'.format(time.time() - stime))
    comm.bcast(a, root = 0)
    print('end data exchange at {:.3f}'.format(time.time() - stime))
else:
    a = comm.bcast(root = 0)
When I run it on 2 cores, I observe:
$ mpiexec -n 2 python3 test.py
begin calculations at 0.000
end calculations at 86.954
end data exchange at 86.954
When I run it on 16 cores, I observe:
$ mpiexec -n 16 python3 test.py
begin calculations at 0.000
end calculations at 174.156
end data exchange at 174.157
Can anyone explain this difference? An idea for how to get rid of it would also be helpful.
Answer
OK, I finally figured it out.
There are several factors contributing to the slowdown:
- Waiting to receive data is active (the process constantly checks whether the data has already arrived), so the waiting processes are not actually idle.
- Intel virtual (hyper-threaded) cores do not contribute to calculation speed. That means an 8-core machine is still an 8-core machine and behaves like one, regardless of the virtual cores (in some cases, for example with multithreading, they can give a modest boost, but not with MPI). A quick way to check the physical core count is sketched right after this list.
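As a side note, the physical vs. logical core count can be confirmed from Python; this is a minimal sketch assuming the third-party psutil package is installed (it is not used anywhere else in these scripts):

import psutil

# On this machine: 16 logical cores, but only 8 physical ones.
print(psutil.cpu_count(logical=True))   # logical (hyper-threaded) cores
print(psutil.cpu_count(logical=False))  # physical cores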
Taking this into account, I modified the code, introducing the sleep() function into the waiting processes. The results are shown on the chart (10 measurements were taken in each case).
from mpi4py import MPI
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

comm.barrier()
stime = time.time()

if rank == 0:
    for i in range(1000000000):
        a = 2 * 2
    print('end calculations at {:.3f}'.format(time.time() - stime))
    for i in range(1, size):
        comm.send(a, dest = i)
    print('end data exchange at {:.3f}'.format(time.time() - stime))
else:
    # sleep between polls instead of busy-waiting inside recv()
    while not comm.Iprobe(source = 0):
        time.sleep(1)
    a = comm.recv(source = 0)
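For reuse, the waiting pattern above can be factored into a small helper; sleepy_recv is a hypothetical name, and the one-second poll interval simply mirrors the value used above:

from mpi4py import MPI
import time

def sleepy_recv(comm, source, interval=1.0):
    # Poll for an incoming message at a fixed interval instead of
    # busy-waiting, so the waiting core stays mostly idle.
    while not comm.Iprobe(source=source):
        time.sleep(interval)
    return comm.recv(source=source)

# On the non-root ranks the else branch then becomes:
# a = sleepy_recv(MPI.COMM_WORLD, source=0)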