Problem Description
CRAY supercomputer using the MPICH2 library. Each node has 32 CPUs.
I have a single float on each of N different MPI ranks, where each of these ranks is on a different node. I need to perform a reduction operation over this group of floats. I would like to know whether MPI_Reduce is faster than MPI_Gather with the reduction computed on the root, for any value of N. Please assume that the reduction done on the root rank uses a good parallel reduction algorithm that can utilize N threads.
If it isn't faster for every value of N, does it tend to be faster for smaller N, like 16, or for larger N?
If it is true, why? (For example, does MPI_Reduce use a tree communication pattern that tends to hide the reduction operation's time inside the communication with the next level of the tree?)
Recommended Answer
Assume that MPI_Reduce is always faster than MPI_Gather + local reduce.
Even if there were a value of N for which the reduction is slower than the gather, an MPI implementation could easily implement the reduction for that case in terms of gather + local reduce.
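That fallback can be sketched in a few lines. This is a pure-Python illustration, not real MPI: the `gathered_values` list stands in for the receive buffer MPI_Gather would fill on the root, and `reduce_via_gather` is a hypothetical name for the fallback, not an actual MPI routine.

```python
import functools
import operator

def reduce_via_gather(gathered_values, op=operator.add):
    """Hypothetical fallback: if a tree reduction were ever slower for
    some N, the implementation could gather all contributions to the
    root (here: the gathered_values list) and fold them locally."""
    return functools.reduce(op, gathered_values)

# One float per rank, as in the question (16 ranks each contributing 0.5):
print(reduce_via_gather([0.5] * 16))  # 8.0
```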
MPI_Reduce has only advantages over MPI_Gather + local reduce:

- MPI_Reduce is the more high-level operation, giving the implementation more opportunity to optimize.
- MPI_Reduce needs to allocate much less memory.
- MPI_Reduce needs to communicate less data (if using a tree) or less data over the same link (if using direct all-to-one).
- MPI_Reduce can distribute the computation across more resources (e.g. using a tree communication pattern).
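The tree and memory points can be made concrete with a small simulation (assumed 4-byte elements, not a real MPI measurement): a binomial-tree reduction finishes in ceil(log2 N) rounds and the root receives only one partial result per round, whereas a direct all-to-one gather funnels N-1 messages into the root's single link and must stage an N-element receive buffer there.

```python
import math

def tree_reduce(values):
    """Simulate a binomial-tree sum reduction over len(values) ranks.
    Each round halves the number of active ranks; rank 0 (the root)
    receives exactly one partial sum per round."""
    vals = list(values)
    n = len(vals)
    rounds = 0
    step = 1
    while step < n:
        for dst in range(0, n, 2 * step):
            src = dst + step
            if src < n:
                vals[dst] += vals[src]  # partial sums merge as they climb the tree
        step *= 2
        rounds += 1
    return vals[0], rounds

def gather_costs(n, elem_bytes=4):
    """Direct all-to-one gather: n-1 messages arrive over the root's
    single link, and the root stages an n-element receive buffer."""
    return (n - 1), n * elem_bytes

result, rounds = tree_reduce(range(16))
print(result, rounds)    # 120 4  -> only log2(16) = 4 messages reach the root
print(gather_costs(16))  # (15, 64) -> 15 messages, 64-byte root buffer
```

For N = 16 the difference is small; for thousands of ranks the gap between log2 N rounds and N-1 serialized arrivals at the root is what makes the tree pattern win.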
That said: never assume anything about performance. Measure.