Strange result of MPI_AllReduce for 16-byte reals

Problem description

Compiler: gfortran-4.8.5

MPI library: OpenMPI-1.7.2 (preinstalled on OpenSuSE 13.2)

The program:

  use mpi
  implicit none

  real*16 :: x
  integer :: ierr, irank, type16

  call MPI_Init(ierr)

  call MPI_Comm_Rank(MPI_Comm_World, irank, ierr)

  if (irank+1==1) x = 2.1
  if (irank+1==8) x = 2.8
  if (irank+1==7) x = 5.2
  if (irank+1==4) x = 6.7
  if (irank+1==6) x = 6.5
  if (irank+1==3) x = 5.7
  if (irank+1==2) x = 4.0
  if (irank+1==5) x = 6.8

  print '(a,i0,a,f3.1)', "rank+1: ",irank+1," x: ",x

  call MPI_AllReduce(MPI_IN_PLACE, x, 1, MPI_REAL16, MPI_MAX, MPI_Comm_World, ierr)

  if (irank==0) print '(i0,a,f3.1)', irank+1," max x: ", x

  call MPI_Finalize(ierr)
end

I also tried real(16) and real(kind(1.q0)). For this compiler, real(real128) is actually equivalent to real*10.
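To see which kinds these declarations actually select on a given compiler, a small diagnostic along the following lines may help. This is only a sketch: the program name show_real_kinds and the variables a, b, c are made up for illustration, and it assumes the compiler accepts the nonstandard real*16 syntax and provides both real128 and a kind with at least 33 decimal digits (otherwise the declarations will not compile).

```fortran
program show_real_kinds
  use, intrinsic :: iso_fortran_env, only : real128
  implicit none
  real*16                      :: a   ! nonstandard byte-count syntax
  real(real128)                :: b   ! kind whose storage size is 128 bits
  real(selected_real_kind(33)) :: c   ! kind with at least 33 decimal digits

  ! For each declaration print: kind number, size in bytes, decimal precision.
  print '(a,3(1x,i0))', "real*16                 :", kind(a), storage_size(a)/8, precision(a)
  print '(a,3(1x,i0))', "real(real128)           :", kind(b), storage_size(b)/8, precision(b)
  print '(a,3(1x,i0))', "selected_real_kind(33)  :", kind(c), storage_size(c)/8, precision(c)
end program show_real_kinds
```

If the three lines report different kind numbers or precisions, the declarations are not interchangeable on that compiler, which matters when picking the matching MPI datatype.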

The result is:

> mpif90 reduce16.f90
> mpirun -n 8 ./a.out
rank+1: 1 x: 2.1
rank+1: 2 x: 4.0
rank+1: 3 x: 5.7
rank+1: 4 x: 6.7
rank+1: 5 x: 6.8
rank+1: 6 x: 6.5
rank+1: 7 x: 5.2
rank+1: 8 x: 2.8
1 max x: 2.8

The program finds the true maximum when x is declared as real*10 while MPI_REAL16 is kept. The MPI specification (3.1, pages 628 and 674) is not very clear on whether MPI_REAL16 corresponds to real*16 or to real(real128) if these differ.
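One way to probe what the library itself means by MPI_REAL16 is to ask for its size and compare it with the Fortran variable. The sketch below uses only MPI_Type_size and the storage_size intrinsic; the program name check_real16_size and the variable names are illustrative, and it assumes the library defines MPI_REAL16 as a usable datatype (it is an optional type, so the call may fail on some installations).

```fortran
program check_real16_size
  use mpi
  implicit none
  real*16 :: x
  integer :: ierr, irank, mpi_bytes, fortran_bytes

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, irank, ierr)

  ! Size in bytes that the MPI library associates with MPI_REAL16.
  call MPI_Type_size(MPI_REAL16, mpi_bytes, ierr)
  ! Size in bytes of the Fortran variable (storage_size returns bits).
  fortran_bytes = storage_size(x) / 8

  if (irank == 0) then
    print '(a,i0)', "MPI_REAL16 size in bytes: ", mpi_bytes
    print '(a,i0)', "real*16 size in bytes:    ", fortran_bytes
  end if

  call MPI_Finalize(ierr)
end program check_real16_size
```

Matching sizes only show that the buffers are transferred whole. They do not show that a built-in reduction such as MPI_MAX interprets the 16 bytes in the same floating-point format as the compiler does.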

Also, assuming MPI_REAL16 is actually real(real128) and trying to use that in a program leads to a different problem:

Error: There is no specific subroutine for the generic 'mpi_recv' at (1)
Error: There is no specific subroutine for the generic 'mpi_send' at (1)

which does not happen for real*16. (This disregards that one should be able to pass any bit pattern, so this check is superfluous.)

What is the right way to use 16-byte reals? Is the OpenMPI library in error?

Recommended answer

While this should just work correctly in every MPI implementation, a straightforward workaround is to implement a user-defined reduction for this type that is written in Fortran, so that nothing depends on implementing it in C. (MPICH and OpenMPI try to do everything in C, hence there are issues when C cannot reproduce the behavior of Fortran.)

Below is an attempt to implement this: a user-defined reduction written in Fortran. I am certain that experienced modern Fortran programmers can do it better.

  ! Element-wise sum of two real*16 arrays. The dummies are explicit-shape
  ! so that no explicit interface is needed at the call site.
  subroutine sum_real16(iv,iov,n)
    implicit none
    integer, intent(in) ::  n
    real*16, intent(in) :: iv(n)
    real*16, intent(inout) :: iov(n)
    integer :: i
    do i = 1,n
      iov(i) = iov(i) + iv(i)
    enddo
  end subroutine sum_real16
  ! User-defined reduction with the MPI_User_function interface of mpi_f08.
  subroutine reduce_sum_real16(iv, iov, n, dt)
    use, intrinsic ::  iso_c_binding, only : c_ptr, c_f_pointer
    use mpi_f08
    implicit none
    type(c_ptr), value ::  iv, iov
    integer ::  n
    type(MPI_Datatype) ::  dt
    real*16, pointer :: fiv(:), fiov(:)
    if ( dt .eq. MPI_REAL16 ) then
        ! Convert the C addresses to Fortran array pointers before use.
        call c_f_pointer(iv, fiv, [n])
        call c_f_pointer(iov, fiov, [n])
        call sum_real16(fiv,fiov,n)
    endif
  end subroutine reduce_sum_real16
  program test_reduce_sum_real16
    use, intrinsic ::  iso_c_binding
    use mpi_f08
    implicit none
    integer, parameter ::  n = 10
    real*16 :: output(n)
    real*16 :: input(n)
    real*16 :: error, expected
    integer :: me, np
    procedure(MPI_User_function) :: reduce_sum_real16
    type(MPI_Op) :: mysum
    integer :: i
    call MPI_Init()
    call MPI_Comm_rank(MPI_COMM_WORLD,me)
    call MPI_Comm_size(MPI_COMM_WORLD,np)
    output = 0.0
    input  = 1.0*me
    call MPI_Op_create(reduce_sum_real16,.true.,mysum)
    call MPI_Allreduce(input,output,n,MPI_REAL16,mysum,MPI_COMM_WORLD)
    ! Each rank contributes its rank number, so every element of the
    ! result should equal 0 + 1 + ... + (np-1).
    expected = np*(np-1)/2
    error = 0.0
    do i = 1,n
      error = error + abs(output(i)-expected)
    enddo
    if (error.gt.0.0) then
        print*,'SAD PANDA = ',error
        call MPI_Abort(MPI_COMM_SELF,1)
    endif
    call MPI_Op_free(mysum)
    call MPI_Finalize()
  end program test_reduce_sum_real16

This program returns without error with the Intel 16 Fortran compiler and MPICH 3.2+. Apparently I am not using I/O correctly, though, so my confidence in the correctness of this program is not as high as it would be if I could write all the results to stdout.
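The question asks for a maximum rather than a sum and uses the older "use mpi" interface, so the same workaround might be adapted roughly as follows. This is an untested sketch, not part of the original answer: the names max_real16, allreduce_max16 and op_max16 are made up here, and it assumes the library provides a 16-byte MPI_REAL16 so that the buffers are transferred whole while the comparison itself is done in Fortran.

```fortran
! Hypothetical user-defined MAX for 16-byte reals, written for the
! older "use mpi" interface where the datatype handle is an integer.
subroutine max_real16(invec, inoutvec, n, dt)
  implicit none
  integer :: n, dt
  real*16 :: invec(n), inoutvec(n)
  integer :: i
  do i = 1, n
    inoutvec(i) = max(inoutvec(i), invec(i))
  end do
end subroutine max_real16

program allreduce_max16
  use mpi
  implicit none
  external :: max_real16
  real*16 :: x
  integer :: ierr, irank, op_max16

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, irank, ierr)

  ! Give every rank a distinct value; the highest rank holds the maximum.
  x = 1.0 + irank

  ! Register the Fortran routine as a commutative reduction operation.
  call MPI_Op_create(max_real16, .true., op_max16, ierr)
  call MPI_Allreduce(MPI_IN_PLACE, x, 1, MPI_REAL16, op_max16, MPI_COMM_WORLD, ierr)

  if (irank == 0) print '(a,f5.1)', "max x: ", x

  call MPI_Op_free(op_max16, ierr)
  call MPI_Finalize(ierr)
end program allreduce_max16
```

Because the comparison runs in compiled Fortran code, it uses whatever floating-point format the compiler assigns to real*16, independent of how the library implements its built-in MPI_MAX for that type.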
