问题描述
为了存储带状矩阵,整个对应的矩阵可以同时具有从<$ c $以外的索引索引的行和列c> 1 ,我定义了一个派生数据类型为 TYPE CDS
REAL, DIMENSION(:, :),ALLOCATABLE :: matrix
INTEGER,DIMENSION(2):: lb,ub
INTEGER :: ld,ud
结束类型CDS
其中CDS代表压缩的对角线存储。
给定声明 TYPE(CDS):: A
,
- rank-2组件
matrix
应该包含实际完整矩阵的对角线(如,除了我将对角线存储为列而不是行)。 - 组件
ld
和ud
应该分别包含下对角线和上对角线的数量,即-lbound(A%matrix,2)
和+ ubound(A%matrix,2)
。 - 2元素组件
lb
和ub
应该包含实际的下限和上限整个矩阵沿着两个维度。特别是lb(1)
和ub(1)
应该与lbound( A%matrix,1)
和lbound(A%matrix,2)
。
正如您在第2点和第3点中所看到的那样,派生类型包含一些冗余信息,但我不在乎,因为它们只是3对整数。此外,在我写的代码中,关于实际完整矩阵的边界和频带的信息在矩阵可以被填充之前就已知。因此,我首先将值分配给组件 问题 我必须在这些稀疏矩阵之间进行矩阵乘法,因此我编写了一个 目前该函数如下所示: ld
, ud
, lb
和 ub
,然后我将这些组件用于 ALLOCATE
矩阵
组件(然后我可以正确填写它)。
FUNCTION
来执行此类乘积并用它来重载 *
运营商。
pre > FUNCTION CDS_mat_x_CDS_mat(A,B)
IMPLICIT NONE
TYPE(CDS),INTENT(IN):: A,B
TYPE(CDS):: cds_mat_x_cds_mat
! (1)= A%ub(1)确定结果的下限和上限以及基于操作数的结果的上限和下限:
CDS_mat_x_CDS_mat%lb(1)= A%lb(1)
CDS_mat_x_CDS_mat% )
CDS_mat_x_CDS_mat%lb(2)= B%lb(2)
CDS_mat_x_CDS_mat%ub(2)= B%ub(2)
CDS_mat_x_CDS_mat%ld = A%ld + B%ld
CDS_mat_x_CDS_mat%ud = A%ud + B%ud
!分配矩阵分量
ALLOCATE(CDS_mat_x_CDS_mat%矩阵(CDS_mat_x_CDS_mat%lb(1):CDS_mat_x_CDS_mat%ub(1),&
& -CDS_mat_x_CDS_mat%ld:+ CDS_mat_x_CDS_mat%ud))
!执行产品
:
:
END FUNCTION
内部
对于如何完成带状稀疏矩阵时间带状稀疏矩阵的任务,我提出了一些建议。我想使用我定义的类型,因为我需要它像现在一样在边界方面一般。但是,如果需要,我可以更改执行产品的过程(从 FUNCTION
到 SUBROUTINE
)。
想法
我可以将程序改写为 SUBROUTINE
为了用 INTENT(INOUT)
声明 CDS_mat_x_CDS_mat
,分配组件除了 matrix
以外,还有 SUBROUTINE
之外的分配。缺点是我不能重载 *
运算符。
我注意到了内部函数 MATMUL
可以在任何rank-2操作数上运行,无论在两个维度上的上限还是下限。这意味着分配在函数内部执行。我认为它是有效的(因为它是一个内在的)。与我的函数的不同之处在于它接受任意形状的rank-2数组,我的接受派生数据类型对象具有任何形状的rank-2数组组件。
因此,没有Fortran语言ALLOCATABLE变量与内部MATMUL函数结果的等效关联。这与没有内存分配不同 - 编译器可能仍然需要为了自己的目的在场景后面分配内存 - 例如表达式临时对象等。
(我说上面的相当于,因为内在过程本质上是特殊的 - 但假装MATMUL只是一个用户函数。)
通过使用长度类型参数,可以实现与您的情况相同的这种自动结果。这是Fortran 2003的特性 - 引入了可分配组件的相同基本语言标准 - 但它并不是所有主动维护的编译器都已实现的。
MODULE xyz
IMPLICIT NONE
TYPE CDS(lb1,ub1,ld,ud)
INTEGER,LEN :: lb1,ub1,ld,ud
REAL ::矩阵(lb1:ub1,ld:ud)
INTEGER :: lb2,ub2
结束类型CDS
接口操作符(*)
模块过程CDS_mat_x_CDS_mat
END INTERFACE OPERATOR(*)
包含
函数CDS_mat_x_CDS_mat(A,B)结果(C)
类型(CDS(*,*,*,*)) ,INTENT(IN):: A,B
TYPE(CDS(A%lb1,A%ub1,A%ld + B%ld,A%ud + B%ud)):: C
C%lb2 = B%lb2
C%ub2 = B%ub2
!执行产品。
! :
END FUNCTION CDS_mat_x_CDS_mat
END MODULE xyz
从理论上讲,为编译器提供了更多优化机会,因为它在函数结果所需的存储函数调用之前具有更多的洞察力。这实际上是否会带来更好的真实世界性能取决于编译器的实现和函数引用的性质。
Foreword
In order to store banded matrices whose full counterparts can have both rows and columns indexed from indices other than 1
, I defined a derived data type as
TYPE CDS
REAL, DIMENSION(:,:), ALLOCATABLE :: matrix
INTEGER, DIMENSION(2) :: lb, ub
INTEGER :: ld, ud
END TYPE CDS
where CDS stands for compressed diagonal storage.Given the declaration TYPE(CDS) :: A
,
- The rank-2 component
matrix
is supposed to contain, as columns, the diagonals of the actual full matrix (like here, except that I store the diagonals as columns and not as rows). - The components
ld
andud
are supposed to contain the number of lower and upper diagonals respectively, that is-lbound(A%matrix,2)
and+ubound(A%matrix,2)
. - The 2-elements components
lb
andub
are supposed to contain the lower bounds and upper bounds of the actual full matrix along the two dimensions. In particularlb(1)
andub(1)
should be the same aslbound(A%matrix,1)
andlbound(A%matrix,2)
.
As you can see in points 2. and 3., the derived type contains some redundant information, but I don't care, since they are just 3 couples of integers. Furthermore, in the code that I'm writing, the information about the bounds and the band of the actual full matrix is know before the matrix can be filled. So I first assign the values to the components ld
, ud
, lb
and ub
, and then I used these components to ALLOCATE
the matrix
component (then I can fill it properly).
The problem
I have to perform matrix multiplication between such sparse matrices, so I wrote a FUNCTION
to perform such product and used it to overload the *
operator.
At the moment the function is as follows,
FUNCTION CDS_mat_x_CDS_mat(A, B)
IMPLICIT NONE
TYPE(CDS), INTENT(IN) :: A, B
TYPE(CDS) :: cds_mat_x_cds_mat
! determine the lower and upper bounds and band of the result based on those of the operands
CDS_mat_x_CDS_mat%lb(1) = A%lb(1)
CDS_mat_x_CDS_mat%ub(1) = A%ub(1)
CDS_mat_x_CDS_mat%lb(2) = B%lb(2)
CDS_mat_x_CDS_mat%ub(2) = B%ub(2)
CDS_mat_x_CDS_mat%ld = A%ld + B%ld
CDS_mat_x_CDS_mat%ud = A%ud + B%ud
! allocate the matrix component
ALLOCATE(CDS_mat_x_CDS_mat%matrix(CDS_mat_x_CDS_mat%lb(1):CDS_mat_x_CDS_mat%ub(1),&
& -CDS_mat_x_CDS_mat%ld:+CDS_mat_x_CDS_mat%ud))
! perform the product
:
:
END FUNCTION
This means that, if I have to do the product multiple times the allocation is done multiple times inside the function. I think this is not good from a performance point of view.
I ask for suggestions on how to accomplish the task of banded sparse matrix times banded sparse matrix. I would like to use the type the I defined because I need it to be as general, in terms of bounds, as it is at the moment. But I could change the procedure to perform the product (from FUNCTION
to SUBROUTINE
, if necessary).
Ideas
I could rewrite the procedure as a SUBROUTINE
in order to declare CDS_mat_x_CDS_mat
with INTENT(INOUT)
do the assignment of components other than matrix
, as well as the allocation, outside the SUBROUTINE
. The drawback would be that I could not overload the *
operator.
I noticed the intrinsic function MATMUL
can operate on any rank-2 operands, whichever the upper and lower bounds along the two dimensions. This means that the allocation is performed inside the function. I suppose it's efficient (since it is an intrinsic). The difference with respect to my function is that it accepts rank-2 arrays of any shape, wheres mine accept derived data types objects with a rank-2 array component of any shape.
The intrinsic function MATMUL has the equivalent of an automatic (F2008 5.2.2) result - the shape of the result is expressed in such a way that it becomes a characteristic of the function (F2008 12.3.3) - the shape of the function result is determined in the specification part of the function and (in terms of implementation) the compiler therefore knows how to calculate the shape of the function result ahead of actually executing the function proper.
As a consequence there are no Fortran language ALLOCATABLE variables associated with the equivalent of the intrinsic MATMUL function result. This is not the same thing as there being "no memory allocation" - the compiler may still need to allocate memory behind the scenes for its own purposes - for things like expressions temporaries and the like.
(I say "equivalent of" above, because intrinsic procedures are inherently special - but pretend for a moment that MATMUL was just a user function.)
The equivalent of that sort of automatic result for your case can be achieved by using length type parameters. This is Fortran 2003 feature - the same base language standard that introduced allocatable components - but it is not one that has yet been implemented by all actively maintained compilers.
MODULE xyz
IMPLICIT NONE
TYPE CDS(lb1, ub1, ld, ud)
INTEGER, LEN :: lb1, ub1, ld, ud
REAL :: matrix(lb1:ub1, ld:ud)
INTEGER :: lb2, ub2
END TYPE CDS
INTERFACE OPERATOR(*)
MODULE PROCEDURE CDS_mat_x_CDS_mat
END INTERFACE OPERATOR(*)
CONTAINS
FUNCTION CDS_mat_x_CDS_mat(A, B) RESULT(C)
TYPE(CDS(*,*,*,*)), INTENT(IN) :: A, B
TYPE(CDS(A%lb1, A%ub1, A%ld+B%ld, A%ud+B%ud)) :: C
C%lb2 = B%lb2
C%ub2 = B%ub2
! perform the product.
! :
END FUNCTION CDS_mat_x_CDS_mat
END MODULE xyz
Theoretically this gives the compiler more opportunity for optimisation, because it has more insight ahead of the call to the function for the storage required for the function result. Whether this actually results in better real world performance depends on the implementation of the compiler and the nature of the function references.
这篇关于Fortran函数用于在派生类型与可分配组件之间过载乘法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!