Problem description

If I have a tensor contraction A[a,b] * B[b,c,d] = C[a,c,d] with the properties B[b,c,d] = B[b,d,c] and C[a,c,d] = C[a,d,c], how do I set up BLAS to exploit this symmetry?

Einstein summation notation is assumed here, i.e., repeated indices are summed over.
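
Written out with the sum made explicit, the contraction reads as follows. The second identity shows that the symmetry of C follows directly from the symmetry of B, so only the entries with c ≤ d carry independent information:

```latex
C_{acd} = \sum_{b} A_{ab}\, B_{bcd},
\qquad
C_{adc} = \sum_{b} A_{ab}\, B_{bdc} = \sum_{b} A_{ab}\, B_{bcd} = C_{acd}.
```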

sgemm (http://www.netlib.org/lapack/explore-html/db/dc9/group__single__blas__level3_gafe51bacb54592ff5de056acabd83c260.html#gafe51bacb54592ff5de056acabd83c260) seems to be about the symmetry of a matrix rather than of a rank-3 tensor.

I could try to flatten/reshape tensor B into a lower-rank array, but flattening/reshaping a tensor also seems to take time, at least in Fortran. How can I speed up the reshape in a higher-rank tensor contraction done with BLAS in Fortran?
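
On the reshape question: Fortran stores a contiguous array B(nb,nc,nd) in column-major order, so its memory is already laid out as an nb x (nc*nd) matrix. Passing such an array to a procedure whose dummy argument is an explicit-shape rank-2 array uses sequence association, which normally involves no reshape and no copy (a copy can still be made if the actual argument is a non-contiguous section). The sketch below computes the whole contraction as one matrix-matrix product and ignores the symmetry. The module and subroutine names are hypothetical, and matmul stands in for sgemm, whose equivalent call is given in a comment.

```fortran
module contract_flat
  implicit none
contains
  ! Hypothetical helper: C(a,c,d) = sum_b A(a,b) * B(b,c,d) as ONE matrix product.
  ! B2 and C2 are explicit-shape rank-2 dummies, so a contiguous rank-3 actual
  ! argument is passed by sequence association: same memory, no reshape.
  subroutine contract_all(na, nb, ncd, A, B2, C2)
    integer, intent(in) :: na, nb, ncd   ! ncd = size(B,2) * size(B,3)
    real, intent(in)  :: A(na, nb)
    real, intent(in)  :: B2(nb, ncd)     ! B viewed as an nb x (nc*nd) matrix
    real, intent(out) :: C2(na, ncd)     ! C viewed as an na x (nc*nd) matrix

    ! With a BLAS library linked, this line could instead be:
    ! call sgemm('N', 'N', na, ncd, nb, 1.0, A, na, B2, nb, 0.0, C2, na)
    C2 = matmul(A, B2)
  end subroutine contract_all
end module contract_flat
```

For example, with real A(na,nb), B(nb,n,n) and C(na,n,n), the call is `call contract_all(na, nb, n*n, A, B, C)`.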

Recommended answer

The contraction C_{acd} = A_{ab} B_{bcd} can be written as a double loop of matrix * vector products (using matmul for clarity; replace it with a BLAS call such as sgemv as desired):

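! Fortran names are case-insensitive, so the loop indices are called ic and id
! to avoid clashing with the array C.
n = size(B,3) ! equals size(B,2), since B is symmetric in its last two indices
do id=1,n
  do ic=1,n
    C(:,ic,id) = matmul(A(:,:), B(:,ic,id))
  enddo
enddo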

Since C[a,d,c] = C[a,c,d] (a consequence of B[b,d,c] = B[b,c,d]), the square n x n loop of matmul calls can be replaced by a triangular loop of matmul calls plus a triangular loop that only copies the mirrored columns:

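n = size(B,3) ! equals size(B,2)
do id=1,n
  ! Multiply only the columns with ic <= id
  do ic=1,id
    C(:,ic,id) = matmul(A(:,:), B(:,ic,id))
  enddo

  ! Mirror the columns just computed: C(:,id,ic) = C(:,ic,id) for ic < id.
  ! (Copying C(:,ic,id) = C(:,id,ic) for ic > id at this point would read
  ! columns that are only computed in later iterations of the outer loop.)
  do ic=1,id-1
    C(:,id,ic) = C(:,ic,id)
  enddo
enddo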

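This exploits the symmetry to roughly halve the arithmetic (n(n+1)/2 matrix * vector products instead of n^2), which improves performance. On the other hand, doing many matrix * vector products (Level 2 BLAS, e.g. sgemv) instead of one big matrix * matrix product (Level 3 BLAS, sgemm) tends to worsen performance, because Level 3 routines reuse cached data much more effectively. Will this approach improve or reduce performance overall? That depends on the array sizes and the BLAS library, so the best way to find out is probably to try it and measure.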
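
One possible middle ground for the trade-off above, which is not part of the original answer: for a fixed d, the columns B(:,1:d,d) form a contiguous nb x d block in Fortran's column-major layout. The whole triangle for that d can therefore be computed with one matrix * matrix product, which keeps the n(n+1)/2 column saving while still using Level 3 operations (n products of growing width instead of n(n+1)/2 matrix * vector products). A minimal sketch, assuming B is symmetric in its last two indices; the module and subroutine names are hypothetical and matmul stands in for sgemm:

```fortran
module contract_sym
  implicit none
contains
  ! Hypothetical helper: C(:,ic,id) = A * B(:,ic,id), assuming B(:,ic,id) = B(:,id,ic).
  ! Only the blocks with ic <= id are multiplied; the rest is filled by copying.
  subroutine contract_sym_blocked(A, B, C)
    real, intent(in)  :: A(:,:)     ! na x nb
    real, intent(in)  :: B(:,:,:)   ! nb x n x n, symmetric in the last two indices
    real, intent(out) :: C(:,:,:)   ! na x n x n
    integer :: n, ic, id

    n = size(B, 3)
    do id = 1, n
      ! Columns 1..id of slice id are adjacent in memory, so this triangle is a
      ! single matrix product (in BLAS terms, sgemm with M = na, N = id, K = nb).
      C(:, 1:id, id) = matmul(A, B(:, 1:id, id))

      ! Copy the columns just computed into their mirror positions (ic < id).
      do ic = 1, id - 1
        C(:, id, ic) = C(:, ic, id)
      end do
    end do
  end subroutine contract_sym_blocked
end module contract_sym
```

Whether this beats both the single full matrix * matrix product and the matrix * vector loop again depends on n and on the BLAS library, so it is worth timing all three.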

09-14 05:47