Problem description
If I have a tensor contraction A[a,b] * B[b,c,d] = C[a,c,d]
with the properties B[b,c,d] = B[b,d,c]
and C[a,c,d] = C[a,d,c],
how can I set up BLAS to exploit this symmetry?
Here the Einstein summation notation is assumed, i.e., repeated indices mean summation.
sgemm
(http://www.netlib.org/lapack/explore-html/db/dc9/group__single__blas__level3_gafe51bacb54592ff5de056acabd83c260.html#gafe51bacb54592ff5de056acabd83c260) seems to be about the symmetry of a matrix rather than of a rank-3 tensor.
I could try to flatten/reshape tensor B
into a lower-dimensional array, but flattening/reshaping a tensor also seems to take time, at least in Fortran. How can the reshape be sped up when doing higher-rank tensor contractions with BLAS in Fortran?
Recommended answer
The contraction C_{acd} = A_{ab} B_{bcd} can be written programmatically as a double loop of matrix * vector
operations (using matmul for clarity; replace with BLAS as desired):
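! ic, id stand for the indices c, d (Fortran is case-insensitive,
! so a loop variable named c would clash with the array C)
n = size(B,3)   ! = size(B,2)
do id = 1, n
  do ic = 1, n
    C(:,ic,id) = matmul(A(:,:), B(:,ic,id))
  enddo
enddo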
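On the reshape concern from the question: the double loop is equivalent to a single matrix * matrix product if B is viewed as an nb x (n*n) matrix. The sketch below assumes B and C are contiguous arrays and uses a hypothetical helper, contract_flat, whose explicit-shape dummy arguments receive the rank-3 arrays as rank-2 ones (sequence association), so no reshape is written in the source. The array sizes and names are illustrative only; matmul is kept for clarity, and the commented sgemm call shows what the BLAS equivalent would look like if BLAS is linked.

```fortran
program flatten_contraction
  implicit none
  integer, parameter :: na = 3, nb = 4, n = 5   ! illustrative sizes
  real :: A(na,nb), B(nb,n,n), C_loop(na,n,n), C_flat(na,n,n)
  integer :: ic, id

  call random_number(A)
  call random_number(B)

  ! Reference: the double loop of matrix * vector products.
  do id = 1, n
    do ic = 1, n
      C_loop(:,ic,id) = matmul(A, B(:,ic,id))
    end do
  end do

  ! One matrix * matrix product: B and C_flat are passed as rank-3 arrays
  ! and received as rank-2 explicit-shape dummies, with (c,d) merged
  ! into a single column index of length n*n.
  call contract_flat(na, nb, n*n, A, B, C_flat)

  if (maxval(abs(C_flat - C_loop)) > 1.0e-4) error stop "mismatch"
  print *, "flattened product matches the double loop"

contains

  ! Hypothetical helper: Z(a,m) = sum_b X(a,b) * Y(b,m)
  subroutine contract_flat(ma, mb, m, X, Y, Z)
    integer, intent(in) :: ma, mb, m
    real, intent(in)    :: X(ma,mb), Y(mb,m)
    real, intent(out)   :: Z(ma,m)
    Z = matmul(X, Y)
    ! With BLAS linked, the equivalent call would be:
    ! call sgemm('N', 'N', ma, m, mb, 1.0, X, ma, Y, mb, 0.0, Z, ma)
  end subroutine contract_flat

end program flatten_contraction
```

This version does not use the symmetry at all; it only shows that merging the last two indices need not involve an explicit reshape when the arrays are contiguous.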
Since C[a,d,c] = C[a,c,d], the square loop of matmul calls can be replaced with a triangular loop of matmul calls and a triangular loop that just copies, as:
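! ic, id stand for the indices c, d (a loop variable c would clash with C)
n = size(B,3)   ! = size(B,2)
do id = 1, n
  ! Compute only the entries with ic <= id
  do ic = 1, id
    C(:,ic,id) = matmul(A(:,:), B(:,ic,id))
  enddo
  ! Fill the mirrored entries from the values just computed
  do ic = 1, id-1
    C(:,id,ic) = C(:,ic,id)
  enddo
enddo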
This exploits the symmetry to reduce the number of BLAS operations, which helps performance, but having to do many matrix * vector
multiplications rather than one big matrix * matrix
multiplication hurts it. Will this approach improve or reduce performance overall? The best way to find out is probably to try it and see.
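One possible middle ground, sketched here as an assumption rather than as part of the original answer, is to gather the n(n+1)/2 unique (c <= d) columns of B into a packed matrix, do one matrix * matrix product, and scatter the result into both C[a,c,d] and C[a,d,c]. The names B_packed, C_packed and npair are hypothetical, the sizes are illustrative, and matmul stands in for sgemm. The gather and scatter are extra copies, so whether this is faster still has to be measured.

```fortran
program packed_contraction
  implicit none
  integer, parameter :: na = 3, nb = 4, n = 5   ! illustrative sizes
  integer, parameter :: npair = n*(n+1)/2       ! number of unique (c,d) pairs
  real :: A(na,nb), B(nb,n,n), C(na,n,n), C_ref(na,n,n)
  real :: B_packed(nb,npair), C_packed(na,npair)
  integer :: ic, id, p

  call random_number(A)
  call random_number(B)

  ! Symmetrize the test data so that B(b,c,d) = B(b,d,c).
  do id = 1, n
    do ic = 1, id-1
      B(:,id,ic) = B(:,ic,id)
    end do
  end do

  ! Gather the unique columns (ic <= id) of B.
  p = 0
  do id = 1, n
    do ic = 1, id
      p = p + 1
      B_packed(:,p) = B(:,ic,id)
    end do
  end do

  ! One matrix * matrix product over the unique pairs only
  ! (matmul for clarity; sgemm could be used instead).
  C_packed = matmul(A, B_packed)

  ! Scatter each result column into both symmetric positions of C.
  p = 0
  do id = 1, n
    do ic = 1, id
      p = p + 1
      C(:,ic,id) = C_packed(:,p)
      C(:,id,ic) = C_packed(:,p)
    end do
  end do

  ! Reference: the full double loop without using the symmetry.
  do id = 1, n
    do ic = 1, n
      C_ref(:,ic,id) = matmul(A, B(:,ic,id))
    end do
  end do

  if (maxval(abs(C - C_ref)) > 1.0e-4) error stop "mismatch"
  print *, "packed product matches the full contraction"
end program packed_contraction
```

Timing this against the triangular loop and against a single unpacked product, for the array sizes that actually matter, is the way to decide between them.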