我正在研究的算法需要在几个地方计算一种矩阵三乘积。

该操作采用三个具有相同尺寸的正方形矩阵,并生成一个3索引张量。标记操作数ABC,结果的第(i,j,k)个元素是

X[i,j,k] = \sum_a A[i,a] B[a,j] C[k,a]

在numpy中,您可以使用einsum('ia,aj,ka->ijk', A, B, C)进行计算。

问题:
  • 此操作是否有标准名称?
  • 我可以通过一个BLAS调用来计算吗?
  • 还有其他可以进行这种类型的表达式计算的,经过大量优化的数字C/Fortran库吗?
  • 最佳答案

    简介和解决方案代码
    np.einsum 确实很难被击败,,但是在极少数情况下,如果可以将matrix-multiplication引入计算中,您仍然可以击败它。经过几次试验,看来您可以引入 matrix-multiplication with np.dot 来超越np.einsum('ia,aj,ka->ijk', A, B, C)的性能。
    基本思想是将“所有einsum”操作分解为np.einsumnp.dot的组合,如下所示:

  • A:[i,a]完成B:[a,j]np.einsum的求和,以得到3D array:[i,j,a]
  • 然后将这个3D数组重塑为2D array:[i*j,a],将第三个数组C[k,a]转换为[a,k],目的是在这两个数组之间执行matrix-multiplication,从而使[i*j,k]作为矩阵乘积,因为我们在那里丢失了索引[a]
  • 将产品重塑为3D array:[i,j,k]以得到最终输出。

  • 这是到目前为止讨论的第一个版本的实现-
    import numpy as np
    
    def tensor_prod_v1(A,B,C):   # First version of proposed method
        # Shape parameters
        m,d = A.shape
        n = B.shape[1]
        p = C.shape[0]
    
        # Calculate \sum_a A[i,a] B[a,j] to get a 3D array with indices as (i,j,a)
        AB = np.einsum('ia,aj->ija', A, B)
    
        # Calculate entire summation losing a-ith index & reshaping to desired shape
        return np.dot(AB.reshape(m*n,d),C.T).reshape(m,n,p)
    
    由于我们正在对所有三个输入数组的a-th索引求和,因此可以使用三种不同的方法沿第a个索引求和。前面列出的代码是(A,B)。因此,我们还可以使用(A,C)(B,C)为我们提供另外两个变体,如下所示:
    def tensor_prod_v2(A,B,C):
        # Shape parameters
        m,d = A.shape
        n = B.shape[1]
        p = C.shape[0]
    
        # Calculate \sum_a A[i,a] C[k,a] to get a 3D array with indices as (i,k,a)
        AC = np.einsum('ia,ja->ija', A, C)
    
        # Calculate entire summation losing a-ith index & reshaping to desired shape
        return np.dot(AC.reshape(m*p,d),B).reshape(m,p,n).transpose(0,2,1)
    
    def tensor_prod_v3(A,B,C):
        # Shape parameters
        m,d = A.shape
        n = B.shape[1]
        p = C.shape[0]
    
        # Calculate \sum_a B[a,j] C[k,a] to get a 3D array with indices as (a,j,k)
        BC = np.einsum('ai,ja->aij', B, C)
    
        # Calculate entire summation losing a-ith index & reshaping to desired shape
        return np.dot(A,BC.reshape(d,n*p)).reshape(m,n,p)
    
    根据输入数组的形状,不同的方法彼此之间会产生不同的加速,但是我们希望所有方法都比all-einsum方法更好。性能编号在下一部分中列出。
    运行时测试
    这可能是最重要的部分,因为我们尝试使用建议的方法的三种变体来研究加速倍数。
    问题中最初提出的all-einsum方法。
    数据集#1(相等形状的数组):
    In [494]: L1 = 200
         ...: L2 = 200
         ...: L3 = 200
         ...: al = 200
         ...:
         ...: A = np.random.rand(L1,al)
         ...: B = np.random.rand(al,L2)
         ...: C = np.random.rand(L3,al)
         ...:
    
    In [495]: %timeit tensor_prod_v1(A,B,C)
         ...: %timeit tensor_prod_v2(A,B,C)
         ...: %timeit tensor_prod_v3(A,B,C)
         ...: %timeit np.einsum('ia,aj,ka->ijk', A, B, C)
         ...:
    1 loops, best of 3: 470 ms per loop
    1 loops, best of 3: 391 ms per loop
    1 loops, best of 3: 446 ms per loop
    1 loops, best of 3: 3.59 s per loop
    
    数据集2(更大的A):
    In [497]: L1 = 1000
         ...: L2 = 100
         ...: L3 = 100
         ...: al = 100
         ...:
         ...: A = np.random.rand(L1,al)
         ...: B = np.random.rand(al,L2)
         ...: C = np.random.rand(L3,al)
         ...:
    
    In [498]: %timeit tensor_prod_v1(A,B,C)
         ...: %timeit tensor_prod_v2(A,B,C)
         ...: %timeit tensor_prod_v3(A,B,C)
         ...: %timeit np.einsum('ia,aj,ka->ijk', A, B, C)
         ...:
    1 loops, best of 3: 442 ms per loop
    1 loops, best of 3: 355 ms per loop
    1 loops, best of 3: 303 ms per loop
    1 loops, best of 3: 2.42 s per loop
    
    数据集3(更大的B):
    In [500]: L1 = 100
         ...: L2 = 1000
         ...: L3 = 100
         ...: al = 100
         ...:
         ...: A = np.random.rand(L1,al)
         ...: B = np.random.rand(al,L2)
         ...: C = np.random.rand(L3,al)
         ...:
    
    In [501]: %timeit tensor_prod_v1(A,B,C)
         ...: %timeit tensor_prod_v2(A,B,C)
         ...: %timeit tensor_prod_v3(A,B,C)
         ...: %timeit np.einsum('ia,aj,ka->ijk', A, B, C)
         ...:
    1 loops, best of 3: 474 ms per loop
    1 loops, best of 3: 247 ms per loop
    1 loops, best of 3: 439 ms per loop
    1 loops, best of 3: 2.26 s per loop
    
    数据集4(C较大):
    In [503]: L1 = 100
         ...: L2 = 100
         ...: L3 = 1000
         ...: al = 100
         ...:
         ...: A = np.random.rand(L1,al)
         ...: B = np.random.rand(al,L2)
         ...: C = np.random.rand(L3,al)
    
    In [504]: %timeit tensor_prod_v1(A,B,C)
         ...: %timeit tensor_prod_v2(A,B,C)
         ...: %timeit tensor_prod_v3(A,B,C)
         ...: %timeit np.einsum('ia,aj,ka->ijk', A, B, C)
         ...:
    1 loops, best of 3: 250 ms per loop
    1 loops, best of 3: 358 ms per loop
    1 loops, best of 3: 362 ms per loop
    1 loops, best of 3: 2.46 s per loop
    
    数据集#5(较大的ath尺寸长度):
    In [506]: L1 = 100
         ...: L2 = 100
         ...: L3 = 100
         ...: al = 1000
         ...:
         ...: A = np.random.rand(L1,al)
         ...: B = np.random.rand(al,L2)
         ...: C = np.random.rand(L3,al)
         ...:
    
    In [507]: %timeit tensor_prod_v1(A,B,C)
         ...: %timeit tensor_prod_v2(A,B,C)
         ...: %timeit tensor_prod_v3(A,B,C)
         ...: %timeit np.einsum('ia,aj,ka->ijk', A, B, C)
         ...:
    1 loops, best of 3: 373 ms per loop
    1 loops, best of 3: 269 ms per loop
    1 loops, best of 3: 299 ms per loop
    1 loops, best of 3: 2.38 s per loop
    
    结论:我们看到了 8x-10x 的加速发展,与问题中列出的all-einsum方法相比,提议的方法有所不同。

    关于matlab - 矩阵/张量三乘积?,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/30206293/

    10-16 17:29