Problem Description
I'm trying to achieve the same behaviour as np.matmul's parallel (batched) matrix multiplication using just tensordot, dot, reshaping, etc.
The library I am translating this to does not have a matmul that supports parallel multiplication, only dot and tensordot.
Additionally, I want to avoid iterating over the first dimension, and want to do this using a set of matrix multiplications and reshaping (I want as much of it as possible to run via BLAS/GPU, since I have large numbers of small matrices to compute in parallel).
Here is an example:
import numpy as np
angles = np.array([np.pi/4, 2*np.pi/4, 2*np.pi/4])
vectors = np.array([ [1,0],[1,-1],[-1,0]])
s = np.sin(angles)
c = np.cos(angles)
rotations = np.array([[c,s],[-s,c]]).T  # shape (3, 2, 2): one 2x2 rotation matrix per angle
print(rotations)
print(vectors)
print("Correct: %s" % np.matmul(rotations, vectors.reshape(3,2,1)))
# I want to do this using tensordot/reshaping, i.e just gemm BLAS operations underneath
print("Wrong: %s" % np.tensordot(rotations, vectors, axes=(1,1)))
This outputs:
Correct: [[[ 7.07106781e-01]
[ 7.07106781e-01]]
[[ 1.00000000e+00]
[ 1.00000000e+00]]
[[ -6.12323400e-17]
[ -1.00000000e+00]]]
Wrong: [[[ 7.07106781e-01 1.11022302e-16 -7.07106781e-01]
[ -7.07106781e-01 -1.41421356e+00 7.07106781e-01]]
[[ 6.12323400e-17 -1.00000000e+00 -6.12323400e-17]
[ -1.00000000e+00 -1.00000000e+00 1.00000000e+00]]
[[ 6.12323400e-17 -1.00000000e+00 -6.12323400e-17]
[ -1.00000000e+00 -1.00000000e+00 1.00000000e+00]]]
Is there a way I can modify the second expression so that it gives the same result as the first, using just dot/tensordot?
I believe it is possible, and I have seen some comments online, but never any examples.
Recommended Answer
We need to keep one axis aligned and keep that alignment in the output as well, so tensordot/dot won't work here. More info on tensordot might explain why it won't. But we can use np.einsum, which in most cases (in my experience) is marginally faster than np.matmul.
The implementation would look like this -
np.einsum('ijk,ik->ij', rotations, vectors)
Also, it seems the desired output has one trailing singleton dimension. So, append a new axis there with None/np.newaxis, like so -
np.einsum('ijk,ik->ij', rotations, vectors)[..., None]
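As a quick sanity check (a minimal sketch that simply reuses the angles, rotations and vectors arrays defined in the question), the einsum result with the appended axis can be compared against the np.matmul reference:
# Minimal check: the einsum version should match the np.matmul reference.
# Reuses `rotations` (3, 2, 2) and `vectors` (3, 2) from the question above.
reference = np.matmul(rotations, vectors.reshape(3, 2, 1))            # shape (3, 2, 1)
via_einsum = np.einsum('ijk,ik->ij', rotations, vectors)[..., None]   # shape (3, 2, 1)
print(np.allclose(reference, via_einsum))  # True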