I have a tensor U composed of n matrices of dimension (d,k) and a matrix V of dimension (k,n).
I would like to multiply them so that the result returns a matrix of dimension (d,n) in which column j is the result of the matrix multiplication between the matrix j of U and the column j of V.
One possible way to obtain this is:
res = np.zeros((d, n))
for j in range(n):
    res[:, j] = U[:, :, j].dot(V[:, j])
I am wondering if there is a faster approach using the numpy library. In particular I'm thinking of the np.tensordot() function.
This small snippet allows me to multiply a single matrix by a scalar, but the obvious generalization to a vector is not returning what I was hoping for.
a = np.array(range(1, 17))
a.shape = (4,4)
b = np.array((1,2,3,4,5,6,7))
r1 = np.tensordot(b,a, axes=0)
Any suggestion?
There are a couple of ways you could do this. The first thing that comes to mind is np.einsum:
import numpy as np

# some fake data
gen = np.random.RandomState(0)
ni, nj, nk = 10, 20, 100
U = gen.randn(ni, nj, nk)
V = gen.randn(nj, nk)

res1 = np.zeros((ni, nk))
for k in range(nk):
    res1[:, k] = U[:, :, k].dot(V[:, k])

res2 = np.einsum('ijk,jk->ik', U, V)
print(np.allclose(res1, res2))
# True
np.einsum uses Einstein notation to express tensor contractions. In the expression 'ijk,jk->ik' above, i, j and k are subscripts that correspond to the different dimensions of U and V. Each comma-separated grouping corresponds to one of the operands passed to np.einsum (in this case U has dimensions ijk and V has dimensions jk). The '->ik' part specifies the dimensions of the output array. Any dimensions with subscripts that aren't present in the output string are summed over.
np.einsum is incredibly useful for performing complex tensor contractions, but it can take a while to fully wrap your head around how it works. You should take a look at the examples in the NumPy documentation.
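As a quick illustration of the subscript rules (my own mini-examples, not from the original answer), the same notation reproduces familiar operations:

```python
import numpy as np

A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(3, 4)

# 'ij,jk->ik': j appears in both inputs but not in the output,
# so it is summed over -- this is exactly matrix multiplication.
assert np.allclose(np.einsum('ij,jk->ik', A, B), A.dot(B))

# 'ij->i': j is dropped from the output, so each row is summed.
assert np.allclose(np.einsum('ij->i', A), A.sum(axis=1))
```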
Some other options:
- Element-wise multiplication with broadcasting, followed by summation:
  res3 = (U * V[None, ...]).sum(1)
- inner1d, with a load of transposing (note that numpy.core.umath_tests is a private module, and newer NumPy releases no longer ship it):
  from numpy.core.umath_tests import inner1d
  res4 = inner1d(U.transpose(0, 2, 1), V.T)
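Since numpy.core.umath_tests is absent from newer NumPy releases, a sketch of an equivalent using the stacked-matrix semantics of the @ operator (np.matmul) may be useful; the variable name res5 is my own, not from the original answer:

```python
import numpy as np

gen = np.random.RandomState(0)
ni, nj, nk = 10, 20, 100
U = gen.randn(ni, nj, nk)
V = gen.randn(nj, nk)

# Move the "batch" axis k to the front, then do nk independent
# matrix-vector products: (nk, ni, nj) @ (nk, nj, 1) -> (nk, ni, 1).
res5 = (U.transpose(2, 0, 1) @ V.T[:, :, None])[:, :, 0].T

assert res5.shape == (ni, nk)
assert np.allclose(res5, np.einsum('ijk,jk->ik', U, V))
```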
Some benchmarks:
In [1]: ni, nj, nk = 100, 200, 1000
In [2]: %%timeit U = gen.randn(ni, nj, nk); V = gen.randn(nj, nk)
....: np.einsum('ijk,jk->ik', U, V)
....:
10 loops, best of 3: 23.4 ms per loop
In [3]: %%timeit U = gen.randn(ni, nj, nk); V = gen.randn(nj, nk)
....: (U * V[None, ...]).sum(1)
....:
10 loops, best of 3: 59.7 ms per loop
In [4]: %%timeit U = gen.randn(ni, nj, nk); V = gen.randn(nj, nk)
....: inner1d(U.transpose(0, 2, 1), V.T)
....:
10 loops, best of 3: 45.9 ms per loop
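The timings above come from the answerer's machine and an old IPython. A minimal sketch for reproducing the comparison on current NumPy, using smaller arrays than the original benchmark so it runs quickly, and omitting inner1d (no longer shipped), might look like:

```python
import timeit
import numpy as np

gen = np.random.RandomState(0)
ni, nj, nk = 50, 60, 200   # smaller than the original benchmark
U = gen.randn(ni, nj, nk)
V = gen.randn(nj, nk)

candidates = {
    'einsum':        lambda: np.einsum('ijk,jk->ik', U, V),
    'broadcast+sum': lambda: (U * V[None, ...]).sum(1),
}

# All candidates must agree before timing them.
reference = candidates['einsum']()
for name, fn in candidates.items():
    assert np.allclose(fn(), reference)
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print('%-15s %.2f ms per call' % (name, best * 1e3))
```

Absolute numbers will of course differ by machine and NumPy version; only the relative ordering is of interest.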