Problem description
I have a sum of sums that I want to speed up. In one case it is:
S_{x,y,k,l} Fu_{ku} Fv_{lv} Fx_{kx} Fy_{ly}
In the other case it is:
S_{x,y} ( S_{k,l} Fu_{ku} Fv_{lv} Fx_{kx} Fy_{ly} )^2
Note: S_{indices} denotes the sum over those indices.
I have figured out how to do the first case using numpy's einsum, and it results in an amazing speedup of about 160x.
I have also thought of expanding the square, but won't that be a killer, since I would then need to sum over x, y, k, l and a second, independent copy of k, l instead of just x, y, k, l?
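To make the cost of expanding the square concrete, here is the expansion written out. The index names m and n are assumptions introduced here for the second, independent copy of k and l; the rest follows the notation above.

```latex
\[
\sum_{x,y}\Bigl(\sum_{k,l} \mathrm{Fu}_{ku}\,\mathrm{Fv}_{lv}\,\mathrm{Fx}_{kx}\,\mathrm{Fy}_{ly}\Bigr)^{2}
= \sum_{x,y}\;\sum_{k,l}\;\sum_{m,n}
  \mathrm{Fu}_{ku}\,\mathrm{Fv}_{lv}\,\mathrm{Fx}_{kx}\,\mathrm{Fy}_{ly}\;
  \mathrm{Fu}_{mu}\,\mathrm{Fv}_{nv}\,\mathrm{Fx}_{mx}\,\mathrm{Fy}_{ny}
\]
```

Evaluated term by term, the right-hand side visits Nx·Ny·Nk²·Nl² index combinations for each (u, v), compared with Nx·Ny·Nk·Nl when the inner sum is computed first and then squared.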
Here is an implementation that demonstrates the difference, together with the solution I have using einsum.
import time
import numpy as np

Nx = 3
Ny = 4
Nk = 5
Nl = 6
Nu = 7
Nv = 8
Fx = np.random.rand(Nx, Nk)
Fy = np.random.rand(Ny, Nl)
Fu = np.random.rand(Nu, Nk)
Fv = np.random.rand(Nv, Nl)
P = np.random.rand(Nx, Ny)
B = np.random.rand(Nk, Nl)
I1 = np.zeros([Nu, Nv])
I2 = np.zeros([Nu, Nv])

# Reference implementation with explicit loops
t = time.time()
for iu in range(Nu):
    for iv in range(Nv):
        for ix in range(Nx):
            for iy in range(Ny):
                # Inner sum over k, l for fixed u, v, x, y
                S = 0.
                for ik in range(Nk):
                    for il in range(Nl):
                        S += Fu[iu,ik]*Fv[iv,il]*Fx[ix,ik]*Fy[iy,il]*P[ix,iy]*B[ik,il]
                I1[iu, iv] += S
                I2[iu, iv] += S**2.
print(time.time() - t); t = time.time()
# 0.0787379741669

# I1 with a single einsum call
I1_ = np.einsum('uk, vl, xk, yl, xy, kl->uv', Fu, Fv, Fx, Fy, P, B)
print(time.time() - t)
# 0.00049090385437
print(np.allclose(I1_, I1))
# True

# Solution by expanding the square (not ideal)
t = time.time()
I2_ = np.einsum('uk,vl,xk,yl,um,vn,xm,yn,kl,mn,xy->uv', Fu,Fv,Fx,Fy,Fu,Fv,Fx,Fy,B,B,P**2)
print(time.time() - t)
# 0.0226809978485 <- faster than for loop but still much slower than I1_ einsum
print(np.allclose(I2_, I2))
# True
As shown, I have figured out how to compute I1 with einsum (I1_ above).
I have also added how to compute I2_ by expanding the square, but the speedup is a bit disappointing, as was to be expected: about 3.47x compared to about 160x.
The speedups are not consistent from run to run: I previously got 40x and 1.2x, and now I get different numbers. Either way, the difference and the question remain.
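One possible reason for the fluctuating numbers is that a single `time.time()` difference around a sub-millisecond call is dominated by noise. The following is a minimal sketch, assuming the same array shapes as above, that uses the standard-library `timeit` module and takes the best of several repeats. The helper name `best_time` and the repeat counts are illustrative choices, not part of the original post.

```python
import timeit
import numpy as np

# Same shapes as in the question; a fixed seed keeps runs comparable.
rng = np.random.default_rng(0)
Nx, Ny, Nk, Nl, Nu, Nv = 3, 4, 5, 6, 7, 8
Fx = rng.random((Nx, Nk))
Fy = rng.random((Ny, Nl))
Fu = rng.random((Nu, Nk))
Fv = rng.random((Nv, Nl))
P = rng.random((Nx, Ny))
B = rng.random((Nk, Nl))


def einsum_i1():
    # The I1_ computation from the question.
    return np.einsum('uk,vl,xk,yl,xy,kl->uv', Fu, Fv, Fx, Fy, P, B)


def best_time(func, repeat=5, number=50):
    # Hypothetical helper: run `func` `number` times per repeat and
    # report the fastest per-call time, which is least affected by noise.
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


t_i1 = best_time(einsum_i1)
print("I1 einsum: %.3g s per call" % t_i1)
```

Timing the loop version and the expanded-square version with the same helper should give ratios that are more stable than one-off measurements, although the absolute numbers will still depend on the machine.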
I tried to simplify the sum I am actually after but messed up, and the simplified sum above allows for the excellent answer provided by @user5402.
I have edited the code above to demonstrate the actual sum, which is:
I1 = S_{x,y,k,l} Fu_{ku} Fv_{lv} Fx_{kx} Fy_{ly} P_{xy} B_{kl}
I2 = S_{x,y} ( S_{k,l} Fu_{ku} Fv_{lv} Fx_{kx} Fy_{ly} P_{xy} B_{kl} )^2
Recommended answer
I'll start a new answer since the problem has changed.
Try this:
E = np.einsum('uk, vl, xk, yl, xy, kl->uvxy', Fu, Fv, Fx, Fy, P, B)
E1 = np.einsum('uvxy->uv', E)
E2 = np.einsum('uvxy->uv', np.square(E))
I've found that it runs just as fast as I1_.
Here is my test code: http://pastebin.com/ufwy7cLy
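As a self-contained check of the approach above, here is a minimal sketch (it is not the test code from the link). It keeps x and y as free indices in the first einsum, then reduces over them once directly and once after squaring, and compares both results against explicit loops. The function names `sums_via_einsum` and `sums_via_loops` are hypothetical, and the shapes are assumed to match the question.

```python
import numpy as np


def sums_via_einsum(Fu, Fv, Fx, Fy, P, B):
    # Contract only k and l; keep u, v, x, y as free indices.
    E = np.einsum('uk,vl,xk,yl,xy,kl->uvxy', Fu, Fv, Fx, Fy, P, B)
    I1 = E.sum(axis=(2, 3))             # sum over x, y
    I2 = np.square(E).sum(axis=(2, 3))  # sum of squares over x, y
    return I1, I2


def sums_via_loops(Fu, Fv, Fx, Fy, P, B):
    # Reference implementation mirroring the loops in the question.
    Nu, Nk = Fu.shape
    Nv, Nl = Fv.shape
    Nx, Ny = P.shape
    I1 = np.zeros((Nu, Nv))
    I2 = np.zeros((Nu, Nv))
    for iu in range(Nu):
        for iv in range(Nv):
            for ix in range(Nx):
                for iy in range(Ny):
                    S = 0.0
                    for ik in range(Nk):
                        for il in range(Nl):
                            S += (Fu[iu, ik] * Fv[iv, il] * Fx[ix, ik]
                                  * Fy[iy, il] * P[ix, iy] * B[ik, il])
                    I1[iu, iv] += S
                    I2[iu, iv] += S ** 2
    return I1, I2


rng = np.random.default_rng(0)
Nx, Ny, Nk, Nl, Nu, Nv = 3, 4, 5, 6, 7, 8
Fx = rng.random((Nx, Nk))
Fy = rng.random((Ny, Nl))
Fu = rng.random((Nu, Nk))
Fv = rng.random((Nv, Nl))
P = rng.random((Nx, Ny))
B = rng.random((Nk, Nl))

I1_e, I2_e = sums_via_einsum(Fu, Fv, Fx, Fy, P, B)
I1_l, I2_l = sums_via_loops(Fu, Fv, Fx, Fy, P, B)
print(np.allclose(I1_e, I1_l), np.allclose(I2_e, I2_l))
```

One trade-off worth noting: the intermediate array E has shape (Nu, Nv, Nx, Ny), so its memory use grows with the product of all four sizes. For the small sizes in the question this is negligible, but for large grids it may become the limiting factor.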