Problem description
I'm trying to use scipy's ndimage.convolve function to perform a convolution on a 3-dimensional image (RGB, width, height).
Taking a look here:
It is clear to see that for any input, each kernel/filter should only ever have an output of NxN, with strictly a depth of 1.
This is a problem with scipy: when you call ndimage.convolve with an input of size (3, 5, 5) and a filter/kernel of size (3, 3, 3), the operation produces an output of size (3, 5, 5), clearly not summing the different channels.
Is there a way to force this summation without doing it manually? I try to do as little in base Python as possible, since many external libraries are written in C++ and perform the same operations faster. Or is there an alternative?
No, scipy doesn't skip the summation of channels. The reason you get a (3, 5, 5) output is that ndimage.convolve pads the input array along all the axes and then performs the convolution in "same" mode (i.e. the output has the same shape as the input, centered with respect to the output of the "full" mode). See the scipy.signal.convolve documentation for more detail on modes.
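As a rough illustration of the modes mentioned above, the following sketch compares output shapes for a (3, 5, 5) input and a (3, 3, 3) kernel. The random arrays and the names full, same, valid and nd are placeholders chosen for this example, not part of the original question.

```python
import numpy as np
import scipy.ndimage
import scipy.signal

# Placeholder data with the same shapes as in the question (assumption).
rng = np.random.default_rng(0)
a = rng.integers(0, 3, size=(3, 5, 5))
w = rng.integers(-1, 2, size=(3, 3, 3))

full = scipy.signal.convolve(a, w, mode="full")    # shape (5, 7, 7)
same = scipy.signal.convolve(a, w, mode="same")    # shape (3, 5, 5)
valid = scipy.signal.convolve(a, w, mode="valid")  # shape (1, 3, 3)

# ndimage.convolve always returns an array shaped like its input.
nd = scipy.ndimage.convolve(a, w, mode="constant", cval=0)  # shape (3, 5, 5)

print(full.shape, same.shape, valid.shape, nd.shape)
```

Only the "valid" mode collapses the channel axis to length 1, because it keeps just the positions where the kernel fits entirely inside the input.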
For your input of shape (3, 5, 5) and filter w0 of shape (3, 3, 3), the input is padded, resulting in a (7, 9, 9) array. See below (for simplicity I use constant padding with 0's):
import numpy as np

# Input image: 3 channels of 5x5
a = np.array([[[2, 0, 2, 2, 2],
               [1, 1, 0, 2, 0],
               [0, 0, 1, 2, 2],
               [2, 2, 2, 0, 0],
               [1, 0, 1, 2, 0]],
              [[1, 2, 1, 0, 1],
               [0, 2, 0, 0, 1],
               [0, 0, 2, 2, 1],
               [2, 0, 1, 0, 2],
               [0, 1, 2, 2, 2]],
              [[0, 0, 2, 2, 2],
               [0, 1, 2, 1, 0],
               [0, 0, 0, 2, 0],
               [0, 2, 0, 0, 2],
               [0, 0, 2, 2, 1]]])
# Filter: 3 channels of 3x3
w0 = np.array([[[0, 1, -1],
                [1, -1, 0],
                [0, 0, 0]],
               [[1, 0, 0],
                [0, -1, 1],
                [1, 0, 1]],
               [[1, -1, 0],
                [-1, 0, -1],
                [-1, 0, 1]]])
k = w0.shape[0]
# Zero-pad every axis by k-1 on both sides
a_p = np.pad(a, k-1)
array([[[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0]],
[[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0]],
[[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 2, 0, 2, 2, 2, 0, 0],
[0, 0, 1, 1, 0, 2, 0, 0, 0],
[0, 0, 0, 0, 1, 2, 2, 0, 0],
[0, 0, 2, 2, 2, 0, 0, 0, 0],
[0, 0, 1, 0, 1, 2, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0]],
[[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 2, 1, 0, 1, 0, 0],
[0, 0, 0, 2, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 2, 2, 1, 0, 0],
[0, 0, 2, 0, 1, 0, 2, 0, 0],
[0, 0, 0, 1, 2, 2, 2, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0]],
[[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 2, 2, 2, 0, 0],
[0, 0, 0, 1, 2, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 2, 0, 0, 0],
[0, 0, 0, 2, 0, 0, 2, 0, 0],
[0, 0, 0, 0, 2, 2, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0]],
[[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0]],
[[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0]]])
Before proceeding, note that in the image from cs231n what is performed is correlation and not convolution, so we need to either flip w0 or use a correlation function instead (I will do the former).
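The equivalence between the two options can be checked with a small sketch. It assumes an odd-sized kernel (as here) and uses random placeholder arrays; scipy.ndimage.correlate is the correlation counterpart of ndimage.convolve.

```python
import numpy as np
import scipy.ndimage

# Placeholder data with the same shapes as in the question (assumption).
rng = np.random.default_rng(0)
a = rng.integers(0, 3, size=(3, 5, 5))
w0 = rng.integers(-1, 2, size=(3, 3, 3))

# Option 1: correlate with the kernel as given.
corr = scipy.ndimage.correlate(a, w0, mode="constant", cval=0)

# Option 2: convolve with the kernel flipped along every axis.
conv_flipped = scipy.ndimage.convolve(a, np.flip(w0), mode="constant", cval=0)

# For odd-sized kernels the two results should coincide.
print(np.array_equal(corr, conv_flipped))
```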
Then, the convolution is performed by sliding along the first dimension (axis 0), i.e. the (flipped) w0 is convolved with a_p[0:3], then with a_p[1:4], then with a_p[2:5], then with a_p[3:6] and finally with a_p[4:7], each resulting in a (1, 7, 7) array due to the summation over the channels. These are then stacked together, resulting in a (5, 7, 7) array. To show this I use scipy.signal.convolve, which allows the full mode:
import scipy.signal

out = scipy.signal.convolve(a, np.flip(w0), mode='full')
array([[[ 2, 0, 0, 2, 0, -2, -2],
[-1, 1, -5, -1, -4, -4, -2],
[-1, -3, 2, -3, 1, -4, 0],
[ 2, 1, -1, -3, -7, 0, -2],
[-1, -2, -4, -1, -4, -2, 2],
[-1, -2, -2, -2, 1, -2, 0],
[ 0, -1, 1, -1, -1, 2, 0]],
[[ 3, 2, 4, 0, 4, 2, 1],
[ 2, -1, 1, -1, -1, 0, -2],
[ 1, -3, 3, 5, 2, 1, 3],
[ 4, 2, 1, 4, 0, -3, -2],
[ 1, 1, 1, -1, -1, 3, -1],
[ 1, -4, 3, -1, -3, -4, 0],
[ 0, 0, 0, -1, 1, 2, 2]],
[[ 1, 2, 4, 4, 2, -2, -1],
[ 1, 2, 1, -3, -4, -4, 1],
[-2, 2, -3, 3, 1, 2, 4],
[ 1, 2, 5, -6, 6, -2, 3],
[ 2, -5, 4, 1, 5, 4, 0],
[-2, 0, 0, 1, -3, -4, 3],
[-1, 1, -1, -2, 4, 3, 3]],
[[ 0, 0, 2, 2, 4, 2, 2],
[ 0, 0, 3, 3, 3, -2, 1],
[-1, 0, 0, 4, 0, 4, 3],
[ 0, 0, 2, 3, 1, 3, 3],
[ 0, 0, 0, 1, 7, 1, 3],
[-2, 2, 0, 2, -3, 1, 4],
[ 0, -1, -1, 0, 2, 4, 1]],
[[ 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, -2, 0, 0, 2],
[ 0, 0, -3, -1, 1, 3, 0],
[ 0, -1, -1, 1, -1, 2, 0],
[ 0, 0, -2, 0, 2, -2, 2],
[ 0, -2, 2, -2, -2, 3, 1],
[ 0, 0, -2, 0, 1, 1, 0]]])
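The sliding-window description above can be reproduced as a sketch: each 3-slice window of the zero-padded input is convolved in "valid" mode, and the pieces are stacked along axis 0. The random arrays and the names windows and stacked are illustrative assumptions, not from the original answer.

```python
import numpy as np
import scipy.signal

# Placeholder data with the same shapes as in the answer (assumption).
rng = np.random.default_rng(0)
a = rng.integers(0, 3, size=(3, 5, 5))
w0 = rng.integers(-1, 2, size=(3, 3, 3))
kernel = np.flip(w0)

k = w0.shape[0]
a_p = np.pad(a, k - 1, mode="constant")  # zero padding, shape (7, 9, 9)
full = scipy.signal.convolve(a, kernel, mode="full")  # shape (5, 7, 7)

# Convolve a_p[0:3], a_p[1:4], ..., a_p[4:7]; each piece has shape (1, 7, 7)
# because "valid" mode sums over the whole channel axis.
windows = [
    scipy.signal.convolve(a_p[i:i + k], kernel, mode="valid")
    for i in range(a_p.shape[0] - k + 1)
]
stacked = np.concatenate(windows, axis=0)  # shape (5, 7, 7)

print(windows[0].shape, stacked.shape)
print(np.array_equal(stacked, full))
```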
To get the "same" mode of ndimage.convolve we need to center out:
out = out[1:-1, 1:-1, 1:-1]
array([[[-1, 1, -1, -1, 0],
[-3, 3, 5, 2, 1],
[ 2, 1, 4, 0, -3],
[ 1, 1, -1, -1, 3],
[-4, 3, -1, -3, -4]],
[[ 2, 1, -3, -4, -4],
[ 2, -3, 3, 1, 2],
[ 2, 5, -6, 6, -2],
[-5, 4, 1, 5, 4],
[ 0, 0, 1, -3, -4]],
[[ 0, 3, 3, 3, -2],
[ 0, 0, 4, 0, 4],
[ 0, 2, 3, 1, 3],
[ 0, 0, 1, 7, 1],
[ 2, 0, 2, -3, 1]]])
This is exactly what you get if you run scipy.ndimage.convolve(a, np.flip(w0), mode='constant', cval=0). Finally, to get the desired output we need to ignore the elements that relied on padding along the first dimension (i.e. keep only the middle slice of out), use a stride of s=2 (i.e. out[1][::s, ::s]), and add the bias b = 1:
s, b = 2, 1  # stride and bias
out[1][::s, ::s] + b
array([[ 3, -2, -3],
[ 3, -5, -1],
[ 1, 2, -3]])
Putting everything in one line:
scipy.ndimage.convolve(a, np.flip(w0), mode='constant', cval=0)[1][::2, ::2] + b
# or using scipy.signal.convolve
# scipy.signal.convolve(a, np.flip(w0), 'full')[2][1:-1,1:-1][::2, ::2] + b
# or
# scipy.signal.convolve(a, np.flip(w0), 'same')[1][::2, ::2] + b
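As a possible alternative (a sketch, not taken from the original answer), one can pad only the two spatial axes and use "valid" mode, so the channel axis collapses to depth 1 directly. Here scipy.signal.correlate is used with the unflipped w0; the names a_hw, summed and result are made up for this example.

```python
import numpy as np
import scipy.signal

a = np.array([[[2, 0, 2, 2, 2], [1, 1, 0, 2, 0], [0, 0, 1, 2, 2],
               [2, 2, 2, 0, 0], [1, 0, 1, 2, 0]],
              [[1, 2, 1, 0, 1], [0, 2, 0, 0, 1], [0, 0, 2, 2, 1],
               [2, 0, 1, 0, 2], [0, 1, 2, 2, 2]],
              [[0, 0, 2, 2, 2], [0, 1, 2, 1, 0], [0, 0, 0, 2, 0],
               [0, 2, 0, 0, 2], [0, 0, 2, 2, 1]]])
w0 = np.array([[[0, 1, -1], [1, -1, 0], [0, 0, 0]],
               [[1, 0, 0], [0, -1, 1], [1, 0, 1]],
               [[1, -1, 0], [-1, 0, -1], [-1, 0, 1]]])
s, b = 2, 1  # stride and bias, as in the answer

# Pad height and width by 1 with zeros; leave the channel axis unpadded.
a_hw = np.pad(a, ((0, 0), (1, 1), (1, 1)), mode="constant")

# "valid" correlation sums over all three channels: shape (1, 5, 5).
summed = scipy.signal.correlate(a_hw, w0, mode="valid")[0]

result = summed[::s, ::s] + b
print(result)
# [[ 3 -2 -3]
#  [ 3 -5 -1]
#  [ 1  2 -3]]
```

This avoids computing the two outer slices that depend on padding along the channel axis, at the cost of an explicit np.pad call.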