Problem Description
I'm doing convolution of some tensors.
Here is a small test in MATLAB:
ker= rand(3,4,2);
a= rand(5,7,2);
c=convn(a,ker,'valid');
c11=sum(sum(a(1:3,1:4,1).*ker(:,:,1)))+sum(sum(a(1:3,1:4,2).*ker(:,:,2)));
c(1,1)-c11 % not equal!
The third line performs an N-D convolution with convn, and I want to compare the result at the first row, first column of convn's output against a manually computed value. However, my computation does not match what convn returns.
So what is behind MATLAB's convn? Is my understanding of tensor convolution wrong?
Recommended Answer
You almost have it correct. There are two things slightly wrong with your understanding:
1. You chose valid as the convolution flag. This means the output returned from the convolution is sized so that the kernel, as you sweep it over the matrix, must fit entirely inside the matrix itself. Therefore, the first "valid" output that is returned is actually the computation at location (2,2,1) of your matrix. This is where you can first fit the kernel comfortably, and it corresponds to position (1,1) of the output. To demonstrate, this is what a and ker look like for me using your above code:
>> a
a(:,:,1) =
0.9930 0.2325 0.0059 0.2932 0.1270 0.8717 0.3560
0.2365 0.3006 0.3657 0.6321 0.7772 0.7102 0.9298
0.3743 0.6344 0.5339 0.0262 0.0459 0.9585 0.1488
0.2140 0.2812 0.1620 0.8876 0.7110 0.4298 0.9400
0.1054 0.3623 0.5974 0.0161 0.9710 0.8729 0.8327
a(:,:,2) =
0.8461 0.0077 0.5400 0.2982 0.9483 0.9275 0.8572
0.1239 0.0848 0.5681 0.4186 0.5560 0.1984 0.0266
0.5965 0.2255 0.2255 0.4531 0.5006 0.0521 0.9201
0.0164 0.8751 0.5721 0.9324 0.0035 0.4068 0.6809
0.7212 0.3636 0.6610 0.5875 0.4809 0.3724 0.9042
>> ker
ker(:,:,1) =
0.5395 0.4849 0.0970 0.3418
0.6263 0.9883 0.4619 0.7989
0.0055 0.3752 0.9630 0.7988
ker(:,:,2) =
0.2082 0.4105 0.6508 0.2669
0.4434 0.1910 0.8655 0.5021
0.7156 0.9675 0.0252 0.0674
As you can see, at position (2,2,1) in the matrix a, ker can fit comfortably inside the matrix, and if you recall from convolution, the output is simply a sum of element-by-element products between the kernel and the subset of the matrix at position (2,2,1) that is the same size as your kernel (actually, you need to do something else to the kernel first, which I will reserve for my next point - see below). Therefore, the coefficient that you are calculating is actually the output at (2,2,1), not at (1,1,1). From the gist of it though, you already know this, but I wanted to put that out there in case you didn't know.
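If you want to check the "valid" size rule outside of MATLAB, here is a small NumPy/SciPy sketch (an analogue of the MATLAB test, not the original code; scipy.signal.convolve with mode='valid' behaves like convn(...,'valid'), and the shapes match the example above):

```python
import numpy as np
from scipy.signal import convolve

rng = np.random.default_rng(0)
a = rng.random((5, 7, 2))    # same shape as the MATLAB a
ker = rng.random((3, 4, 2))  # same shape as the MATLAB ker

# 'valid' keeps only positions where the kernel fits entirely inside a:
# output shape = (5-3+1, 7-4+1, 2-2+1) = (3, 4, 1)
c = convolve(a, ker, mode='valid')
print(c.shape)
```

The general rule is that each output dimension is input size minus kernel size plus one, which is exactly why the first valid position sits one row and one column in from the corner.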
2. You are forgetting that for N-D convolution, you need to flip the mask in each dimension. If you remember from 1D convolution, the mask must be flipped horizontally; by flipped, I mean that you simply place the elements in reverse order. An array of [1 2 3 4], for example, would become [4 3 2 1]. In 2D convolution, you must flip both horizontally and vertically: take each row of your matrix, treat it as a 1D signal, and reverse it, much like the 1D case; then take each column of this flipped result, treat it as a 1D signal, and reverse it again.
Now, in your case for 3D, you must flip horizontally, vertically, and temporally. This means you would need to perform the 2D flipping for each slice of your matrix independently, and then grab single columns in a 3D fashion and treat those as 1D signals. In MATLAB syntax, you would get ker(1,1,:), treat this as a 1D signal, then flip it. You would repeat this for ker(1,2,:), ker(1,3,:), etc., until you are finished with the first slice. Bear in mind that we don't go to the second slice, or any of the other slices, and repeat what we just did: because you are taking a 3D section of your matrix, you are inherently operating over all of the slices for each 3D column you extract, so you only need to look at the first slice. Therefore, you need to do this to your kernel before computing the convolution:
ker_flipped = flipdim(flipdim(flipdim(ker, 1), 2), 3);
flipdim performs the flipping on a specified axis. In our case, we are doing it vertically, then taking the result and doing it horizontally, and then again doing it temporally. You would then use ker_flipped in your summation instead. Take note that it doesn't matter in which order you do the flipping: flipdim operates on each dimension independently, so as long as you remember to flip all dimensions, the output will be the same.
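If it helps, the same idea can be checked with NumPy/SciPy (an analogue, not the MATLAB code; np.flip plays the role of flipdim, and the underlying identity is that convolution with a kernel equals cross-correlation with the fully flipped kernel):

```python
import numpy as np
from scipy.signal import convolve, correlate

rng = np.random.default_rng(1)
a = rng.random((5, 7, 2))
ker = rng.random((3, 4, 2))

# Flip in every dimension, the analogue of the flipdim chain above;
# the order of the flips does not matter.
ker_flipped = np.flip(np.flip(np.flip(ker, 0), 1), 2)
assert np.array_equal(ker_flipped, np.flip(ker))  # one call flips all axes

# Convolution with ker equals cross-correlation with the flipped kernel
c = convolve(a, ker, mode='valid')
x = correlate(a, ker_flipped, mode='valid')
print(np.allclose(c, x))  # True
```

This is also why the manual sum below needs the flipped kernel: a plain element-by-element product and sum is a correlation, not a convolution.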
To demonstrate, here's what the output looks like with convn:
c =
4.1837 4.1843 5.1187 6.1535
4.5262 5.3253 5.5181 5.8375
5.1311 4.7648 5.3608 7.1241
Now, to determine what c(1,1) is by hand, you would need to do your calculation on the flipped kernel:
ker_flipped = flipdim(flipdim(flipdim(ker, 1), 2), 3);
c11 = sum(sum(a(1:3,1:4,1).*ker_flipped(:,:,1)))+sum(sum(a(1:3,1:4,2).*ker_flipped(:,:,2)));
The output we get is:
c11 =
4.1837
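The same hand-check can be sketched in Python with SciPy (an analogue of the MATLAB verification above; the inputs are random, so the numbers differ from 4.1837, but the identity being tested is the same):

```python
import numpy as np
from scipy.signal import convolve

rng = np.random.default_rng(2)
a = rng.random((5, 7, 2))
ker = rng.random((3, 4, 2))

c = convolve(a, ker, mode='valid')

# c[0,0,0] by hand: element-by-element product of the first 3x4x2 block
# of a with the kernel flipped in every dimension, then summed
ker_flipped = np.flip(ker)
c11 = np.sum(a[0:3, 0:4, :] * ker_flipped)
print(np.isclose(c11, c[0, 0, 0]))  # True
```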
As you can see, this verifies what we get by hand with the calculation done in MATLAB using convn. If you want to compare more digits of precision, use format long and compare them both:
>> format long;
>> disp(c11)
4.183698205668000
>> disp(c(1,1))
4.183698205668001
As you can see, all of the digits are the same, except for the last one. That is attributed to numerical round-off. To be absolutely sure:
>> disp(abs(c11 - c(1,1)));
8.881784197001252e-16
... I think a difference on the order of 10^-16 is good enough for me to show that they're equal, right?
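For what it's worth, a gap of about 1e-16 is exactly the scale of double-precision machine epsilon, which is why floating-point results should be compared with a tolerance rather than with ==. A quick Python sketch:

```python
import numpy as np

# Double-precision machine epsilon is about 2.22e-16, the same scale
# as the difference between c11 and c(1,1) above
print(np.finfo(float).eps)

# So compare floating-point results with a tolerance, not with ==
print(np.isclose(4.183698205668000, 4.183698205668001))  # True
```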