本文介绍了将2D矩阵中的多个1D信号与2D矩阵中的多个1D内核进行卷积的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个大小为600 x 10的随机定义的H矩阵.矩阵H中的每个元素都可以表示为H(k,t).我获得了语音频谱图S,它是600 x 597.我使用Mel功能获得了它,因此应该为40 x 611,但是随后我使用了帧堆叠概念,其中将15帧堆叠在一起.因此,它给了我(40x15) x (611-15+1)600 x 597.

I have a randomly defined H matrix of size 600 x 10. Each element in this matrix H can be represented as H(k,t). I obtained a speech spectrogram S which is 600 x 597. I obtained it using Mel features, so it should be 40 x 611 but then I used a frame stacking concept in which I stacked 15 frames together. Therefore it gave me (40x15) x (611-15+1) which is 600 x 597.

现在我想获得一个输出矩阵Y,该矩阵由基于卷积Y(k,t) = ∑ H(k,τ)S(k,t-τ)的方程式给出.总和从τ=0τ=Lh-1. Lh在这种情况下为597.

Now I want to obtain an output matrix Y which is given by the equation based on convolution Y(k,t) = ∑ H(k,τ)S(k,t-τ). The sum goes from τ=0 to τ=Lh-1. Lh in this case would be 597.

我不知道如何获取Y.另外,我的疑问是在计算卷积时同时索引到HS.具体来说,对于Y(1,1),我们有:

I don't know how to obtain Y. Also, my doubt is the indexing into both H and S when computing the convolution. Specifically, for Y(1,1), we have:

Y(1,1) = H(1,0)S(1,1) + H(1,1)S(1,0) + H(1,2)S(1,-1) + H(1,3)S(1,-2) + ...

现在,MATLAB中没有负数之类的东西-例如S(1,-1) S(1,-2)等.那么,我应该使用哪种类型的卷积来获得Y?我尝试使用conv2fftfilt,但我认为这不会给我Y,因为Y也必须是S的大小.

Now, there is no such thing as negative indices in MATLAB - for example, S(1,-1) S(1,-2) and so on. So, what type of convolution should I use to obtain Y? I tried using conv2 or fftfilt but I think that will not give me Y because Y must also be the size of S.

推荐答案

那很容易.那是对二维信号的卷积,仅应用于一维.如果我们假设变量k用于访问行,而t用于访问列,则可以将HS的每一行视为单独的信号,其中S的每一行是一维信号,H的每一行都是卷积核.

That's very easy. That's a convolution on a 2D signal only being applied to 1 dimension. If we assume that the variable k is used to access the rows and t is used to access the columns, you can consider each row of H and S as separate signals where each row of S is a 1D signal and each row of H is a convolution kernel.

有两种方法可以解决此问题.

There are two ways you can approach this problem.

如果要坚持时域,最简单的方法是遍历输出的每一行,找到SH的每一对行的卷积,然后将输出存储在相应的输出中排.据我所知,没有工具可以仅在给定N-D信号的情况下在一个维度上进行卷积....除非您进入频域领域,但让我们将其留待以后使用.

If you want to stick with time domain, the easiest thing would be to loop over each row of the output, find the convolution of each pair of rows of S and H and store the output in the corresponding output row. From what I can tell, there is no utility that can convolve in one dimension only given an N-D signal.... unless you go into frequency domain stuff, but let's leave that for later.

类似的东西:

Y = zeros(size(S));
for idx = 1 : size(Y,1)
    Y(idx,:) = conv(S(idx,:), H(idx,:), 'same');
end

对于输出的每一行,我们使用一行S和一行H执行逐行卷积.我使用'same'标志是因为输出的大小应与S ...的行相同,这是较大的行.

For each row of the output, we perform a row-wise convolution with a row of S and a row of H. I use the 'same' flag because the output should be the same size as a row of S... which is the bigger row.

您还可以在频域中执行相同的计算.如果您对卷积和傅立叶变换的性质一无所知,那么您就会知道时域中的卷积就是频域中的乘法.您对两个信号都进行傅立叶变换,将它们按元素相乘,然后取回傅立叶逆变换.

You can also perform the same computation in frequency domain. If you know anything about the properties of convolution and the Fourier Transform, you know that convolution in time domain is multiplication in the frequency domain. You take the Fourier Transform of both signals, multiply them element-wise, then take the Inverse Fourier Transform back.

但是,您需要牢记以下复杂性:

However, you need to keep the following intricacies in mind:

  1. 执行完整卷积意味着,假设AB为一维信号,输出信号的最终长度为length(A)+length(B)-1.因此,您需要确保AB补零,以便它们都匹配相同的大小.您确保信号大小相同的原因为什么是为了允许乘法运算起作用.

  1. Performing a full convolution means that the final length of the output signal is length(A)+length(B)-1, assuming A and B are 1D signals. Therefore, you need to make sure that both A and B are zero-padded so that they both match the same size. The reason why you make sure that the signals are the same size is to allow for the multiplication operation to work.

一旦在频域中乘以信号然后取反值,就会看到Y的每一行都是卷积的 full 长度.为了确保您得到的输出与输入的大小相同,您需要在开始和结束时修剪一些点.具体来说,由于每个H的内核/列长度为10,因此您必须删除输出中每个信号的前5个点和后5个点,以匹配您在for循环代码中得到的结果.

Once you multiply the signals in the frequency domain then take the inverse, you will see that each row of Y is the full length of the convolution. To ensure that you get an output that is the same size as the input, you need to trim off some points at the beginning and at the end. Specifically, since each kernel / column length of H is 10, you would have to remove the first 5 and last 5 points of each signal in the output to match what you get in the for loop code.

通常在逆傅立叶变换之后,由于FFT算法的性质,会有一些残差复系数.优良作法是使用real删除结果的复杂值部分.

Usually after the inverse Fourier Transform, there are some residual complex coefficients due to the nature of the FFT algorithm. It's good practice to use real to remove the complex valued parts of the results.

将所有这些理论放在一起,代码就是这样的:

Putting all of this theory together, this is what the code would look like:

%// Define zero-padded H and S matrices
%// Rows are the same, but columns must be padded to match point #1
H2 = zeros(size(H,1), size(H,2)+size(S,2)-1);
S2 = zeros(size(S,1), size(H,2)+size(S,2)-1);

%// Place H and S at the beginning and leave the rest of the columns zero
H2(:,1:size(H,2)) = H;
S2(:,1:size(S,2)) = S;

%// Perform Fourier Transform on each row separately of padded matrices
Hfft = fft(H2, [], 2);
Sfft = fft(S2, [], 2);

%// Perform convolution
Yfft = Hfft .* Sfft;

%// Take inverse Fourier Transform and convert to real
Y2 = real(ifft(Yfft, [], 2));

%// Trim off unnecessary values
Y2 = Y2(:,size(H,2)/2 + 1 : end - size(H,2)/2 + 1);

Y2应该是卷积结果,并且应该与前面的for循环代码中的Y相匹配.

Y2 should be the convolved result and should match Y in the previous for loop code.

如果您确实想比较它们,我们可以.我们首先需要定义HS.为了重建我所做的事情,我生成了一个带有已知种子的随机值:

If you actually want to compare them, we can. What we'll need to do first is define H and S. To reconstruct what I did, I generated random values with a known seed:

rng(123);
H = rand(600,10);
S = rand(600,597);

一旦我们为时域版本和频域版本运行了上面的代码,让我们在命令提示符下查看它们是如何匹配的.让我们显示前5行和5列:

Once we run the above code for both the time domain version and frequency domain version, let's see how they match up in the command prompt. Let's show the first 5 rows and 5 columns:

>> format long g;
>> Y(1:5,1:5)

ans =

          1.63740867892464          1.94924208172753          2.38365646354643          2.05455605619097          2.21772526557861
          2.04478411247085          2.15915645246324          2.13672842742653          2.07661341840867          2.61567534623066
         0.987777477630861           1.3969752201781          2.46239452105228          3.07699790208937          3.04588738611503
          1.36555260994797          1.48506871890027          1.69896157726456          1.82433906982894          1.62526864072424
          1.52085236885395          2.53506897420001          2.36780282057747          2.22335617436888          3.04025523335182

>> Y2(1:5,1:5)

ans =

      1.63740867892464          1.94924208172753          2.38365646354643          2.05455605619097          2.21772526557861
      2.04478411247085          2.15915645246324          2.13672842742653          2.07661341840867          2.61567534623066
     0.987777477630861           1.3969752201781          2.46239452105228          3.07699790208937          3.04588738611503
      1.36555260994797          1.48506871890027          1.69896157726456          1.82433906982894          1.62526864072424
      1.52085236885395          2.53506897420001          2.36780282057747          2.22335617436888          3.04025523335182

对我很好!作为另一种措施,让我们找出Y中的一个值与Y2中的相应值之间的最大差值是什么:

Looks good to me! As another measure, let's figure out what the largest difference is between one value in Y and a corresponding value in Y2:

>> max(abs(Y(:) - Y2(:)))

ans =

      5.32907051820075e-15

也就是说,两个输出之间看到的最大误差约为10 .我说那还不错.

That's saying that the max error seen between both outputs is in the order of 10. I'd say that's pretty good.

这篇关于将2D矩阵中的多个1D信号与2D矩阵中的多个1D内核进行卷积的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-20 04:09