Question
I'm trying to get the pitch from microphone input. First I transform the signal from the time domain to the frequency domain with an FFT, after applying a Hamming window to the signal. That gives me the complex FFT output. I then pass that output to a harmonic product spectrum (HPS) routine, which downsamples the spectrum and multiplies the downsampled copies together, and the result is still a complex number. What should I do next to get the fundamental frequency?
public float[] HarmonicProductSpectrum(Complex[] data)
{
    Complex[] hps2 = Downsample(data, 2);
    Complex[] hps3 = Downsample(data, 3);
    Complex[] hps4 = Downsample(data, 4);
    Complex[] hps5 = Downsample(data, 5);
    float[] array = new float[hps5.Length];
    for (int i = 0; i < array.Length; i++)
    {
        checked
        {
            array[i] = data[i].X * hps2[i].X * hps3[i].X * hps4[i].X * hps5[i].X;
        }
    }
    return array;
}
public Complex[] Downsample(Complex[] data, int n)
{
    Complex[] array = new Complex[Convert.ToInt32(Math.Ceiling(data.Length * 1.0 / n))];
    for (int i = 0; i < array.Length; i++)
    {
        array[i].X = data[i * n].X;
    }
    return array;
}
I tried to compute the magnitude with
magnitude[i] = (float)Math.Sqrt(array[i] * array[i] + (data[i].Y * data[i].Y));
inside the for loop of the HarmonicProductSpectrum method. Then I tried to find the maximum bin with
float max_mag = float.MinValue;
float max_index = -1;
for (int i = 0; i < array.Length / 2; i++)
    if (magnitude[i] > max_mag)
    {
        max_mag = magnitude[i];
        max_index = i;
    }
and then I tried to get the frequency with
var frequency = max_index * 44100 / 1024;
But I get garbage values such as 1248.926, 1205,859, 2454.785 for the A4 note (440 Hz), and those values don't look like harmonics of A4.
Any help would be greatly appreciated.
Recommended answer
I implemented the harmonic product spectrum in Python to check that your data and the algorithm work well together.
Here's what I see when applying the harmonic product spectrum to the full dataset, Hamming-windowed, with 5 downsample-and-multiply stages:
This is just the bottom kilohertz, but the spectrum is pretty much dead above 1 kHz.
If I split the long audio clip into 8192-sample chunks (overlapping by 4096 samples, i.e. 50%), Hamming-window each chunk, and run HPS on it, I get a matrix of HPS outputs. It is something like a movie of the HPS spectrum over the entire dataset. The fundamental frequency seems to be quite stable.
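The chunking step can be sketched in a few lines. This is a minimal pure-Python illustration, not the code from the gist; the names hamming and windowed_chunks are made up for this example, and dropping a trailing partial chunk is an assumption (zero-padding it is another option).

```python
import math

def hamming(n):
    """Symmetric Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def windowed_chunks(samples, size=8192, hop=4096):
    """Yield Hamming-windowed chunks of `size` samples, advancing by `hop`.

    hop = size // 2 gives the 50% overlap described above.
    A trailing partial chunk is dropped (an assumption of this sketch).
    """
    window = hamming(size)
    for start in range(0, len(samples) - size + 1, hop):
        chunk = samples[start:start + size]
        yield [s * w for s, w in zip(chunk, window)]
```

Each yielded chunk would then go through the FFT and HPS, giving one row of the matrix per chunk.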
The full source code is linked below. A lot of that code chunks the data and visualizes the output of HPS running on the chunks, but the core HPS function, starting at def hps(…, is short. It does have a couple of tricks in it, though.
Given the strange frequencies at which you're finding the peak, could it be that you're operating on the full spectrum, from 0 to 44.1 kHz? You want to keep only the "positive" frequencies, i.e., from 0 to 22.05 kHz, and apply the HPS algorithm (downsample and multiply) to that.
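To make that concrete: for a real-valued input, bin N - k of an N-point FFT is the complex conjugate of bin k, so everything above N/2 is a mirror image of what lies below it. With the 1024-point FFT from the question, a peak near bin 10 has a twin near bin 1014. A minimal sketch of dropping the mirrored half, in Python (the helper names positive_half and mirror_bin are made up for this example):

```python
def positive_half(spectrum):
    """Keep bins 0 .. N/2 - 1 of an N-point spectrum of a real signal."""
    return spectrum[:len(spectrum) // 2]

def mirror_bin(k, n):
    """Index of the bin that holds the complex conjugate of bin k (0 < k < n)."""
    return n - k
```

HPS would then be applied to the output of positive_half, so the mirrored peaks never enter the downsample-and-multiply step.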
But assuming you start out with a positive-frequency-only spectrum and take its magnitude properly, it looks like you should get reasonable results. Try saving the output of your HarmonicProductSpectrum to see whether it looks anything like the above.
Again, the full source code is at https://gist.github.com/fasiha/957035272009eb1c9eb370936a6af2eb. (There I also try out a couple of other spectral estimators: Welch's method from Scipy and my port of the Blackman-Tukey spectral estimator. I'm not sure whether you are set on implementing HPS or would consider other pitch estimators, so I'm leaving the Welch/Blackman-Tukey results there.)
Original: I wrote this as a comment but had to keep revising it because it was confusing, so here it is as a mini-answer.
Based on my brief reading of this intro to HPS, I don't think you're taking the magnitudes correctly after you compute the four decimated responses.
You want:
array[i] = sqrt(data[i] * Complex.conjugate(data[i]) *
hps2[i] * Complex.conjugate(hps2[i]) *
hps3[i] * Complex.conjugate(hps3[i]) *
hps4[i] * Complex.conjugate(hps4[i]) *
hps5[i] * Complex.conjugate(hps5[i])).X;
This uses the sqrt(x * Complex.conjugate(x)) trick to find x's magnitude, and then multiplies all 5 magnitudes.
(Actually, it moves the sqrt outside the product, so you only do one sqrt. That saves some time but gives the same result. So maybe that's another trick.)
Final trick: it takes the real part of that result, because floating-point rounding sometimes leaves a tiny imaginary component, like 1e-15.
After you do this, array should contain just real floats, and you can apply the max-bin-finding.
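For the max-bin step, the bin-to-frequency conversion is index * sample_rate / fft_size. With the 44100 Hz sample rate and 1024-point FFT from the question, each bin is about 43.07 Hz wide, so 440 Hz falls between bin 10 (about 430.7 Hz) and bin 11 (about 473.7 Hz). A minimal Python sketch, with the hypothetical helper name peak_frequency:

```python
def peak_frequency(hps_values, sample_rate, fft_size):
    """Return (bin_index, frequency_hz) for the largest value in hps_values.

    fft_size is the length of the FFT that produced the spectrum,
    not the length of the (shorter) HPS array.
    """
    peak_bin = max(range(len(hps_values)), key=lambda i: hps_values[i])
    return peak_bin, peak_bin * sample_rate / fft_size
```

A longer FFT (such as the 8192-sample chunks used above) narrows the bins and so gives a finer frequency estimate.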
If there's no Conjugate method, then the old-fashioned way should work:
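// Squared magnitude; assumes Complex exposes float fields X and Y, as in the question's code.
public float mag2(Complex c) { return c.X * c.X + c.Y * c.Y; }
// in HarmonicProductSpectrum (Downsample must copy Y as well as X for this to be correct)
array[i] = (float)Math.Sqrt(mag2(data[i]) * mag2(hps2[i]) * mag2(hps3[i]) * mag2(hps4[i]) * mag2(hps5[i]));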
There are algebraic flaws in the two approaches you suggested in the comments below, but the above should be correct. I'm not sure what C# does when you assign a Complex to a float; maybe it uses the real component? I'd have thought that would be a compiler error. With the above code, though, you're doing the right thing with the complex data and only assigning a float to array[i].
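Putting the pieces together, here is a minimal pure-Python sketch of HPS on magnitudes. It is not the code from the gist: dft is a naive O(n^2) stand-in for a real FFT library, and the function names and the choice to stop at bin N/2 - 1 are assumptions made for this example.

```python
import cmath
import math

def dft(samples):
    """Naive O(n^2) DFT, standing in for a real FFT library."""
    n = len(samples)
    return [sum(s * cmath.exp(-2j * math.pi * k * t / n)
                for t, s in enumerate(samples))
            for k in range(n)]

def hps(spectrum, stages=5):
    """Harmonic product spectrum of a complex spectrum of a real signal.

    Takes magnitudes of the positive-frequency half first, then multiplies
    the spectrum by its copies downsampled by 2, 3, ..., stages.
    """
    half = [abs(c) for c in spectrum[:len(spectrum) // 2]]
    length = math.ceil(len(half) / stages)  # length of the most-decimated copy
    product = [1.0] * length
    for factor in range(1, stages + 1):
        for i in range(length):
            product[i] *= half[i * factor]
    return product
```

The index of the largest value in the returned list, multiplied by sample_rate / fft_size, is the fundamental estimate. The point of the product is that a harmonic which is louder than the fundamental in the raw spectrum still lines up with the fundamental's bin after downsampling, so the fundamental wins.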