
Question

I want to analyze a signal from my microphone port using portaudio and fftwpp. To do that I followed the explanation provided here. My questions about it are:

It says there that I should chunk a window out of the incoming data. My data is already chunked, because I only record for a short time and process the recording afterwards. So I assume a rectangular window has already been applied to my data. Is that correct?

I now have 200k data points. Should I put them directly into a single array:

    Array::array1<Complex> F(np,align);
    Array::array1<double> f(n,align);               // For out-of-place transforms
    //  array1<double> f(2*np,(double *) F()); // For in-place transforms

    fftwpp::rcfft1d Forward(n,f,F);
    fftwpp::crfft1d Backward(n,F,f);
    qDebug() << "Putting " << numSamples << " into an array!";
    for(int i = 0; i < numSamples; i++)
        f[i] = this->data.recordedSamples[i];

Or should I split them up? If I put them all in one array, what resolution do I get? My sample rate is set to 44.1 kHz.

Recommended answer

Assuming your data is not stationary (in other words, the spectral content varies over time, as it does for speech or music), you would typically pick a window size over which the data can be considered roughly stationary. For speech and music a typical window size is of the order of 20 ms. At a sample rate of 44.1 kHz this corresponds to 882 samples, so an FFT size of 1024 might be a good starting point.
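The arithmetic above, plus the bin spacing it implies, can be checked with a short sketch. The function names are made up for illustration and are not part of portaudio or fftwpp. The only relation used is that an N-point FFT at sample rate fs has bins spaced fs / N apart, which also covers the resolution part of the question: one 200k-point transform at 44.1 kHz gives bins about 0.22 Hz apart, while a 1024-point transform gives bins about 43 Hz apart.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Number of samples covered by a window of the given duration.
std::size_t samplesForDuration(double sampleRateHz, double seconds) {
    assert(sampleRateHz > 0.0 && seconds > 0.0);
    return static_cast<std::size_t>(std::llround(sampleRateHz * seconds));
}

// Smallest power of two that is >= n (a common, but not mandatory, FFT size choice).
std::size_t nextPowerOfTwo(std::size_t n) {
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Spacing between adjacent FFT bins in Hz: fs / N.
double binSpacingHz(double sampleRateHz, std::size_t fftSize) {
    assert(fftSize > 0);
    return sampleRateHz / static_cast<double>(fftSize);
}
```

Calling `nextPowerOfTwo(samplesForDuration(44100.0, 0.020))` gives the FFT size of 1024 suggested above. Note that the bin spacing is only the nominal figure; applying a window function widens each spectral peak somewhat.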

It's also common to overlap successive windows, to get better time resolution for the time-varying components of your signal. A 50% overlap is commonly used, so your first block of samples would be 0..1023, the second block would be 512..1535, and so on.
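As a minimal sketch of that block layout, the helper below lists the start index of every block for a given block size and hop (hop = block size / 2 for 50% overlap). `blockStarts` is a hypothetical name, and stopping as soon as a block reaches the end of the data is one possible convention, not the only one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Start index of each block when blocks of blockSize samples advance by hop samples.
// The last block may extend past numSamples; the caller has to handle that tail.
std::vector<std::size_t> blockStarts(std::size_t numSamples,
                                     std::size_t blockSize,
                                     std::size_t hop) {
    assert(blockSize > 0 && hop > 0);
    std::vector<std::size_t> starts;
    for (std::size_t s = 0; s < numSamples; s += hop) {
        starts.push_back(s);
        if (s + blockSize >= numSamples) break; // this block already reaches the end
    }
    return starts;
}
```

With `blockStarts(numSamples, 1024, 512)` the first blocks start at 0, 512, 1024, matching the ranges 0..1023 and 512..1535 given above.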

As has already been suggested in @Stefan's answer, you should apply a suitable window function to each block of samples prior to the FFT. Commonly used windows are Hamming and von Hann (aka Hanning). The window function needs to be the same size as the FFT (e.g. N = 1024).
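A minimal sketch of a Hann window and its application to one block follows. The function names are hypothetical. It uses the periodic form with N in the denominator, which is an assumption on my part; the symmetric form with N - 1 is also in common use.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Periodic Hann window: w[i] = 0.5 * (1 - cos(2*pi*i / N)).
std::vector<double> hannWindow(std::size_t n) {
    assert(n > 0);
    const double pi = std::acos(-1.0);
    std::vector<double> w(n);
    for (std::size_t i = 0; i < n; ++i) {
        w[i] = 0.5 * (1.0 - std::cos(2.0 * pi * static_cast<double>(i) /
                                     static_cast<double>(n)));
    }
    return w;
}

// Multiply a block by the window in place; both must have the same length.
void applyWindow(std::vector<double>& block, const std::vector<double>& window) {
    assert(block.size() == window.size());
    for (std::size_t i = 0; i < block.size(); ++i) {
        block[i] *= window[i];
    }
}
```

The window only depends on N, so it can be computed once with `hannWindow(1024)` and reused for every block.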

For any remaining block of samples of size < N at the end of your data, you can just pad with zeroes.
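One way to get the padding for free is to start from a zero-filled block and copy in only as many samples as are available. This is a sketch with a hypothetical helper name, assuming the recording is held in a `std::vector<double>`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Copy up to blockSize samples starting at start; anything past the end of the
// recording is left at 0.0, which zero-pads the final short block.
std::vector<double> extractBlock(const std::vector<double>& samples,
                                 std::size_t start,
                                 std::size_t blockSize) {
    assert(blockSize > 0);
    std::vector<double> block(blockSize, 0.0);
    if (start < samples.size()) {
        const std::size_t count = std::min(blockSize, samples.size() - start);
        std::copy_n(samples.begin() + static_cast<std::ptrdiff_t>(start),
                    count, block.begin());
    }
    return block;
}
```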

The commonly used term for the above operation is generating a spectrogram. It's essentially a 3D data structure of time vs. frequency vs. magnitude/phase, which can be displayed in various ways or used for further frequency-domain processing.
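Putting the steps together, here is a rough sketch of a magnitude spectrogram. To stay dependency-free it uses a naive O(N^2) DFT as a stand-in for a real FFT; in practice you would replace `realDft` with your FFT library's real-to-complex transform. All names here are hypothetical, the Hann window is the periodic form, and phase is discarded for brevity.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Spectrogram = std::vector<std::vector<double>>; // [frame][bin] -> magnitude

// Naive DFT of a real block, returning the N/2+1 non-redundant bins.
// Stand-in for a real FFT; far too slow for production use.
std::vector<std::complex<double>> realDft(const std::vector<double>& x) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<std::complex<double>> out(n / 2 + 1);
    for (std::size_t k = 0; k < out.size(); ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t i = 0; i < n; ++i) {
            // Reduce k*i modulo n so the angle stays small and accurate.
            const double angle = -2.0 * pi * static_cast<double>((k * i) % n) /
                                 static_cast<double>(n);
            sum += x[i] * std::complex<double>(std::cos(angle), std::sin(angle));
        }
        out[k] = sum;
    }
    return out;
}

// Overlapping, Hann-windowed, zero-padded frames -> magnitude per bin.
Spectrogram magnitudeSpectrogram(const std::vector<double>& samples,
                                 std::size_t fftSize, std::size_t hop) {
    assert(fftSize > 0 && hop > 0);
    const double pi = std::acos(-1.0);
    Spectrogram frames;
    for (std::size_t start = 0; start < samples.size(); start += hop) {
        std::vector<double> block(fftSize, 0.0); // zeros pad a short final block
        for (std::size_t i = 0; i < fftSize && start + i < samples.size(); ++i) {
            const double w = 0.5 * (1.0 - std::cos(2.0 * pi * static_cast<double>(i) /
                                                   static_cast<double>(fftSize)));
            block[i] = samples[start + i] * w;
        }
        std::vector<double> magnitudes;
        for (const auto& bin : realDft(block)) {
            magnitudes.push_back(std::abs(bin));
        }
        frames.push_back(magnitudes);
        if (start + fftSize >= samples.size()) break; // last block reached the end
    }
    return frames;
}
```

For the data in the question, `magnitudeSpectrogram(samples, 1024, 512)` would return one row per 512-sample step, each with 513 magnitudes, where bin k corresponds to the frequency k * 44100 / 1024 Hz.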

See also these closely related StackOverflow questions and answers:

  • Using Apple's Accelerate framework, FFT, Hann windowing and Overlapping
  • Giving large no. of samples to KissFFT.
  • Accelerate framework vDSP, FFT framing

