Problem description

I have an audio file. I iterate through it, taking 512 samples at each step and passing them through an FFT.

The output is a block 514 floats long (using IPP's ippsFFTFwd_RToCCS_32f_I), with real and imaginary components interleaved.

My problem is: what do I do with these complex numbers once I have them? At the moment, for each value I'm doing:

const float realValue   = buffer[(y * 2) + 0];
const float imagValue   = buffer[(y * 2) + 1];
const float value   	= sqrt( (realValue * realValue) + (imagValue * imagValue) );

This gives something slightly usable, but I'd rather have some way of getting the values out in the range 0 to 1. The problem with the above is that the peaks end up coming back as around 9 or more. This means things get viciously saturated, and other parts of the spectrogram barely show up even though they appear to be quite strong when I run the audio through Audition's spectrogram. I fully admit I'm not 100% sure what the data returned by the FFT is (other than that it represents the frequency values of the 512-sample block I'm passing in). In particular, I don't really understand what exactly the complex number represents.

Any advice and help would be much appreciated!

Edit: Just to clarify. My big problem is that the FFT values returned are meaningless without some idea of what the scale is. Can someone point me towards working out that scale?

Edit 2: I get really nice looking results by doing the following:

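// Walk the interleaved (real, imaginary) pairs: kFFTSize + 2 floats in total.
size_t count2   = 0;
size_t max2 	= kFFTSize + 2;
while( count2 < max2 )
{
    const float realValue	= buffer[(count2) + 0];
    const float imagValue	= buffer[(count2) + 1];
    // Log of the scaled magnitude, shifted and halved to bring it towards the 0 to 1 range.
    const float value	= (log10f( sqrtf( (realValue * realValue) + (imagValue * imagValue) ) * rcpVerticalZoom ) + 1.0f) * 0.5f;
    // Writing in place is safe: the output index (count2 / 2) never runs ahead of the read position.
    buffer[count2 >> 1]	= value;
    count2 += 2;
}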

To my eye, this even looks better than most other spectrogram implementations I have looked at.

Is there anything badly wrong with this?

Recommended answer

The usual thing to do to get all of an FFT visible is to take the logarithm of the magnitude.
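As a rough illustration of taking the log of the magnitude, here is a minimal sketch that converts interleaved (real, imaginary) pairs to decibels and maps them into the range 0 to 1. The function name `logMagnitudes` and the parameters `fullScale` and `floorDb` are made up for this example. A common assumption is that an unnormalized forward FFT of N real samples gives a magnitude of about N/2 for a full-scale sine that sits exactly on a bin, so `fullScale` would be N/2 in that case; the right value depends on the scaling flag the FFT was initialised with and on any window applied, so treat it as something to check rather than a given.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: turns interleaved (re, im) pairs into display values in [0, 1].
//   fullScale : magnitude that should map to 1.0 (assumed, e.g. N/2 for an unnormalized FFT)
//   floorDb   : negative dB level that should map to 0.0 (everything quieter is clamped)
std::vector<float> logMagnitudes(const std::vector<float>& interleaved,
                                 float fullScale,
                                 float floorDb = -96.0f)
{
    assert(interleaved.size() % 2 == 0);
    assert(fullScale > 0.0f && floorDb < 0.0f);

    // Smallest relative magnitude we keep; avoids log10(0) = -infinity.
    const float minMag = std::pow(10.0f, floorDb / 20.0f);

    std::vector<float> out(interleaved.size() / 2);
    for (std::size_t k = 0; k < out.size(); ++k) {
        const float re  = interleaved[2 * k];
        const float im  = interleaved[2 * k + 1];
        const float mag = std::sqrt(re * re + im * im) / fullScale;

        // Magnitude in dB relative to fullScale (0 dB = full scale).
        const float db = 20.0f * std::log10(std::max(mag, minMag));

        // Map [floorDb, 0] dB linearly onto [0, 1] and clamp.
        const float v = 1.0f - db / floorDb;
        out[k] = std::min(1.0f, std::max(0.0f, v));
    }
    return out;
}
```

With `fullScale = 256` and `floorDb = -80`, a bin of magnitude 25.6 is 20 dB below full scale and comes out as 0.75. Picking the floor is a display choice: a lower floor shows more of the quiet content at the cost of contrast.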

So, the position in the output buffer tells you which frequency was detected. The magnitude (L2 norm) of the complex number tells you how strong that frequency was, and the phase (arctangent) gives you information that is a lot more important in image space than in audio space. Because the FFT is discrete, the frequencies run from 0 to the Nyquist frequency. In images, the first term (DC) is usually the largest, and so is a good candidate for use in normalization if that is your aim. I don't know if that is also true for audio (I doubt it).
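To make the mapping from buffer position to frequency, magnitude and phase concrete, here is a small sketch. It assumes the layout described in the question: N + 2 interleaved floats holding N/2 + 1 complex bins for N real input samples. The `Bin` struct, the `describeBins` function and the `sampleRate` parameter are names invented for this example and are not part of IPP.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical record describing one FFT bin.
struct Bin {
    float frequencyHz;   // centre frequency of the bin
    float magnitude;     // L2 norm of (re, im): how strong that frequency is
    float phaseRadians;  // atan2(im, re)
};

// Interprets fftSize + 2 interleaved floats as fftSize / 2 + 1 complex bins.
std::vector<Bin> describeBins(const std::vector<float>& interleaved,
                              std::size_t fftSize,
                              float sampleRate)
{
    assert(fftSize > 0 && fftSize % 2 == 0);
    assert(interleaved.size() == fftSize + 2);

    std::vector<Bin> bins(fftSize / 2 + 1);
    for (std::size_t k = 0; k < bins.size(); ++k) {
        const float re = interleaved[2 * k];
        const float im = interleaved[2 * k + 1];

        // Bin k is centred on k * sampleRate / fftSize, so bin 0 is DC
        // and the last bin (k = fftSize / 2) is the Nyquist frequency.
        bins[k].frequencyHz  = static_cast<float>(k) * sampleRate
                             / static_cast<float>(fftSize);
        bins[k].magnitude    = std::hypot(re, im);
        bins[k].phaseRadians = std::atan2(im, re);
    }
    return bins;
}
```

For a 512-point FFT of audio sampled at 44100 Hz, this gives 257 bins spaced about 86 Hz apart, ending at 22050 Hz. A spectrogram normally uses only the magnitude of each bin and discards the phase.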
