Problem description

IIT Bombay, India, is holding a tech festival with an event called "Artbots", where we are supposed to design robots with artistic abilities. My idea is a musical robot that takes a song as input, detects the notes in the song, and plays them back on a piano. I need a method to compute the pitches of the notes in the song. Any ideas or suggestions on how to go about it?

Recommended answer

This is exactly what I'm doing as my final-year project :) except that my project is about tracking the pitch of a human singing voice (and I don't have a robot to play the tune).

The quickest way I can think of is to use the BASS library. It contains a ready-to-use function that can give you FFT data from the default recording device. Take a look at the "livespec" code example that comes with BASS.

By the way, raw FFT data will not be enough to determine the fundamental frequency. You need an algorithm such as the Harmonic Product Spectrum (HPS) to get the F0.

Another consideration is the audio source. If you are going to take an FFT and apply the Harmonic Product Spectrum to it, you need to make sure the input has only one audio source. If it contains multiple sources, as modern songs do, there will be too many frequencies to consider.

If the input signal is a musical note, then its spectrum should consist of a series of peaks, corresponding to the fundamental frequency with harmonic components at integer multiples of the fundamental frequency. Hence, when we compress the spectrum a number of times (downsampling) and compare it with the original spectrum, we can see that the strongest harmonic peaks line up. The first peak in the original spectrum coincides with the second peak in the spectrum compressed by a factor of two, which coincides with the third peak in the spectrum compressed by a factor of three. Hence, when the various spectra are multiplied together, the result will form a clear peak at the fundamental frequency.
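A toy illustration of this alignment, as a rough sketch only: the spectrum below is made up (a fundamental in bin 10 with harmonics in bins 20 and 30, arbitrary amplitudes), and the helper names `compress` and `peak_bins` are hypothetical. Compression is done by plain decimation, which is one simple way to downsample.

```python
# Made-up magnitude spectrum: fundamental in bin 10, harmonics in bins 20 and 30.
# The second harmonic is deliberately the strongest peak.
N_BINS = 64
spectrum = [0.0] * N_BINS
for harmonic, amplitude in [(1, 0.6), (2, 1.0), (3, 0.8)]:
    spectrum[10 * harmonic] = amplitude


def compress(spec, factor):
    """Keep every `factor`-th bin, so a peak in bin k * factor moves to bin k."""
    return spec[::factor]


def peak_bins(spec):
    """Indices of the non-zero bins of a toy spectrum."""
    return [k for k, value in enumerate(spec) if value > 0]


# Peaks after compressing by 1 (original), 2 and 3: bin 10 appears every time.
for factor in (1, 2, 3):
    print(factor, peak_bins(compress(spectrum, factor)))

# Multiply the three spectra bin by bin over their common length.
by_two, by_three = compress(spectrum, 2), compress(spectrum, 3)
product = [spectrum[k] * by_two[k] * by_three[k] for k in range(len(by_three))]
hps_peak = max(range(len(product)), key=product.__getitem__)
raw_peak = max(range(N_BINS), key=spectrum.__getitem__)
print("raw peak bin:", raw_peak, "HPS peak bin:", hps_peak)
```

In this toy case the largest raw peak sits on the second harmonic, while the product is non-zero only at the fundamental's bin.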

Method

First, we divide the input signal into segments by applying a Hanning window, where the window size and hop size are given as inputs. For each window, we use the Short-Time Fourier Transform (STFT) to convert the input signal from the time domain to the frequency domain. Once the input is in the frequency domain, we apply the Harmonic Product Spectrum technique to each window.
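The framing step could look roughly like the following NumPy sketch. The function name `stft_magnitudes`, the 8000 Hz sample rate, and the 440 Hz test tone are assumptions made for illustration; they are not taken from the original project.

```python
import numpy as np


def stft_magnitudes(signal, window_size, hop_size):
    """Split `signal` into Hann-windowed frames and return their magnitude spectra.

    The result has shape (n_frames, window_size // 2 + 1).
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < window_size:
        raise ValueError("signal is shorter than one window")
    window = np.hanning(window_size)
    n_frames = 1 + (len(signal) - window_size) // hop_size
    frames = np.stack([
        signal[i * hop_size: i * hop_size + window_size] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-negative frequencies of a real-valued signal.
    return np.abs(np.fft.rfft(frames, axis=1))


# Assumed test input: one second of a pure 440 Hz tone sampled at 8000 Hz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

mags = stft_magnitudes(tone, window_size=1024, hop_size=512)
peak_bin = int(np.argmax(mags[0]))
print(mags.shape, peak_bin * sample_rate / 1024)  # frame count, bins, peak in Hz
```

Each row of `mags` is one window's spectrum, which is what the HPS step then operates on.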

The HPS involves two steps: downsampling and multiplication. To downsample, we compress the spectrum twice in each window by resampling: the first time, we compress the original spectrum by a factor of two, and the second time, by a factor of three. Once this is completed, we multiply the three spectra together and find the frequency that corresponds to the peak (maximum value). This particular frequency represents the fundamental frequency of that particular window.
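Here is a minimal sketch of those two steps for a single window, using NumPy. It assumes plain decimation (taking every second or third bin) as the resampling method; the function name `hps_f0` and the synthetic test note are illustrative choices, not the original project's code. The sample rate of 8192 Hz is picked only so that the bins are exactly 2 Hz wide.

```python
import numpy as np


def hps_f0(frame, sample_rate, n_spectra=3):
    """Estimate the fundamental frequency (Hz) of one frame with HPS."""
    frame = np.asarray(frame, dtype=float)
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    # Only bins present in every compressed spectrum can be multiplied.
    usable = len(spectrum) // n_spectra
    product = spectrum[:usable].copy()
    for factor in range(2, n_spectra + 1):
        # Downsampling step: keep every `factor`-th bin.
        product *= spectrum[::factor][:usable]
    # Multiplication is done; pick the peak, skipping the DC bin.
    peak_bin = 1 + int(np.argmax(product[1:]))
    return peak_bin * sample_rate / len(frame)


# Assumed test note: 220 Hz fundamental whose second harmonic is the loudest.
sample_rate = 8192
n = 4096
t = np.arange(n) / sample_rate
note = (0.5 * np.sin(2 * np.pi * 220 * t)
        + 1.0 * np.sin(2 * np.pi * 440 * t)
        + 0.7 * np.sin(2 * np.pi * 660 * t))

raw_spectrum = np.abs(np.fft.rfft(note * np.hanning(n)))
raw_peak_hz = int(np.argmax(raw_spectrum)) * sample_rate / n
f0 = hps_f0(note, sample_rate)
print("raw FFT peak:", raw_peak_hz, "Hz, HPS estimate:", f0, "Hz")
```

The raw FFT peak lands on the loudest harmonic, which is why the raw spectrum alone is not enough, while the product peaks at the fundamental.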

Limitations of the HPS method

Some nice features of this method include: it is computationally inexpensive, reasonably resistant to additive and multiplicative noise, and adjustable to different kinds of input. For instance, we could change the number of compressed spectra to use, and we could replace the spectral multiplication with a spectral addition. However, since human pitch perception is basically logarithmic, low pitches may be tracked less accurately than high pitches.
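To make the logarithmic point concrete, the sketch below compares the width of one semitone in Hz at a low, a middle, and a high A. It assumes twelve-tone equal temperament with A4 = 440 Hz and the common MIDI numbering (A4 = 69); the helper names are hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_note(freq_hz, a4_hz=440.0):
    """Return (name, midi_number) of the equal-tempered note closest to freq_hz."""
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}", midi


def semitone_width_hz(freq_hz):
    """Distance in Hz from freq_hz up to the next semitone."""
    return freq_hz * (2 ** (1 / 12) - 1)


# A fixed error in Hz is a much larger musical error at low pitches.
for f in (55.0, 220.0, 880.0):
    name, _ = nearest_note(f)
    print(name, f, "Hz, next semitone is", round(semitone_width_hz(f), 2), "Hz away")
```

Around 55 Hz, neighbouring notes are only about 3 Hz apart, so an estimate that is off by a few Hz can already land on the wrong piano key, whereas the same error near 880 Hz is harmless.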

Another severe shortfall of the HPS method is that its resolution is only as good as the length of the FFT used to calculate the spectrum. If we perform a short and fast FFT, we are limited in the number of discrete frequencies we can consider. In order to gain a higher resolution in our output (and therefore see less graininess in our pitch output), we need to take a longer FFT, which requires more time.
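The trade-off can be put into numbers: FFT bins are spaced `sample_rate / fft_length` Hz apart, and one window covers `fft_length / sample_rate` seconds of audio. The small sketch below assumes a 44100 Hz input, which is not stated in the original; the function names are illustrative.

```python
def bin_width_hz(sample_rate, fft_length):
    """Spacing between adjacent FFT bins, in Hz."""
    return sample_rate / fft_length


def window_duration_ms(sample_rate, fft_length):
    """Length of audio covered by one FFT window, in milliseconds."""
    return 1000.0 * fft_length / sample_rate


sample_rate = 44100  # assumed input rate
for fft_length in (1024, 4096, 16384):
    print(
        fft_length,
        round(bin_width_hz(sample_rate, fft_length), 2), "Hz per bin,",
        round(window_duration_ms(sample_rate, fft_length), 1), "ms per window",
    )
```

With these assumed numbers, a 1024-point FFT gives bins about 43 Hz wide, far coarser than the spacing of low piano notes, while a 16384-point FFT narrows that to under 3 Hz at the cost of needing roughly a third of a second of audio per window.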
