我是DSP的新手,试图为音频文件的每个分段帧计算基本频率(f(0))。 F0估计的方法可以分为三类:

  • 基于信号时域的时间动态;
  • 基于频率结构频域的
  • 混合方法。

  • 大多数示例都是根据频率结构的频域估算基本频率,我正在寻找基于信号时域的时间动态的方法。

    This article提供了一些信息,但是我仍然不清楚如何在时域中进行计算?

    https://gist.github.com/endolith/255291

    我发现,这是到目前为止使用的代码:
    def freq_from_autocorr(sig, fs):
        """
        Estimate frequency using autocorrelation
        """
        # Calculate autocorrelation and throw away the negative lags
        corr = correlate(sig, sig, mode='full')
        corr = corr[len(corr)//2:]
    
        # Find the first low point
        d = diff(corr)
        start = nonzero(d > 0)[0][0]
    
        # Find the next peak after the low point (other than 0 lag).  This bit is
        # not reliable for long signals, due to the desired peak occurring between
        # samples, and other peaks appearing higher.
        # Should use a weighting function to de-emphasize the peaks at longer lags.
        peak = argmax(corr[start:]) + start
        px, py = parabolic(corr, peak)
    
        return fs / px
    

    如何进行时域估计?

    提前致谢!

    最佳答案

    这是一个正确的实现。不是很强大,但肯定可以。为了验证这一点,我们可以生成一个已知频率的信号,看看我们将得到什么结果:

    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import correlate, fftconvolve
    from scipy.interpolate import interp1d
    
    fs = 44100
    frequency = 440
    length = 0.01 # in seconds
    
    t = np.linspace(0, length, int(fs * length))
    y = np.sin(frequency * 2 * np.pi * t)
    
    def parabolic(f, x):
        xv = 1/2. * (f[x-1] - f[x+1]) / (f[x-1] - 2 * f[x] + f[x+1]) + x
        yv = f[x] - 1/4. * (f[x-1] - f[x+1]) * (xv - x)
        return (xv, yv)
    
    def freq_from_autocorr(sig, fs):
        """
        Estimate frequency using autocorrelation
        """
        corr = correlate(sig, sig, mode='full')
        corr = corr[len(corr)//2:]
        d = np.diff(corr)
        start = np.nonzero(d > 0)[0][0]
        peak = np.argmax(corr[start:]) + start
        px, py = parabolic(corr, peak)
    
        return fs / px
    

    结果

    运行freq_from_autocorr(y, fs)会得到~442.014 Hz,大约有0.45%的错误。

    奖金-我们可以改善

    我们可以通过稍微多一点的编码使其更加精确和强大:
    def indexes(y, thres=0.3, min_dist=1, thres_abs=False):
        """Peak detection routine borrowed from
        https://bitbucket.org/lucashnegri/peakutils/src/master/peakutils/peak.py
        """
        if isinstance(y, np.ndarray) and np.issubdtype(y.dtype, np.unsignedinteger):
            raise ValueError("y must be signed")
    
        if not thres_abs:
            thres = thres * (np.max(y) - np.min(y)) + np.min(y)
    
        min_dist = int(min_dist)
    
        # compute first order difference
        dy = np.diff(y)
    
        # propagate left and right values successively to fill all plateau pixels (0-value)
        zeros, = np.where(dy == 0)
    
        # check if the signal is totally flat
        if len(zeros) == len(y) - 1:
            return np.array([])
    
        if len(zeros):
            # compute first order difference of zero indexes
            zeros_diff = np.diff(zeros)
            # check when zeros are not chained together
            zeros_diff_not_one, = np.add(np.where(zeros_diff != 1), 1)
            # make an array of the chained zero indexes
            zero_plateaus = np.split(zeros, zeros_diff_not_one)
    
            # fix if leftmost value in dy is zero
            if zero_plateaus[0][0] == 0:
                dy[zero_plateaus[0]] = dy[zero_plateaus[0][-1] + 1]
                zero_plateaus.pop(0)
    
            # fix if rightmost value of dy is zero
            if len(zero_plateaus) and zero_plateaus[-1][-1] == len(dy) - 1:
                dy[zero_plateaus[-1]] = dy[zero_plateaus[-1][0] - 1]
                zero_plateaus.pop(-1)
    
            # for each chain of zero indexes
            for plateau in zero_plateaus:
                median = np.median(plateau)
                # set leftmost values to leftmost non zero values
                dy[plateau[plateau < median]] = dy[plateau[0] - 1]
                # set rightmost and middle values to rightmost non zero values
                dy[plateau[plateau >= median]] = dy[plateau[-1] + 1]
    
        # find the peaks by using the first order difference
        peaks = np.where(
            (np.hstack([dy, 0.0]) < 0.0)
            & (np.hstack([0.0, dy]) > 0.0)
            & (np.greater(y, thres))
        )[0]
    
        # handle multiple peaks, respecting the minimum distance
        if peaks.size > 1 and min_dist > 1:
            highest = peaks[np.argsort(y[peaks])][::-1]
            rem = np.ones(y.size, dtype=bool)
            rem[peaks] = False
    
            for peak in highest:
                if not rem[peak]:
                    sl = slice(max(0, peak - min_dist), peak + min_dist + 1)
                    rem[sl] = True
                    rem[peak] = False
    
            peaks = np.arange(y.size)[~rem]
    
        return peaks
    
    def freq_from_autocorr_improved(signal, fs):
        signal -= np.mean(signal)  # Remove DC offset
        corr = fftconvolve(signal, signal[::-1], mode='full')
        corr = corr[len(corr)//2:]
    
        # Find the first peak on the left
        i_peak = indexes(corr, thres=0.8, min_dist=5)[0]
        i_interp = parabolic(corr, i_peak)[0]
    
        return fs / i_interp, corr, i_interp
    
    

    运行freq_from_autocorr_improved(y, fs)会产生~441.825 Hz,大约有0.41%的错误。这种方法在更复杂的情况下会更好,并且计算时间要长两倍。

    通过更长的采样时间(即将length设置为例如0.1s),我们将获得更准确的结果。

    10-06 09:00