
Question

I'm currently developing a speech recognition project and I'm trying to select the most meaningful features. Most of the relevant papers suggest using zero-crossing rate, F0, and MFCC features, so I'm using those. My problem is that a training sample with a duration of 00:03 has 268 features. Since this is a multi-class classification project with 50+ training samples per class, including all the MFCC features may expose the project to the curse of dimensionality or "reduce the importance" of the other features. So my question is: should I include all the MFCC features? If not, can you suggest an alternative?

Recommended answer

You should not use F0 and zero crossing; they are too unstable. You can simply increase your training data and use MFCCs, which have good representation capabilities. But remember to mean-normalize them.
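As a rough illustration of the mean normalization mentioned above, here is a minimal sketch in plain Python. It assumes the MFCCs for one utterance are already available as a list of frames, each frame a list of coefficients; the function name `mean_normalize` and the toy values are hypothetical and are not taken from the original answer. The idea is to subtract, for each coefficient, its mean over all frames of the utterance.

```python
from statistics import fmean


def mean_normalize(frames):
    """Subtract each coefficient's per-utterance mean from every frame.

    frames: list of frames, each a list of MFCC coefficients of equal length.
    Returns a new list of frames; the input is not modified.
    """
    if not frames:
        return []
    n_coeffs = len(frames[0])
    # Mean of each coefficient across all frames of the utterance.
    means = [fmean(frame[i] for frame in frames) for i in range(n_coeffs)]
    # Remove that mean from every frame, coefficient by coefficient.
    return [[c - m for c, m in zip(frame, means)] for frame in frames]


# Toy input: 3 frames x 2 coefficients (made-up values, not real MFCCs).
toy = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = mean_normalize(toy)
```

After normalization, every coefficient has zero mean across the utterance. If the MFCCs are held in a NumPy array of shape (frames, coefficients), the same operation can be written as `mfcc - mfcc.mean(axis=0)`.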

