Question
I have done a lot of research and gone through many resources to solve my problem, but I have not managed to find a proper solution.
I have developed an app, and now I want to add voice-based functionality to it.
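The required functionality is:
1) When the user starts speaking, the app should start recording audio/video, and
2) When the user stops speaking, the app should play back the recorded audio/video.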
Note: here "video" means whatever the user does within the app during that period of time, for example clicking buttons or some kind of animation.
I don't want to use the Google voice recognizer that ships with Android, because it requires an Internet connection and my app runs offline. I also came across CMU-Sphinx, but it does not meet my requirements.
EDIT: I should add that I have already achieved this using Start and Stop buttons, but I don't want to use buttons.
If anyone has any ideas or suggestions, please let me know.
Recommended answer
The simplest and most common method is to count the number of zero crossings in each short frame of audio, i.e. how often the signal changes sign (from positive to negative or vice versa).
If that value is too high, the sound is unlikely to be speech (broadband noise such as hiss crosses zero very often). If it is too low, it is again unlikely to be speech (low-frequency hum, for example).
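To make the zero-crossing idea concrete, here is a minimal sketch in Java (chosen because the question is about Android). It works on one frame of 16-bit PCM samples, the format Android's `AudioRecord` can deliver into a `short[]`, counts sign changes between neighbouring samples, normalises by the frame length and checks the rate against a lower and an upper bound. The class name `ZeroCrossing` and the `minRate`/`maxRate` bounds are illustrative assumptions, not values from the answer; they would need tuning on real recordings.

```java
/**
 * Zero-crossing rate (ZCR) for one frame of 16-bit PCM audio, e.g. a short[]
 * buffer filled by Android's AudioRecord.read(short[], int, int).
 * Hypothetical helper class for illustration.
 */
final class ZeroCrossing {

    /** Counts sign changes between consecutive samples; 0 is treated as positive. */
    static int countZeroCrossings(short[] frame) {
        int crossings = 0;
        for (int i = 1; i < frame.length; i++) {
            boolean wasNonNegative = frame[i - 1] >= 0;
            boolean isNonNegative = frame[i] >= 0;
            if (wasNonNegative != isNonNegative) {
                crossings++;
            }
        }
        return crossings;
    }

    /** Crossings per adjacent sample pair, so frames of any length are comparable. */
    static double zeroCrossingRate(short[] frame) {
        if (frame.length < 2) {
            return 0.0;
        }
        return (double) countZeroCrossings(frame) / (frame.length - 1);
    }

    /**
     * Speech-like if the rate is neither too high (noise-like) nor too low
     * (e.g. low-frequency hum). minRate and maxRate are hypothetical tuning values.
     */
    static boolean zcrInSpeechRange(short[] frame, double minRate, double maxRate) {
        double rate = zeroCrossingRate(frame);
        return rate >= minRate && rate <= maxRate;
    }
}
```

Frames of roughly 20 to 30 ms are a common choice (for example 320 samples at 16 kHz), but the frame size is also an assumption to tune.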
Combine that with a simple energy level (how loud the audio is) and you have a fairly robust solution.
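A sketch of the combined check, assuming the same 16-bit PCM frames: a frame counts as speech only if its RMS energy is above a floor and its zero-crossing rate lies inside a band. The class name `EnergyZcrDetector` and the three thresholds `minRms`, `minZcr` and `maxZcr` are hypothetical; the answer does not give concrete values, so they have to be calibrated against the device's microphone and typical background noise.

```java
/**
 * Frame-level speech decision combining short-term energy (RMS) with the
 * zero-crossing rate. Hypothetical class; all thresholds need tuning.
 */
final class EnergyZcrDetector {
    private final double minRms;  // RMS of samples scaled to [-1, 1)
    private final double minZcr;  // crossings per adjacent sample pair
    private final double maxZcr;

    EnergyZcrDetector(double minRms, double minZcr, double maxZcr) {
        this.minRms = minRms;
        this.minZcr = minZcr;
        this.maxZcr = maxZcr;
    }

    /** Root-mean-square amplitude of 16-bit PCM, normalised by 32768. */
    static double rms(short[] frame) {
        if (frame.length == 0) {
            return 0.0;
        }
        double sumSquares = 0.0;
        for (short s : frame) {
            double x = s / 32768.0;
            sumSquares += x * x;
        }
        return Math.sqrt(sumSquares / frame.length);
    }

    /** Sign changes per adjacent sample pair; 0 is treated as positive. */
    static double zeroCrossingRate(short[] frame) {
        if (frame.length < 2) {
            return 0.0;
        }
        int crossings = 0;
        for (int i = 1; i < frame.length; i++) {
            if ((frame[i - 1] >= 0) != (frame[i] >= 0)) {
                crossings++;
            }
        }
        return (double) crossings / (frame.length - 1);
    }

    /** Speech = loud enough AND zero-crossing rate inside the speech band. */
    boolean isSpeech(short[] frame) {
        double zcr = zeroCrossingRate(frame);
        return rms(frame) >= minRms && zcr >= minZcr && zcr <= maxZcr;
    }
}
```

The energy test rejects quiet frames regardless of their zero-crossing rate, while the rate band rejects loud frames that are too noise-like or too hum-like.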
If you need a more accurate system, it gets much more complex. One way is to extract audio features (MFCCs, for example) from training data, model them with something like a GMM (Gaussian mixture model), and then score the features you extract from the live audio against the GMM. This way you can model the likelihood that a given audio frame is speech rather than non-speech. This is not a simple process, however.
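Only the scoring step of that approach is sketched here. The sketch assumes two already-trained diagonal-covariance GMMs, one for speech and one for non-speech, and a feature vector (such as MFCCs) already computed for the current frame. Training (typically with EM) and feature extraction are not shown. A frame is classified by comparing the log-likelihood ratio of the two models against a threshold. `DiagonalGmm`, `GmmVad` and the threshold are hypothetical names and values.

```java
/** Gaussian mixture model with diagonal covariances (hypothetical helper). */
final class DiagonalGmm {
    private final double[] logWeights;   // log of mixture weights (must be > 0)
    private final double[][] means;      // [component][dimension]
    private final double[][] variances;  // [component][dimension], all > 0

    DiagonalGmm(double[] weights, double[][] means, double[][] variances) {
        this.logWeights = new double[weights.length];
        for (int k = 0; k < weights.length; k++) {
            logWeights[k] = Math.log(weights[k]);
        }
        this.means = means;
        this.variances = variances;
    }

    /** log p(x) = log sum_k w_k * N(x; mu_k, diag(var_k)), via log-sum-exp. */
    double logLikelihood(double[] x) {
        double[] componentLogs = new double[means.length];
        double max = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < means.length; k++) {
            double lp = logWeights[k];
            for (int d = 0; d < x.length; d++) {
                double var = variances[k][d];
                double diff = x[d] - means[k][d];
                lp += -0.5 * (Math.log(2 * Math.PI * var) + diff * diff / var);
            }
            componentLogs[k] = lp;
            max = Math.max(max, lp);
        }
        // Subtract the maximum before exponentiating to avoid underflow.
        double sum = 0.0;
        for (double lp : componentLogs) {
            sum += Math.exp(lp - max);
        }
        return max + Math.log(sum);
    }
}

/** Speech vs. non-speech decision by log-likelihood ratio (hypothetical). */
final class GmmVad {
    static double logLikelihoodRatio(DiagonalGmm speech, DiagonalGmm nonSpeech, double[] features) {
        return speech.logLikelihood(features) - nonSpeech.logLikelihood(features);
    }

    static boolean isSpeech(DiagonalGmm speech, DiagonalGmm nonSpeech, double[] features, double threshold) {
        return logLikelihoodRatio(speech, nonSpeech, features) > threshold;
    }
}
```

A threshold of 0 means "whichever model explains the frame better"; raising it trades missed speech for fewer false triggers.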
I'd strongly recommend going down the zero-crossing route, as it is simple to implement and works well in the vast majority of cases :)
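To get the start/stop behaviour the question asks for without Start and Stop buttons, per-frame decisions usually need smoothing, so that a single noisy frame does not start a recording and a short pause between words does not end it. The sketch below is one common way to do this, a small state machine with a start delay and a "hangover"; it is an addition rather than part of the original answer. `SpeechSegmenter`, `startFrames` and `hangoverFrames` are hypothetical names, and the per-frame boolean is assumed to come from a zero-crossing/energy check like the one described above.

```java
/** Turns per-frame speech/non-speech flags into start and end events (hypothetical). */
final class SpeechSegmenter {
    enum Event { NONE, SPEECH_STARTED, SPEECH_ENDED }

    private final int startFrames;     // consecutive speech frames needed to start
    private final int hangoverFrames;  // consecutive non-speech frames needed to end
    private boolean inSpeech = false;
    private int speechRun = 0;
    private int silenceRun = 0;

    SpeechSegmenter(int startFrames, int hangoverFrames) {
        this.startFrames = startFrames;
        this.hangoverFrames = hangoverFrames;
    }

    /** Feed one frame's decision; returns an event when the state changes. */
    Event onFrame(boolean frameIsSpeech) {
        if (frameIsSpeech) {
            speechRun++;
            silenceRun = 0;
        } else {
            silenceRun++;
            speechRun = 0;
        }
        if (!inSpeech && speechRun >= startFrames) {
            inSpeech = true;
            return Event.SPEECH_STARTED;
        }
        if (inSpeech && silenceRun >= hangoverFrames) {
            inSpeech = false;
            return Event.SPEECH_ENDED;
        }
        return Event.NONE;
    }

    boolean isInSpeech() {
        return inSpeech;
    }
}
```

In an app, one would presumably call `onFrame` for every buffer read from `AudioRecord`, start capturing on `SPEECH_STARTED`, and trigger playback on `SPEECH_ENDED`. With 20 ms frames, `hangoverFrames = 25` would tolerate roughly half a second of silence before treating the utterance as finished.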