Pocketsphinx - refining hot-word detection

Problem description

I've revisited CMU Sphinx recently and attempted to set up a basic hot-word detector for Android, starting from the tutorial and adapting the sample application.

I'm having various issues, which I've been unable to resolve, despite delving deep into their documentation, until I can read no more...

In order to replicate them, I made a basic project that was designed to detect the keywords wakeup you and wakeup me.

My dictionary:

me M IY
wakeup W EY K AH P
you Y UW
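Each dictionary line is simply a word followed by its pronunciation in the acoustic model's phoneset (ARPAbet for en-us). As a quick sketch of the format (the parsing code below is illustrative, not part of the Pocketsphinx API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DictFormat {
    public static void main(String[] args) {
        // CMU-style dictionary: "<word> <phone> <phone> ...", one entry per line.
        String dict = "me M IY\nwakeup W EY K AH P\nyou Y UW";

        Map<String, String> pronunciations = new LinkedHashMap<>();
        for (String line : dict.split("\n")) {
            int firstSpace = line.indexOf(' ');
            pronunciations.put(line.substring(0, firstSpace),
                               line.substring(firstSpace + 1));
        }
        System.out.println(pronunciations.get("wakeup")); // W EY K AH P
    }
}
```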

My language model:

\data\
ngram 1=5
ngram 2=5
ngram 3=4

\1-grams:
-0.9031 </s> -0.3010
-0.9031 <s> -0.2430
-1.2041 me -0.2430
-0.9031 wakeup -0.2430
-1.2041 you -0.2430

\2-grams:
-0.3010 <s> wakeup 0.0000
-0.3010 me </s> -0.3010
-0.6021 wakeup me 0.0000
-0.6021 wakeup you 0.0000
-0.3010 you </s> -0.3010

\3-grams:
-0.6021 <s> wakeup me
-0.6021 <s> wakeup you
-0.3010 wakeup me </s>
-0.3010 wakeup you </s>

\end\

Both of the above were generated using the recommended tools.
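For readers decoding the numbers in the model above: ARPA files store base-10 log probabilities in the first column and base-10 log back-off weights in the last. A quick check (a sketch, not Pocketsphinx code):

```java
public class ArpaLogProbs {
    public static void main(String[] args) {
        // The bigram line "-0.3010 <s> wakeup" gives log10 P(wakeup | <s>),
        // so the actual probability is 10^-0.3010, roughly one half.
        double prob = Math.pow(10.0, -0.3010);
        System.out.printf("P(wakeup | <s>) = %.2f%n", prob); // 0.50
    }
}
```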

And my key-phrases file:

wakeup you /1e-20/
wakeup me /1e-20/
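The value between slashes is a per-phrase detection threshold, and the phrases need not share one value. The CMUSphinx tutorial recommends tuning each phrase individually, with shorter phrases generally taking larger values (towards 1e-1) and longer phrases smaller ones (down towards 1e-50). The figures below are illustrative only, not tuned values:

```
wakeup you /1e-15/
wakeup me /1e-10/
```

Loosening a threshold catches more true detections at the cost of more false alarms, so each phrase can be tuned independently against test recordings.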

Adapting the example application linked above, here is my code:

import android.app.Activity;
import android.os.AsyncTask;
import android.os.Bundle;
import android.util.Log;

import java.io.File;
import java.io.IOException;

import edu.cmu.pocketsphinx.Assets;
import edu.cmu.pocketsphinx.Hypothesis;
import edu.cmu.pocketsphinx.RecognitionListener;
import edu.cmu.pocketsphinx.SpeechRecognizer;

import static edu.cmu.pocketsphinx.SpeechRecognizerSetup.defaultSetup;

public class PocketSphinxActivity extends Activity implements RecognitionListener {

    private static final String CLS_NAME = PocketSphinxActivity.class.getSimpleName();

    private static final String HOTWORD_SEARCH = "hot_words";

    private volatile SpeechRecognizer recognizer;

    @Override
    public void onCreate(Bundle state) {
        super.onCreate(state);
        setContentView(R.layout.main);

        new AsyncTask<Void, Void, Exception>() {
            @Override
            protected Exception doInBackground(Void... params) {
                Log.i(CLS_NAME, "doInBackground");

                try {

                    final File assetsDir = new Assets(PocketSphinxActivity.this).syncAssets();

                    recognizer = defaultSetup()
                            .setAcousticModel(new File(assetsDir, "en-us-ptm"))
                            .setDictionary(new File(assetsDir, "basic.dic"))
                            .setKeywordThreshold(1e-20f)
                            .setBoolean("-allphone_ci", true)
                            .setFloat("-vad_threshold", 3.0)
                            .getRecognizer();

                    recognizer.addNgramSearch(HOTWORD_SEARCH, new File(assetsDir, "basic.lm"));
                    recognizer.addKeywordSearch(HOTWORD_SEARCH, new File(assetsDir, "hotwords.txt"));
                    recognizer.addListener(PocketSphinxActivity.this);

                } catch (final IOException e) {
                    Log.e(CLS_NAME, "doInBackground IOException");
                    return e;
                }

                return null;
            }

            @Override
            protected void onPostExecute(final Exception e) {
                Log.i(CLS_NAME, "onPostExecute");

                if (e != null) {
                    e.printStackTrace();
                } else {
                    recognizer.startListening(HOTWORD_SEARCH);
                }
            }
        }.execute();
    }

    @Override
    public void onBeginningOfSpeech() {
        Log.i(CLS_NAME, "onBeginningOfSpeech");
    }

    @Override
    public void onPartialResult(final Hypothesis hypothesis) {
        Log.i(CLS_NAME, "onPartialResult");

        if (hypothesis == null)
            return;

        final String text = hypothesis.getHypstr();
        Log.i(CLS_NAME, "onPartialResult: text: " + text);

    }

    @Override
    public void onResult(final Hypothesis hypothesis) {
        // unused
        Log.i(CLS_NAME, "onResult");
    }

    @Override
    public void onEndOfSpeech() {
        // unused
        Log.i(CLS_NAME, "onEndOfSpeech");
    }


    @Override
    public void onError(final Exception e) {
        Log.e(CLS_NAME, "onError");
        e.printStackTrace();
    }

    @Override
    public void onTimeout() {
        Log.i(CLS_NAME, "onTimeout");
    }

    @Override
    public void onDestroy() {
        super.onDestroy();
        Log.i(CLS_NAME, "onDestroy");

        recognizer.cancel();
        recognizer.shutdown();
    }
}

Note: if I alter my selected key-phrases (and the other related files) to be more dissimilar, and I test the implementation in a quiet environment, the setup and thresholds applied work very successfully.

The problems

  1. When I say either wakeup you or wakeup me, both will be detected.

I can't establish how to apply an increased weighting to the end syllables.

  2. When I say just wakeup, often (but not always) both will be detected.

I can't establish how I can avoid this occurring.

  3. When testing against background noise, false positives are far too frequent.

I can't lower the base thresholds I am using, otherwise the keyphrases are not detected consistently under normal conditions.

  4. When testing against background noise for a prolonged period (5 minutes should be sufficient to replicate), then immediately returning to a quiet environment and saying a key-phrase, it is not detected.

It takes an undetermined period of time before the keyphrases are detected successfully and repeatedly - as though the test had begun in a quiet environment.

I found a potentially related question, but the links no longer work. I wonder if I should be resetting the recognizer more frequently, so as to somehow prevent the background noise from being averaged into the detection thresholds?

  5. Finally, I wonder whether my requirement for only a limited set of keywords would allow me to reduce the size of the acoustic model?

Any reduction in the overhead of packaging it within my application would of course be beneficial.

Very finally (honest!), and specifically hoping that @NikolayShmyrev will spot this question: are there any plans to wrap a base Android implementation/SDK entirely via Gradle?

My thanks to those who made it this far...

Recommended answer

You do not need the language model, since you do not use it.
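Concretely, since only the keyword search is ever started (and the original code even registered both searches under the same HOTWORD_SEARCH name), the setup can be reduced to something like the following sketch, reusing the asset names from the question; basic.lm and the addNgramSearch call are dropped entirely:

```java
recognizer = defaultSetup()
        .setAcousticModel(new File(assetsDir, "en-us-ptm"))
        .setDictionary(new File(assetsDir, "basic.dic"))
        .setKeywordThreshold(1e-20f)
        .setBoolean("-allphone_ci", true)
        .setFloat("-vad_threshold", 3.0)
        .getRecognizer();

// Only the keyword search is registered; no n-gram search is needed.
recognizer.addKeywordSearch(HOTWORD_SEARCH, new File(assetsDir, "hotwords.txt"));
recognizer.addListener(PocketSphinxActivity.this);
```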

I can't lower the base thresholds I am using, otherwise the keyphrases are not detected consistently under normal conditions.

1e-20 is a reasonable threshold. If you can provide a sample recording of the false detections, that would give me a better idea of what is going on.

When testing against background noise for a prolonged period (5 minutes should be sufficient to replicate), then immediately returning to a quiet environment and saying a key-phrase, it is not detected.

This is expected behavior. Overall, prolonged background noise makes it harder for the recognizer to adapt quickly to the audio parameters. If your task is to spot words in a noisy place, it is better to use some kind of hardware noise cancellation, for example a Bluetooth headset with noise cancellation.

Finally, I wonder whether my requirement for only a limited set of keywords would allow me to reduce the size of the acoustic model?

That is not possible right now. If you are only interested in keyword spotting, you could try https://snowboy.kitt.ai
