Problem description
I am using this Google API for speech recognition, and it works very well.
The issue is with numbers. If I say "one two three four", the result is 1234, and if I say "one thousand two hundred thirty four", the result is still 1234.
Another issue is with other languages. For example, the German word "elf" means eleven, so if you say "elf", the result is 11 instead of "elf".
I know we have no control over the API, but are there any parameters or hacks we can add to force it to return only words?
Sometimes the response does contain the correct result, but not always.
These are sample responses:
1) When I say "one two three four"
{"result":[{"alternative":[{"transcript":"1234","confidence":0.47215959},{"transcript":"1 2 3 4","confidence":0.25},{"transcript":"one two three four","confidence":0.25},{"transcript":"1 2 34","confidence":0.33333334},{"transcript":"1 to 34","confidence":1}],"final":true}],"result_index":0}
2) When I say "one thousand two hundred thirty four"
{"result":[{"alternative":[{"transcript":"1234","confidence":0.94247383},{"transcript":"1.254","confidence":1},{"transcript":"1284","confidence":1},{"transcript":"1244","confidence":1},{"transcript":"1230 4","confidence":1}],"final":true}],"result_index":0}
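Because the alternatives sometimes already include a word-only transcript (like "one two three four" in the first sample), one cheap client-side hack is to scan the alternative list and take the first transcript that contains no digits. This is a minimal sketch, assuming the transcripts have already been pulled out of the JSON into a List<String> in the order returned; the class and method names are hypothetical. It does nothing for the second sample, where every alternative contains digits.

```java
import java.util.List;
import java.util.Optional;

public class WordOnlyPicker {

    // Returns the first transcript that contains no digit characters,
    // or Optional.empty() if every alternative has at least one digit.
    static Optional<String> firstWordOnly(List<String> transcripts) {
        for (String t : transcripts) {
            // ".*\\d.*" matches any transcript containing at least one digit
            if (!t.matches(".*\\d.*")) {
                return Optional.of(t);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Transcripts copied from sample response 1, in the order returned
        List<String> sample1 = List.of(
                "1234", "1 2 3 4", "one two three four", "1 2 34", "1 to 34");
        // Transcripts copied from sample response 2
        List<String> sample2 = List.of(
                "1234", "1.254", "1284", "1244", "1230 4");

        System.out.println(firstWordOnly(sample1).orElse("(none)")); // one two three four
        System.out.println(firstWordOnly(sample2).orElse("(none)")); // (none)
    }
}
```

Note that this ignores the confidence values entirely; in the first sample the word-only alternative has a lower confidence than several digit alternatives.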
What I did:
I check whether the result is a number. If it is, I separate its digits with spaces and check whether the same sequence appears in the result array. In this example, the result 1234 becomes 1 2 3 4; if a matching sequence exists among the alternatives, I convert it to words. In the second case there is no 1 2 3 4, so I keep the original result.
Here is the code:
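// Context (from surrounding code): output holds the top transcript and
// jsonArray2 is presumably the "alternative" array (org.json JSONArray).

// Only handle results that consist purely of digits, e.g. "1234"
if (output.matches("\\d+")) {
    // "1234" -> "1 2 3 4": put a single space between consecutive digits
    StringBuilder spaced = new StringBuilder();
    for (char c : output.toCharArray()) {
        spaced.append(c).append(' ');
    }
    String digit = spaced.toString().trim();

    // Compare against the remaining alternatives (the loop starts at 1,
    // skipping the first alternative)
    for (int i = 1; i < jsonArray2.length(); i++) {
        String value = jsonArray2.getJSONObject(i).getString("transcript");
        if (digit.equals(value.trim())) {
            output = digit; // the digits were probably spoken one by one
            break;
        }
    }
}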
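To try the heuristic outside the app, the same logic can be written as a standalone method over a plain List<String> of transcripts. This is a sketch under stated assumptions: the names DigitSequenceCheck, spaceDigits and resolve are hypothetical, the list is assumed non-empty with the top transcript first, and the JSON parsing is left out.

```java
import java.util.List;

public class DigitSequenceCheck {

    // "1234" -> "1 2 3 4": one space between consecutive characters
    // (split("") yields single characters on Java 8 and later)
    static String spaceDigits(String number) {
        return String.join(" ", number.split(""));
    }

    // If the top transcript is all digits and its spaced-out form appears
    // among the other alternatives, return the spaced form; otherwise
    // return the top transcript unchanged.
    static String resolve(List<String> transcripts) {
        String top = transcripts.get(0);
        if (!top.matches("\\d+")) {
            return top; // not a pure number, nothing to do
        }
        String spaced = spaceDigits(top);
        for (String alt : transcripts.subList(1, transcripts.size())) {
            if (spaced.equals(alt.trim())) {
                return spaced; // digits were probably spoken one by one
            }
        }
        return top;
    }

    public static void main(String[] args) {
        List<String> sample1 = List.of("1234", "1 2 3 4", "one two three four", "1 2 34", "1 to 34");
        List<String> sample2 = List.of("1234", "1.254", "1284", "1244", "1230 4");
        System.out.println(resolve(sample1)); // 1 2 3 4
        System.out.println(resolve(sample2)); // 1234
    }
}
```

Run against the two sample responses, it keeps 1234 for the second one because no alternative reads 1 2 3 4.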
The problem is that when I say "thirteen four eight", this method splits 13 into "one three", so it isn't a reliable solution.
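The failure can be reproduced in isolation: once the transcript is a bare digit string, a digit-by-digit conversion cannot know that "13" was spoken as a single word. This sketch assumes the API returned "1348" for "thirteen four eight" (the exact transcript isn't shown above) and uses a hypothetical digitsToWords helper.

```java
public class DigitsToWords {

    private static final String[] DIGIT_WORDS = {
        "zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"
    };

    // Converts each digit separately: "1348" -> "one three four eight".
    // Non-digit characters such as spaces are skipped.
    static String digitsToWords(String digits) {
        StringBuilder sb = new StringBuilder();
        for (char c : digits.toCharArray()) {
            if (c >= '0' && c <= '9') {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(DIGIT_WORDS[c - '0']);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed transcript for the spoken phrase "thirteen four eight"
        System.out.println(digitsToWords("1348")); // one three four eight
    }
}
```

The information about how the number was grouped when spoken is already gone from "1348", so no post-processing of that string alone can recover "thirteen".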
Update
I tried the new Cloud Speech API (https://cloud.google.com/speech/), and it's a little better than v2. The result for "one two three four" comes back in words, and my workaround handles that case as well. But when I say "thirteen four eight", I still get the same result as in v2.
Also, "elf" is still returned as 11 in German.
I also tried speech_context, but that didn't work either.
Recommended answer
Take a look at this.
You can give the API "speech context" hints, like this:
"speech_context": {
    "phrases": ["zero", "one", "two", ... "nine", "ten", "eleven", ... "twenty", "thirty", ..., "ninety"]
}
I imagine this could work for other languages too, like German.
"speech_context": {
"phrases":["eins", "zwei", "drei", ..., "elf", "zwölf" ... ]
}
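Typing out every number word by hand is error-prone (a missing quote is easy to miss), so the phrase list can be generated. This is a minimal sketch that builds the speech_context fragment in the form shown in the answer; the field name and nesting are copied from the answer and may differ between API versions, the helper name is hypothetical, and only a few number words per language are listed, so extend them as needed.

```java
import java.util.List;
import java.util.stream.Collectors;

public class SpeechContextHints {

    // Builds the "speech_context" fragment in the answer's shape from a
    // list of phrases. Only backslashes and double quotes are escaped.
    static String speechContext(List<String> phrases) {
        String quoted = phrases.stream()
                .map(p -> "\"" + p.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
                .collect(Collectors.joining(", "));
        return "\"speech_context\": {\"phrases\": [" + quoted + "]}";
    }

    public static void main(String[] args) {
        // A few English number words (extend to cover the full range)
        List<String> english = List.of("zero", "one", "two", "three", "eleven", "thirteen", "twenty");
        // A few German number words; "zw\u00f6lf" is "zwölf"
        List<String> german = List.of("null", "eins", "zwei", "drei", "elf", "zw\u00f6lf", "dreizehn");
        System.out.println(speechContext(english));
        System.out.println(speechContext(german));
    }
}
```

The generated fragment still has to be placed inside the request's recognition config; where exactly depends on the API version being called.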