I am trying to build an application that takes streaming audio input (e.g., a line-in from a microphone) and does speech-to-text with IBM Bluemix (Watson).
I lightly modified the sample code found here. That sample sends a WAV, whereas I am sending a FLAC... but that part is irrelevant.
The results are poor, very poor. This is what I get when using the Java WebSockets code:
{
  "result_index": 0,
  "results": [
    {
      "final": true,
      "alternatives": [
        {
          "transcript": "it was six weeks ago today the terror ",
          "confidence": 0.92
        }
      ]
    }
  ]
}
Now compare the results above with the ones below. These are the results when sending the same content, but using cURL (an HTTP POST):
{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.945,
          "transcript": "it was six weeks ago today the terrorists attacked the U. S. consulate in Benghazi Libya now we've obtained email alerts that were put out by the state department as the attack unfolded as you know four Americans were killed including ambassador Christopher Stevens "
        }
      ],
      "final": true
    },
    {
      "alternatives": [
        {
          "confidence": 0.942,
          "transcript": "sharyl Attkisson has our story "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}
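Side note: when eyeballing two responses like these, the transcript values can be pulled out with a quick regex sketch (this is naive — it assumes no escaped quotes inside the transcript — so a real JSON parser is preferable in anything beyond a throwaway comparison):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TranscriptGrep {
    // Naive extraction of every "transcript" value from a recognize response.
    // Fine for quick side-by-side comparisons; use a JSON library otherwise.
    static List<String> transcripts(String json) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("\"transcript\"\\s*:\\s*\"([^\"]*)\"").matcher(json);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String response = "{ \"results\": [ { \"alternatives\": [ "
            + "{ \"transcript\": \"it was six weeks ago today the terror \", "
            + "\"confidence\": 0.92 } ], \"final\": true } ], \"result_index\": 0 }";
        for (String t : transcripts(response)) {
            System.out.println(t);
        }
    }
}
```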
That is a nearly perfect result.
Why is there a difference when using WebSockets?
Best answer
This issue was fixed in the 3.0.0-RC1
release.
You can get the new jar from:
<dependency>
<groupId>com.ibm.watson.developer_cloud</groupId>
<artifactId>java-sdk</artifactId>
<version>3.0.0-RC1</version>
</dependency>
or, with Gradle:
'com.ibm.watson.developer_cloud:java-sdk:3.0.0-RC1'
or download the jar-with-dependencies (~1.4 MB).
Here is an example of how to recognize a FLAC audio file using WebSockets:
import java.io.FileInputStream;

import com.ibm.watson.developer_cloud.http.HttpMediaType;
import com.ibm.watson.developer_cloud.speech_to_text.v1.SpeechToText;
import com.ibm.watson.developer_cloud.speech_to_text.v1.model.RecognizeOptions;
import com.ibm.watson.developer_cloud.speech_to_text.v1.model.SpeechResults;
import com.ibm.watson.developer_cloud.speech_to_text.v1.websocket.BaseRecognizeCallback;

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("<username>", "<password>");

FileInputStream audio = new FileInputStream("path-to-audio-file.flac");

RecognizeOptions options = new RecognizeOptions.Builder()
    .continuous(true)       // keep the session open across pauses in the audio
    .interimResults(true)   // receive partial hypotheses as they arrive
    .contentType(HttpMediaType.AUDIO_FLAC)
    .build();

service.recognizeUsingWebSocket(audio, options, new BaseRecognizeCallback() {
  @Override
  public void onTranscription(SpeechResults speechResults) {
    System.out.println(speechResults);
  }
});
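One practical caveat: recognizeUsingWebSocket returns immediately and delivers results on a background thread, so a short main method can exit before anything is printed. A minimal sketch of the usual fix (the latch and the simulated callback thread below are illustrative, not part of the SDK sample) is to block on a CountDownLatch and release it when the session ends, e.g. from the callback once the connection closes:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class WaitForTranscription {
    public static void main(String[] args) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);

        // Stand-in for the SDK's background WebSocket thread: with the real
        // client you would call done.countDown() from the recognize callback
        // once the session has closed.
        new Thread(() -> {
            // ... transcription results would be delivered here ...
            done.countDown();
        }).start();

        // Block the main thread until the session ends, with a timeout as a
        // safety net so the program cannot hang forever.
        boolean finished = done.await(10, TimeUnit.SECONDS);
        System.out.println(finished ? "session closed" : "timed out");
    }
}
```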
A FLAC file to test with: https://s3.amazonaws.com/mozart-company/tmp/4.flac
NOTE: 3.0.0-RC1
is a release candidate. We will have a production release (3.0.1
) next week.