Problem description
I am currently recording audio from a web page on my Mac OS computer and running it through the Cloud Speech API to produce a transcript. However, the results aren't that accurate and there are chunks of missing words in the results.
Are there any steps that would help me yield more accurate results?
Here are the steps I am taking to convert audio to text:
- Use Soundflower to route the audio output from my sound card into the microphone input.
- Play the audio on the website.
- Use QuickTime Player to record the audio, saving it as a .m4a file.
- Use the command-line tool ffmpeg to convert the .m4a file to .flac and combine the 2 audio channels (stereo) into 1 audio channel (mono).
- Upload the .flac file to Google Cloud Storage. The file has a sample rate of 44100Hz and 24 bits per sample.
- Use the longRunningRecognize API via the node.js client library, pointing at the file in Google Cloud Storage (a minimal sketch of this request follows this list).
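For reference, a minimal sketch of how such a request can be made with the @google-cloud/speech Node.js client library looks roughly like this; the bucket name, file name, and language code are placeholders, not values from the original question:

```javascript
// Minimal sketch, assuming the .m4a was converted to mono FLAC with something like:
//   ffmpeg -i recording.m4a -ac 1 recording.flac
// (no -ar flag, so the original 44100 Hz sample rate is preserved)
const speech = require('@google-cloud/speech');

async function transcribe() {
  const client = new speech.SpeechClient();

  const request = {
    config: {
      encoding: 'FLAC',
      sampleRateHertz: 44100,   // matches the recorded file
      audioChannelCount: 1,     // mono after the ffmpeg conversion
      languageCode: 'en-US',    // placeholder; use the language actually spoken
    },
    audio: {
      // Placeholder GCS URI for the uploaded .flac file
      uri: 'gs://your-bucket/recording.flac',
    },
  };

  // Start the long-running operation and wait for it to complete
  const [operation] = await client.longRunningRecognize(request);
  const [response] = await operation.promise();

  // Join the per-segment transcripts into a single string
  const transcript = response.results
    .map(result => result.alternatives[0].transcript)
    .join('\n');
  console.log(transcript);
}

transcribe().catch(console.error);
```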
Recommended answer
From the Speech-to-Text API side, I would suggest you verify that you are following the Best Practices recommendations, such as avoiding excessive background noise and multiple people talking at the same time, since these aspects can affect the service's recognition.
I think you have a good sampling rate and a lossless codec; however, keep in mind that audio pre-processing can affect the audio quality. In these cases, it is preferable to avoid re-sampling; nevertheless, you can try different audio formats to verify which one yields the most accurate results.
Additionally, you can use the languageCode and phrase hints API properties, which are commonly used to boost recognition accuracy.
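As an illustration, those properties can be set on the RecognitionConfig passed to the Node.js client roughly like this; the phrases below are placeholders for vocabulary expected in the recording:

```javascript
// Sketch of a RecognitionConfig with languageCode and phrase hints;
// the phrases are placeholders for terms likely to appear in the audio.
const config = {
  encoding: 'FLAC',
  sampleRateHertz: 44100,
  languageCode: 'en-US',          // set this to the language actually spoken
  speechContexts: [
    {
      phrases: ['Cloud Speech API', 'Soundflower', 'QuickTime'],
    },
  ],
};
```

Phrase hints are most useful for domain-specific terms, product names, or other words the recognizer is otherwise likely to miss.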