I'd like to be able to end a Google speech-to-text stream (created with streamingRecognize) and get back the pending SR (speech recognition) results.

In a nutshell, the relevant Node.js code:

```js
// create SR stream
const stream = speechClient.streamingRecognize(request);

// observe data event
const dataPromise = new Promise(resolve => stream.on('data', resolve));

// observe error event
const errorPromise = new Promise((resolve, reject) => stream.on('error', reject));

// observe finish event
const finishPromise = new Promise(resolve => stream.on('finish', resolve));

// send the audio
stream.write(audioChunk);

// for testing purposes only, give the SR stream 2 seconds to absorb the audio
await new Promise(resolve => setTimeout(resolve, 2000));

// end the SR stream gracefully, by observing the completion callback
const endPromise = util.promisify(callback => stream.end(callback))();

// a 5-second test timeout
const timeoutPromise = new Promise(resolve => setTimeout(resolve, 5000));

// finishPromise wins the race here
await Promise.race([dataPromise, errorPromise, finishPromise, endPromise, timeoutPromise]);

// endPromise wins the race here
await Promise.race([dataPromise, errorPromise, endPromise, timeoutPromise]);

// timeoutPromise wins the race here
await Promise.race([dataPromise, errorPromise, timeoutPromise]);

// I don't see any data or error events; dataPromise and errorPromise don't get settled
```
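(A side note on the pattern above, not part of the original question: Node's built-in events.once helper can build such one-shot event promises more compactly; a sketch:)

```js
import { once } from 'events';

// each promise resolves with the event's arguments array on the first
// matching event; `once` also rejects automatically if the emitter
// fires 'error' before the awaited event
const dataPromise = once(stream, 'data');
const finishPromise = once(stream, 'finish');
```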
What I experience is that the SR stream ends successfully, but I don't get any data or error events. Neither dataPromise nor errorPromise gets resolved or rejected.

How can I signal the end of my audio, close the SR stream, and still get the pending SR results?

I need to stick with the streamingRecognize API because the audio I'm streaming is real-time, even though it may stop suddenly.

To clarify, it works as long as I keep streaming the audio: I do receive the real-time SR results. However, when I send the final audio chunk and end the stream as above, I don't get the final results I'd expect otherwise.

To get the final results, I actually have to keep streaming silence for several more seconds, which may increase the speech-to-text bill. I feel like there must be a better way to get them.

Updated: so it appears the only proper time to end a streamingRecognize stream is upon a data event where StreamingRecognitionResult.is_final is true. As well, it appears we're expected to keep streaming audio until a data event is fired, to get any result at all, final or interim. This looks like a bug to me; filing an issue.

Updated: it now seems to have been confirmed as a bug. Until it's fixed, I'm looking for a potential workaround.

Updated: for future reference, here is the list of the current and previously tracked issues involving streamingRecognize. I'd expect this to be a common problem for those who use streamingRecognize; I'm surprised it hasn't been reported before. Submitting it as a bug to issuetracker.google.com as well.
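In the meantime, here's a minimal sketch of that workaround (not from the original post): keep padding the stream with silence until a data event with isFinal arrives, and only then end it. The silence chunk size and interval are illustrative assumptions for 16-bit/16KHz LINEAR16 audio:

```js
// Sketch: pad the stream with silence until a final result arrives,
// then end it. Chunk size and interval are illustrative assumptions.
const silenceChunk = Buffer.alloc(3200); // ~100 ms of 16-bit, 16KHz silence

const finalResult = await new Promise((resolve, reject) => {
  // keep the recognizer fed so it eventually emits a final result
  const timer = setInterval(() => stream.write(silenceChunk), 100);
  const settle = fn => arg => { clearInterval(timer); fn(arg); };
  stream.on('error', settle(reject));
  stream.on('data', data => {
    const result = data.results[0];
    if (result?.isFinal) settle(resolve)(result);
  });
});

// only now is it safe to end the stream gracefully
stream.end();
console.log(`final transcript: ${finalResult.alternatives[0].transcript}`);
```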
Solution

My bad — unsurprisingly, this turned out to be an obscure race condition in my code.

I've put together a self-contained sample that works as expected (gist). It helped me track down the issue. Hopefully, it may help others and my future self:

```js
// A simple streamingRecognize workflow,
// tested with Node v15.0.1, by @noseratio

import fs from 'fs';
import path from 'path';
import url from 'url';
import util from 'util';
import timers from 'timers/promises';
import speech from '@google-cloud/speech';

export {}

// need a 16-bit, 16KHz raw PCM audio file
const filename = path.join(path.dirname(url.fileURLToPath(import.meta.url)), "sample.raw");
const encoding = 'LINEAR16';
const sampleRateHertz = 16000;
const languageCode = 'en-US';

const request = {
  config: {
    encoding: encoding,
    sampleRateHertz: sampleRateHertz,
    languageCode: languageCode,
  },
  interimResults: false // if you want interim results, set this to true
};

// init SpeechClient
const client = new speech.v1p1beta1.SpeechClient();
await client.initialize();

// stream the audio to the Google Cloud Speech API
const stream = client.streamingRecognize(request);

// log all data
stream.on('data', data => {
  const result = data.results[0];
  console.log(`SR results, final: ${result.isFinal}, text: ${result.alternatives[0].transcript}`);
});

// log all errors
stream.on('error', error => {
  console.warn(`SR error: ${error.message}`);
});

// observe data event
const dataPromise = new Promise(resolve => stream.once('data', resolve));

// observe error event
const errorPromise = new Promise((resolve, reject) => stream.once('error', reject));

// observe finish event
const finishPromise = new Promise(resolve => stream.once('finish', resolve));

// observe close event
const closePromise = new Promise(resolve => stream.once('close', resolve));

// we could just pipe it:
//   fs.createReadStream(filename).pipe(stream);
// but we want to simulate the web socket data

// read the raw audio as a Buffer
const data = await fs.promises.readFile(filename, null);

// simulate multiple audio chunks
console.log("Writing...");
const chunkSize = 4096;
for (let i = 0; i < data.length; i += chunkSize) {
  stream.write(data.slice(i, i + chunkSize));
  await timers.setTimeout(50);
}
console.log("Done writing.");

console.log("Before ending...");
await util.promisify(c => stream.end(c))();
console.log("After ending.");

// race for events
await Promise.race([
  errorPromise.catch(() => console.log("error")),
  dataPromise.then(() => console.log("data")),
  closePromise.then(() => console.log("close")),
  finishPromise.then(() => console.log("finish"))
]);

console.log("Destroying...");
stream.destroy();
console.log("Final timeout...");
await timers.setTimeout(1000);
console.log("Exiting.");
```

The output:

```
Writing...
Done writing.
Before ending...
SR results, final: true, text: this is a test I'm testing voice recognition This Is the End
After ending.
data
finish
Destroying...
Final timeout...
close
Exiting.
```

To test it, a 16-bit/16KHz raw PCM audio file is required. An arbitrary WAV file wouldn't work as is, because it contains a header with metadata.
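As an aside on that last point: if you only have a PCM WAV file, a rough workaround is to skip its header before streaming. A canonical PCM WAV header is 44 bytes, but files with extra chunks differ, so treat this as a sketch (with a hypothetical file name), not a robust parser:

```js
import fs from 'fs';

// Sketch: strip the canonical 44-byte RIFF/WAV header to get raw PCM.
// Assumes a plain 16-bit/16KHz PCM WAV with no extra chunks; real files
// may need proper chunk parsing. "sample.wav" is a hypothetical name.
const wav = await fs.promises.readFile("sample.wav");
const rawPcm = wav.subarray(44);
// rawPcm can now be chunked and written to the SR stream as above
```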