Problem Description
Previous questions have presented this same or similar inquiry, for example Record HTML5 SpeechSynthesisUtterance generated speech to file, yet no workarounds appear to have been created using window.speechSynthesis(). Though there are workarounds using espeak or meSpeak (How to create or convert text to audio at chromium browser?), or making requests to external servers.
How to capture and record the audio output of a window.speechSynthesis.speak() call and return the result as a Blob, ArrayBuffer, AudioBuffer, or other object type?
Recommended Answer
The Web Speech API Specification does not presently provide a means, or a hint, on how to return or capture and record the audio output of a window.speechSynthesis.speak() call.
See also
Re: MediaStream, ArrayBuffer, Blob audio result from speak() for recording? In pertinent part, the use cases include, but are not limited to:
Persons who have issues speaking; e.g., persons who have suffered a stroke or other communication-inhibiting afflictions. They could convert text to an audio file and send the file to another individual or group. This feature would go towards helping them communicate with other persons, similar to the technologies which assist Stephen Hawking to communicate;
Presently, the only person who can hear the audio output is the person in front of the browser; in essence, the full potential of the text-to-speech functionality is not being utilized. The audio result could be used as an attachment within an email, a media stream, a chat system, or another communication application. That is, control over the generated audio output;
Another application would be to provide a free, libre, open-source audio dictionary and translation service - client to client, client to server, and server to client.
It is possible to capture the audio output of a window.speechSynthesis.speak() call utilizing navigator.mediaDevices.getUserMedia() and MediaRecorder(). The expected result is returned at Chromium browser; the implementation at Firefox has issues. Select Monitor of Built-in Audio Analog Stereo at the navigator.mediaDevices.getUserMedia() prompt.
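For reference, a minimal sketch of that workaround (not part of the original proof of concept below; the "hello world" text and the Blob handling are illustrative only, and the monitor device still has to be selected manually at the permission prompt):

// Minimal sketch: record speechSynthesis.speak() output through the
// system loopback/monitor device exposed by getUserMedia()
navigator.mediaDevices.getUserMedia({ audio: true })
  .then(stream => {
    const recorder = new MediaRecorder(stream);
    const chunks = [];
    recorder.ondataavailable = e => {
      if (e.data.size > 0) chunks.push(e.data);
    };
    recorder.onstop = () => {
      stream.getTracks().forEach(track => track.stop());
      const blob = new Blob(chunks, { type: recorder.mimeType });
      console.log("captured", blob); // e.g. set as src of an <audio> element
    };
    const utterance = new SpeechSynthesisUtterance("hello world");
    utterance.onstart = () => recorder.start();
    utterance.onend = () => recorder.stop();
    window.speechSynthesis.speak(utterance);
  })
  .catch(err => console.error(err));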
The workaround is cumbersome. We should be able to get the generated audio, at least as a Blob, without navigator.mediaDevices.getUserMedia() and MediaRecorder().
More interest is evidently necessary from browser users, JavaScript and C++ developers, browser implementers, and specification authors to provide further input; to create a proper specification for the feature and a consistent implementation in browsers' source code; see How to implement option to return Blob, ArrayBuffer, or AudioBuffer from window.speechSynthesis.speak() call.
At Chromium, a speech dispatcher program should be installed and the instance launched with the --enable-speech-dispatcher flag set, as window.speechSynthesis.getVoices() otherwise returns an empty array; see How to use Web Speech API at chromium?.
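For example (a sketch; the chromium-browser executable name may differ by distribution, and the logging is illustrative):

// Launch the browser with the speech dispatcher enabled, e.g. from a shell:
//   chromium-browser --enable-speech-dispatcher
// Then verify that voices are actually available before calling speak():
let voices = window.speechSynthesis.getVoices();
if (voices.length === 0) {
  // the voice list may be populated asynchronously
  window.speechSynthesis.onvoiceschanged = () =>
    console.log(window.speechSynthesis.getVoices());
} else {
  console.log(voices);
}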
Proof of concept
// SpeechSynthesisRecorder.js guest271314 6-17-2017
// Motivation: Get audio output from `window.speechSynthesis.speak()` call
// as `ArrayBuffer`, `AudioBuffer`, `Blob`, `MediaSource`, `MediaStream`, `ReadableStream`, or other object or data types
// See https://lists.w3.org/Archives/Public/public-speech-api/2017Jun/0000.html
// https://github.com/guest271314/SpeechSynthesisRecorder
// Configuration: Analog Stereo Duplex
// Input Devices: Monitor of Built-in Audio Analog Stereo, Built-in Audio Analog Stereo
class SpeechSynthesisRecorder {
constructor({text = "", utteranceOptions = {}, recorderOptions = {}, dataType = ""}) {
if (text === "") throw new Error("no words to synthesize");
this.dataType = dataType;
this.text = text;
this.mimeType = MediaRecorder.isTypeSupported("audio/webm; codecs=opus")
? "audio/webm; codecs=opus" : "audio/ogg; codecs=opus";
this.utterance = new SpeechSynthesisUtterance(this.text);
this.speechSynthesis = window.speechSynthesis;
this.mediaStream_ = new MediaStream();
this.mediaSource_ = new MediaSource();
this.mediaRecorder = new MediaRecorder(this.mediaStream_, {
mimeType: this.mimeType,
bitsPerSecond: 256 * 8 * 1024
});
this.audioContext = new AudioContext();
this.audioNode = new Audio();
this.chunks = Array();
if (utteranceOptions) {
if (utteranceOptions.voice) {
this.speechSynthesis.onvoiceschanged = e => {
const voice = this.speechSynthesis.getVoices().find(({
name: _name
}) => _name === utteranceOptions.voice);
this.utterance.voice = voice;
console.log(voice, this.utterance);
}
this.speechSynthesis.getVoices();
}
let {
lang, rate, pitch
} = utteranceOptions;
Object.assign(this.utterance, {
lang, rate, pitch
});
}
this.audioNode.controls = "controls";
document.body.appendChild(this.audioNode);
}
start(text = "") {
if (text) this.text = text;
if (this.text === "") throw new Error("no words to synthesize");
return navigator.mediaDevices.getUserMedia({
audio: true
})
.then(stream => new Promise(resolve => {
const track = stream.getAudioTracks()[0];
this.mediaStream_.addTrack(track);
// return the current `MediaStream`
if (this.dataType && this.dataType === "mediaStream") {
resolve({tts:this, data:this.mediaStream_});
};
this.mediaRecorder.ondataavailable = event => {
if (event.data.size > 0) {
this.chunks.push(event.data);
};
};
this.mediaRecorder.onstop = () => {
track.stop();
this.mediaStream_.getAudioTracks()[0].stop();
this.mediaStream_.removeTrack(track);
console.log(`Completed recording ${this.utterance.text}`, this.chunks);
resolve(this);
}
this.mediaRecorder.start();
this.utterance.onstart = () => {
console.log(`Starting recording SpeechSynthesisUtterance ${this.utterance.text}`);
}
this.utterance.onend = () => {
this.mediaRecorder.stop();
console.log(`Ending recording SpeechSynthesisUtterance ${this.utterance.text}`);
}
this.speechSynthesis.speak(this.utterance);
}));
}
blob() {
if (!this.chunks.length) throw new Error("no data to return");
return Promise.resolve({
tts: this,
data: this.chunks.length === 1 ? this.chunks[0] : new Blob(this.chunks, {
type: this.mimeType
})
});
}
arrayBuffer(blob) {
if (!this.chunks.length) throw new Error("no data to return");
return new Promise(resolve => {
const reader = new FileReader;
reader.onload = e => resolve(({
tts: this,
data: reader.result
}));
reader.readAsArrayBuffer(blob ? new Blob([blob], {
type: blob.type
}) : this.chunks.length === 1 ? this.chunks[0] : new Blob(this.chunks, {
type: this.mimeType
}));
});
}
audioBuffer() {
if (!this.chunks.length) throw new Error("no data to return");
return this.arrayBuffer()
// `arrayBuffer()` resolves with `{tts, data}`; decode the `ArrayBuffer` in `data`
.then(({data: ab}) => this.audioContext.decodeAudioData(ab))
.then(buffer => ({
tts: this,
data: buffer
}))
}
mediaSource() {
if (!this.chunks.length) throw new Error("no data to return");
return this.arrayBuffer()
.then(({
data: ab
}) => new Promise((resolve, reject) => {
this.mediaSource_.onsourceended = () => resolve({
tts: this,
data: this.mediaSource_
});
this.mediaSource_.onsourceopen = () => {
if (MediaSource.isTypeSupported(this.mimeType)) {
const sourceBuffer = this.mediaSource_.addSourceBuffer(this.mimeType);
sourceBuffer.mode = "sequence"
sourceBuffer.onupdateend = () =>
this.mediaSource_.endOfStream();
sourceBuffer.appendBuffer(ab);
} else {
reject(`${this.mimeType} is not supported`)
}
}
this.audioNode.src = URL.createObjectURL(this.mediaSource_);
}));
}
readableStream({size = 1024, controllerOptions, rsOptions = {}} = {}) {
if (!this.chunks.length) throw new Error("no data to return");
const src = this.chunks.slice(0);
const chunk = size;
return Promise.resolve({
tts: this,
data: new ReadableStream(controllerOptions || {
start(controller) {
console.log(src.length);
controller.enqueue(src.splice(0, chunk))
},
pull(controller) {
if (src.length === 0) {
  controller.close();
  return;
}
controller.enqueue(src.splice(0, chunk));
}
}, rsOptions)
});
}
}
Usage
let ttsRecorder = new SpeechSynthesisRecorder({
text: "The revolution will not be televised",
utteranceOptions: {
voice: "english-us espeak",
lang: "en-US",
pitch: .75,
rate: 1
}
});
// ArrayBuffer
ttsRecorder.start()
// `tts` : `SpeechSynthesisRecorder` instance, `data` : audio as `dataType` or method call result
.then(tts => tts.arrayBuffer())
.then(({tts, data}) => {
// do stuff with `ArrayBuffer`, `AudioBuffer`, `Blob`,
// `MediaSource`, `MediaStream`, `ReadableStream`
// `data` : `ArrayBuffer`
tts.audioNode.src = URL.createObjectURL(new Blob([data], {type:tts.mimeType}));
tts.audioNode.title = tts.utterance.text;
tts.audioNode.onloadedmetadata = () => {
console.log(tts.audioNode.duration);
tts.audioNode.play();
}
})
// AudioBuffer
ttsRecorder.start()
.then(tts => tts.audioBuffer())
.then(({tts, data}) => {
// `data` : `AudioBuffer`
let source = tts.audioContext.createBufferSource();
source.buffer = data;
source.connect(tts.audioContext.destination);
source.start()
})
// Blob
ttsRecorder.start()
.then(tts => tts.blob())
.then(({tts, data}) => {
// `data` : `Blob`
tts.audioNode.src = URL.createObjectURL(data);
tts.audioNode.title = tts.utterance.text;
tts.audioNode.onloadedmetadata = () => {
console.log(tts.audioNode.duration);
tts.audioNode.play();
}
})
// ReadableStream
ttsRecorder.start()
.then(tts => tts.readableStream())
.then(({tts, data}) => {
// `data` : `ReadableStream`
console.log(tts, data);
data.getReader().read().then(({value, done}) => {
tts.audioNode.src = URL.createObjectURL(value[0]);
tts.audioNode.title = tts.utterance.text;
tts.audioNode.onloadedmetadata = () => {
console.log(tts.audioNode.duration);
tts.audioNode.play();
}
})
})
// MediaSource
ttsRecorder.start()
.then(tts => tts.mediaSource())
.then(({tts, data}) => {
console.log(tts, data);
// `data` : `MediaSource`
// note: `mediaSource()` already set `audioNode.src` to an object URL for this `MediaSource`
tts.audioNode.srcObject = data;
tts.audioNode.title = tts.utterance.text;
tts.audioNode.onloadedmetadata = () => {
console.log(tts.audioNode.duration);
tts.audioNode.play();
}
})
// MediaStream
let ttsRecorder = new SpeechSynthesisRecorder({
text: "The revolution will not be televised",
utteranceOptions: {
voice: "english-us espeak",
lang: "en-US",
pitch: .75,
rate: 1
},
dataType:"mediaStream"
});
ttsRecorder.start()
.then(({tts, data}) => {
// `data` : `MediaStream`
// do stuff with active `MediaStream`
})
.catch(err => console.log(err))