Problem description
I am trying to load an audio file in Python and process it with Google speech recognition.
The problem is that, unlike C++, Python does not seem to show data types or classes, or give you access to memory so that you can convert between one data type and another by creating a new object and repacking the data.
I don't understand how to convert from one data type to another in Python.
The problematic code is as follows:
import speech_recognition as spr
import librosa
audio, sr = librosa.load('sample_data/metal.mp3')
# create a speech recognition object
r = spr.Recognizer()
r.recognize_google(audio)
The error is:
audio_data must be audio data
How do I convert the audio object so that it can be used with Google speech recognition?
Recommended answer
@Mich, I hope you have found a solution by now. If not, please try the following.
First, convert the .mp3 file to .wav format with another tool as a pre-processing step.
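The answer leaves the conversion method open. As one possible sketch, assuming the audio has already been decoded to mono float samples in the range [-1.0, 1.0] (which is what librosa.load returns by default), the standard-library wave module can write those samples out as a 16-bit PCM .wav file. The helper name write_wav_pcm16 is hypothetical and not part of any library.

```python
import struct
import wave


def write_wav_pcm16(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Hypothetical helper for illustration; `samples` can be any iterable
    of numbers, for example the array returned by librosa.load.
    """
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip values outside the valid range
        # Repack each float as a little-endian signed 16-bit integer
        frames += struct.pack("<h", int(round(s * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes per sample = 16 bit
        wav_file.setframerate(int(sample_rate))
        wav_file.writeframes(bytes(frames))
```

With the audio, sr pair from the question's librosa.load call, this would be used as write_wav_pcm16('sample_data/metal.wav', audio, sr). The per-sample loop favors clarity over speed, so a long recording may be better served by a dedicated converter.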
import speech_recognition as sr
# Create an instance of the Recognizer class
recognizer = sr.Recognizer()
# Create an AudioFile instance from the converted .wav file
audio_ex = sr.AudioFile('sample_data/metal.wav')
print(type(audio_ex))
# Read the whole file into an AudioData object
with audio_ex as source:
    audiodata = recognizer.record(source)
print(type(audiodata))
# Extract text
text = recognizer.recognize_google(audio_data=audiodata, language='en-US')
print(text)
You can find the supported language codes at https://cloud.google.com/speech-to-text/docs/languages
Additionally, you can set the minimum threshold for the loudness of the audio with the setting below.
recognizer.energy_threshold = 300  # minimum energy threshold set to 300