What does an audio frame contain?

Problem description

I'm doing some research on how to compare sound files (WAV). Basically, I want to compare stored sound files (WAV) with sound from a microphone. In the end, I would like to pre-store some voice commands of my own and then, when I'm running my app, compare the pre-stored files with input from the microphone.

My thought was to allow some margin when comparing, because I guess saying something twice in a row in exactly the same way would be difficult.

So after some googling I see that Python has a module named wave with a Wave_read object. That object has a method named readframes(n):

Reads and returns at most n frames of audio, as a string of bytes.

What do these bytes contain? I'm thinking of looping through the wave files one frame at a time, comparing them frame by frame.

Recommended answer

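An audio frame holds one sample per channel, and each sample records the amplitude of the sound wave at that particular point in time. (For mono audio, a frame and a sample are the same thing.) To produce sound, tens of thousands of frames per second are played in sequence. The way the amplitude rises and falls from one frame to the next is what produces the frequencies you hear.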
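The sketch below makes "frames played in sequence" concrete. It builds a short 440 Hz tone one sample at a time and writes it with the standard-library wave module. The helper name make_tone and its default values are illustrative assumptions, not part of any library API.

```python
import io
import math
import struct
import wave

def make_tone(freq_hz=440.0, duration_s=0.01, rate=44100, amplitude=0.5):
    """Illustrative helper: return WAV bytes holding a 16-bit mono sine tone."""
    n_frames = int(duration_s * rate)
    peak = int(amplitude * 32767)  # 32767 is the largest signed 16-bit value
    # Each frame is one amplitude value: the sine wave evaluated at time i / rate.
    samples = [
        int(peak * math.sin(2 * math.pi * freq_hz * i / rate))
        for i in range(n_frames)
    ]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)     # mono: one sample per frame
        w.setsampwidth(2)     # 2 bytes = 16 bits per sample
        w.setframerate(rate)  # frames per second
        # Pack as little-endian signed 16-bit integers, the usual WAV PCM layout.
        w.writeframes(struct.pack("<%dh" % n_frames, *samples))
    return buf.getvalue()
```

With the defaults, the result contains 441 frames (0.01 s × 44,100 frames/s).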

CD-quality audio, a common format for uncompressed WAV files, uses exactly 44,100 frames per second. Each sample has 16 bits of resolution, which allows a fairly precise representation of the signal level. Also, because CD audio is stereo, each frame carries twice as much information: 16 bits for the left channel and 16 bits for the right.
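Those figures translate directly into data rates. The snippet below just restates the arithmetic; the variable names are illustrative.

```python
# CD-quality PCM parameters as described above
frames_per_second = 44_100
bits_per_sample = 16
channels = 2  # stereo: left + right

bytes_per_frame = channels * bits_per_sample // 8       # 2 channels x 2 bytes
bytes_per_second = frames_per_second * bytes_per_frame  # raw bytes per second
bits_per_second = bytes_per_second * 8                  # raw bit rate

print(bytes_per_frame, bytes_per_second, bits_per_second)
```

That works out to 4 bytes per frame, 176,400 bytes per second, and 10,584,000 bytes (roughly 10 MB) per minute of uncompressed CD-quality audio.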

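When you use Python's wave module to read a frame, it is returned as raw bytes. When printed, non-printable bytes appear as hexadecimal escapes such as \x1f. The number of bytes per frame depends on the format:

One byte for an 8-bit mono signal.
Two bytes for 8-bit stereo.
Two bytes for 16-bit mono.
Four bytes for 16-bit stereo.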
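To turn those bytes into numbers, you unpack them according to the sample width and channel count. Below is a minimal sketch using the standard struct module. It assumes ordinary PCM WAV data, where 8-bit samples are unsigned and 16-bit samples are signed little-endian. The name decode_frames is hypothetical, and other sample widths are deliberately left out.

```python
import struct

def decode_frames(raw, sampwidth, nchannels):
    """Hypothetical helper: split raw PCM bytes from readframes() into per-frame tuples.

    Assumes standard WAV PCM: 8-bit samples are unsigned (0..255),
    16-bit samples are signed little-endian (-32768..32767).
    """
    if sampwidth == 1:
        fmt = "B"  # unsigned char
    elif sampwidth == 2:
        fmt = "h"  # signed short
    else:
        raise ValueError("this sketch only handles 8- and 16-bit samples")
    frame_size = sampwidth * nchannels
    if len(raw) % frame_size:
        raise ValueError("byte count is not a whole number of frames")
    n_samples = len(raw) // sampwidth
    samples = struct.unpack("<%d%s" % (n_samples, fmt), raw)
    # Group consecutive samples into frames: (left, right) for stereo, (x,) for mono.
    return [samples[i:i + nchannels] for i in range(0, n_samples, nchannels)]
```

For example, the four bytes \x01\x00\xff\xff from a 16-bit stereo file decode to one frame, (1, -1): left sample 1, right sample -1. The same bytes read as 16-bit mono are two frames, (1,) and (-1,).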

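Before converting and comparing these values, use the wave module's getsampwidth() and getnchannels() methods to check the bit depth and number of channels, and getframerate() to check the sample rate. Note that getsampwidth() reports bytes, not bits. Otherwise you may end up comparing recordings made with mismatched quality settings.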
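Here is a minimal sketch of that check. It uses only the wave module's getnchannels(), getsampwidth() and getframerate() methods. The helper names are illustrative. blank_wav is a hypothetical convenience that exists only so the example can run without real recordings.

```python
import io
import wave

def read_format(f):
    """Return (channels, sample width in bytes, frame rate) for a WAV path or file object."""
    with wave.open(f, "rb") as w:
        return w.getnchannels(), w.getsampwidth(), w.getframerate()

def same_format(a, b):
    """True if two WAV sources share channel count, sample width and frame rate."""
    return read_format(a) == read_format(b)

def blank_wav(nchannels, sampwidth, rate, nframes=10):
    """Hypothetical test helper: a tiny in-memory WAV whose data bytes are all zero."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(nchannels)
        w.setsampwidth(sampwidth)
        w.setframerate(rate)
        w.writeframes(b"\x00" * (nframes * nchannels * sampwidth))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf
```

If same_format returns False, one recording has to be converted before a sample-level comparison makes sense. That could mean resampling it or mixing it down to mono.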
