I have an audio sample recorded at a sampling rate of 8 kHz, about 14 seconds long.
I am using librosa to extract some features from this audio file.
y, sr = librosa.load(file_name)
stft = np.abs(librosa.stft(y, n_fft=n_fft))
# file_length = 14.650022675736961 #sec
# defaults
# n_fft =2048
# hop_length = 512 # win_length/4 = n_fft/4 = 512 (win_length = n_fft default)
#windowsTime = n_fft * Ts # (1/sr)
stft.shape
# (1025, 631)
Displaying the spectrogram:
librosa.display.specshow(stft, x_axis='time', y_axis='log')
[![stft sr = 22050][1]][1]
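Now, I can understand the shape of the STFT:
631 time bins = 4 * (file_length / Ts * windowsTime)  # overlapping
1025 frequency bins = frequencies spaced sr/n_fft apart,
so there are 1025 frequencies from 0 to sr/2 (Nyquist).
What I can't understand is why the plots differ between two sampling rates when I keep the same ratio:
1 - 22050 Hz, the librosa default
2 - 8 kHz, the file's own sampling rate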
y2, sr2 = librosa.load(file_name, sr=None)  # sr=None keeps the file's native rate
n_fft2 = 743 # (same ratio to get same visuals for comparison)
hop_length = 186 # (1/4 n_fft by default)
stft2 = np.abs(librosa.stft(y2, n_fft=n_fft2))
So the shape of the STFT will be different:
stft2.shape
# (372, 634)
[![stft sr = 743][2]][2]
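1. But why are the absolute frequencies not the same? It is the same signal, just not oversampled, so every sample is unique.
What am I missing? Is the y-axis static?
2. I can't make sense of the time-bin values. I expected the bins to count frames, with the first at the hop length and each following one a windowTime after the previous point, up to the end of the file. But the units look strange?
I want to be able to extract the magnitude of a specific frequency bin at a specific time (frame), or alternatively to sum several of them to get the magnitude over a time range.
So if I take stft[index of fBin], i.e. one row out of the 1025 fBins (stft[1025]), and look at its contents, stft[0] contains 630 points. These are exactly the 630 time points for that frequency, so each of the rows 1-1025 will have the same time points.
So if I pick one row that stands for all the other fBins (same time points), say stft[0],
I should be able to pick a time range and an fBin and get the specific magnitude: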
times = librosa.core.frames_to_time(stft2[0], sr=sr2, n_fft=n_fft2, hop_length=hop_length)
fft_bin = 6
time_idx = 10
print('freq (Hz)', freqs[fft_bin])
print('time (s)', times[time_idx])
print('amplitude', stft[fft_bin, time_idx])
array([0.047375, 0.047625, 0.04825, 0.04825, 0.046875, 0.04675,
       0.05, 0.051625, 0.051, 0.048, 0.05225, 0.050375,
       0.04925, 0.04725, 0.051625, 0.0465, 0.05225, 0.05,
       0.053, 0.053875, 0.048, 0.0485, 0.047875, 0.04775,
       0.0485, 0.049, 0.051375, 0.047125, 0.051125, 0.047125,
       0.04725, 0.05025, 0.05425, 0.05475, 0.051375, 0.060375,
       0.050625, 0.04875, 0.054125, 0.048, 0.05025, 0.052375,
       0.04975, 0.054125, 0.055625, 0.047125, 0.0475, 0.047,
       0.049875, 0.05025, 0.048375, 0.047, 0.050625, 0.05,
       0.046625, 0.04925, 0.048, 0.049125, 0.05375, 0.0545,
       0.04925, 0.049125, 0.049125, 0.049625, 0.047, 0.047625,
       0.0535, 0.051875, 0.05075, 0.04975, 0.047375, 0.049,
       0.0485, 0.050125, 0.048, 0.05475, 0.05175, 0.050125,
       0.04725, 0.0575, 0.056875, 0.047, 0.0485, 0.055375,
       0.04975, 0.047, 0.0495, 0.051375, 0.04675, 0.04925,
       0.052125, 0.04825, 0.048125, 0.046875, 0.047, 0.048625,
       0.050875, 0.05125, 0.04825, 0.052125, 0.052375, 0.05125,
       0.049875, 0.048625, 0.04825, 0.0475, 0.048375, 0.050875,
       0.052875, 0.0475, 0.0485, 0.05225, 0.053625, 0.05075,
       0.0525, 0.047125, 0.0485, 0.048875, 0.049, 0.0515,
       0.055875, 0.0515, 0.05025, 0.05125, 0.054625, 0.05525,
       0.047, 0.0545, 0.052375, 0.049875, 0.051, 0.048625,
       0.0475, 0.048, 0.048875, 0.050625, 0.05375, 0.051875,
       0.048125, 0.052125, 0.048125, 0.051, 0.052625, 0.048375,
       0.047625, 0.05, 0.048125, 0.050375, 0.049125, 0.053125,
       0.053875, 0.05075, 0.052375, 0.048875, 0.05325, 0.05825,
       0.055625, 0.0465, 0.05475, 0.051125, 0.048375, 0.0505,
       0.04675, 0.0495, 0.04725, 0.046625, 0.049625, 0.054,
       0.056125, 0.05175, 0.050625, 0.050375, 0.047875, 0.047,
       0.048125, 0.048875, 0.050625, 0.049875, 0.047, 0.0505,
       0.047, 0.053125, 0.047625, 0.05025, 0.04825, 0.05275,
       0.051625, 0.05, 0.051625, 0.05425, 0.052, 0.04775,
       0.047, 0.049125, 0.05375, 0.0535, 0.04925, 0.05125,
       0.046375, 0.04775, 0.04775, 0.0465, 0.047, 0.04675,
       0.04675, 0.04925, 0.05125, 0.046375, 0.04825, 0.0525,
       0.057875, 0.056375, 0.054375, 0.04825, 0.0535, 0.05475,
       0.0485, 0.048875, 0.048625, 0.0485, 0.047625, 0.046875,
       0.0465, 0.05125, 0.054, 0.05, 0.048, 0.047875,
       0.0515, 0.048125, 0.055875, 0.054875, 0.051625, 0.048125,
       0.047625, 0.048375, 0.052875, 0.0485, 0.0475, 0.0495,
       0.05025, 0.05675, 0.0585, 0.051625, 0.05625, 0.0605,
       0.052125, 0.0495, 0.049, 0.047875, 0.051375, 0.054125,
       0.0525, 0.0515, 0.057875, 0.055, 0.05375, 0.046375,
       0.04775, 0.0485, 0.050125, 0.050875, 0.04925, 0.049125,
       0.0465, 0.04975, 0.053375, 0.05225, 0.0475, 0.046375,
       0.05375, 0.049875, 0.049875, 0.047375, 0.049125, 0.049375,
       0.04875, 0.048125, 0.05075, 0.0505, 0.046375, 0.047375,
       0.048625, 0.0485, 0.047125, 0.052625, 0.051125, 0.04725,
       0.050875, 0.053875, 0.0475, 0.0495, 0.051, 0.055,
       0.053, 0.050125, 0.04675, 0.05375, 0.054375, 0.04725,
       0.046875, 0.04925, 0.04725, 0.0495, 0.05075, 0.050875,
       0.04775, 0.05125, 0.050125, 0.047875, 0.04825, 0.046625,
       0.0475, 0.046375, 0.04775, 0.05075, 0.048125, 0.046375,
       0.049625, 0.0495, 0.04675, 0.046625, 0.0475, 0.04825,
       0.053, 0.050875, 0.049, 0.057875, 0.058875, 0.049875,
       0.049125, 0.0475, 0.05225, 0.055, 0.055375, 0.053875,
       0.051125, 0.049875, 0.05025, 0.050875, 0.049, 0.0575,
       0.051875, 0.049375, 0.04775, 0.051125, 0.050375, 0.0465,
       0.047375, 0.0465, 0.046375, 0.048875, 0.051875, 0.047,
       0.047125, 0.047125, 0.046875, 0.049625, 0.048625, 0.051,
       0.049, 0.046375, 0.049, 0.056125, 0.054625, 0.047625,
       0.046625, 0.0475, 0.051875, 0.05175, 0.047625, 0.050375,
       0.055125, 0.05275, 0.047125, 0.05325, 0.060125, 0.056625,
       0.053, 0.052125, 0.047125, 0.04825, 0.050375, 0.05025,
       0.048, 0.046625, 0.047125, 0.04875, 0.047, 0.05525,
       0.0535, 0.047, 0.0495, 0.0535, 0.05125, 0.046625,
       0.0495, 0.04675, 0.04875, 0.047125, 0.04975, 0.047,
       0.049875, 0.046875, 0.047125, 0.048, 0.046375, 0.0495,
       0.04975, 0.05125, 0.048375, 0.049125, 0.0515, 0.048375,
       0.052375, 0.051125, 0.046375, 0.047125, 0.050375, 0.0465,
       0.052375, 0.05375, 0.04925, 0.05025, 0.0565, 0.054875,
       0.048, 0.049375, 0.052625, 0.055375, 0.053375, 0.05075,
       0.048875, 0.05475, 0.05075, 0.0485, 0.049125, 0.0475,
       0.047375, 0.047375, 0.047, 0.052125, 0.053875, 0.049,
       0.052625, 0.0485, 0.04675, 0.04875, 0.05, 0.0545,
       0.05025, 0.0495, 0.0515, 0.0485, 0.05025, 0.0465,
       0.0465, 0.048375, 0.06375, 0.10175, 0.11975, 0.118375,
       0.121375, 0.12675, 0.123, 0.095375, 0.055, 0.05525,
       0.04775, 0.053125, 0.052375, 0.056625, 0.0565, 0.046875,
       0.048, 0.05175, 0.048, 0.052, 0.048, 0.048,
       0.05175, 0.05025, 0.049625, 0.049625, 0.047375, 0.046625,
       0.052375, 0.0555, 0.051375, 0.050625, 0.052375, 0.050125,
       0.048, 0.052125, 0.052125, 0.0495, 0.048875, 0.048,
       0.049875, 0.051125, 0.050625, 0.048, 0.0465, 0.048,
       0.04675, 0.050875, 0.048, 0.046625, 0.0495, 0.050375,
       0.046625, 0.0515, 0.049875, 0.049625, 0.04675, 0.049125,
       0.05025, 0.050375, 0.04725, 0.047625, 0.047, 0.051625,
       0.0485, 0.05225, 0.046875, 0.0475, 0.04825, 0.050375,
       0.05725, 0.052375, 0.048, 0.046375, 0.0475, 0.0495,
       0.047875, 0.046375, 0.049875, 0.046875, 0.048, 0.046875,
       0.048625, 0.047125, 0.046625, 0.05, 0.048875, 0.04675,
       0.050125, 0.05425, 0.051375, 0.050125, 0.053375, 0.052,
       0.053875, 0.048, 0.05575, 0.049875, 0.052125, 0.048875,
       0.047375, 0.048875, 0.049125, 0.047375, 0.047375, 0.047625,
       0.0495, 0.04825, 0.047875, 0.04875, 0.054, 0.052125,
       0.051, 0.046625, 0.04925, 0.05075, 0.054375, 0.0555,
       0.051625, 0.046625, 0.052125, 0.055875, 0.047, 0.053875,
       0.050875, 0.0505, 0.0465, 0.053125, 0.050875, 0.050625,
       0.051125, 0.050875, 0.056875, 0.04925, 0.050625, 0.054125,
       0.056625, 0.05025, 0.0465, 0.04675, 0.049625, 0.047,
       0.048375, 0.047125, 0.04875, 0.048375, 0.048875, 0.04775,
       0.04775, 0.047, 0.052125, 0.050875, 0.054, 0.058375,
       0.054, 0.049125, 0.04675, 0.051875, 0.05425, 0.050125,
       0.04675, 0.047625, 0.046375, 0.05275, 0.053, 0.04875,
       0.049125, 0.047125, 0.049375, 0.0475, 0.051125, 0.0495,
       0.052375, 0.047, 0.047125, 0.050875])
[1]: https://i.imgur.com/OeKzvrb.png
[2]: https://i.imgur.com/ALtba5F.png
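Best answer
Question 1:
You need to specify the sampling rate when using `specshow`:
librosa.display.specshow(stft, x_axis='time', y_axis='log', sr=sr)
Otherwise, the default (22,050 Hz) is used (see the docs), and the frequency axis is labeled as if the signal had been sampled at that rate.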
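To make the axis issue concrete, here is a minimal sketch in plain Python of the bin-to-frequency mapping, row `k` of the STFT corresponding to `k * sr / n_fft` Hz. The helper name `fft_bin_frequencies` is made up for illustration; as far as I know, `librosa.fft_frequencies(sr=..., n_fft=...)` returns the same values.

```python
def fft_bin_frequencies(sr, n_fft):
    """Center frequency in Hz of each STFT row: k * sr / n_fft, k = 0 .. n_fft // 2."""
    return [k * sr / n_fft for k in range(n_fft // 2 + 1)]

# Case 1: file resampled to librosa's default rate, default n_fft
freqs_22k = fft_bin_frequencies(sr=22050, n_fft=2048)
# Case 2: file loaded at its native 8 kHz rate, n_fft scaled by the same ratio
freqs_8k = fft_bin_frequencies(sr=8000, n_fft=743)

for name, freqs in (("22050 Hz", freqs_22k), ("8000 Hz", freqs_8k)):
    print(name, "rows:", len(freqs),
          "spacing: %.3f Hz" % freqs[1],
          "top bin: %.1f Hz" % freqs[-1])
```

Because the ratio `sr / n_fft` was kept nearly constant, a given row index means almost the same frequency in both spectrograms, but the second one has only 372 rows and stops just below 4 kHz. If `specshow` is not told `sr=8000`, it presumably spreads those 372 rows over 0 to 11025 Hz, which would explain the shifted frequencies in the second plot.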
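Question 2:
`librosa.core.frames_to_time` does not take `stft[0]` as its argument. `stft[0]` is the first row of the array, i.e. the magnitudes of the lowest frequency bin across all frames, not a list of frame numbers. The function expects frame indices (or a frame count) as its first argument. Suppose you have an audio signal with `sr=10000` Hz and you run an STFT on it with `n_fft=2000` and `hop_length=1000`. You then get one frame per hop, and each hop is 0.1 s long, because 10000 samples correspond to 1 s and 1000 samples (one hop) therefore correspond to 0.1 s. The array returned by `stft` has the shape `(1 + n_fft/2, t)` (see the `librosa.stft` docs). This means that the first dimension is the frequency bin and the second dimension is the frame number (`t`). The total number of frames in `stft` is therefore `stft.shape[1]`. To get the (approximate) length of the source audio, you can do the following: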
time = librosa.core.frames_to_time(stft.shape[1], sr=sr, hop_length=hop_length, n_fft=n_fft)
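The frame arithmetic above, and the lookup the question asks for (magnitude of one frequency bin at one time, or summed over a time range), can be sketched in plain Python as follows. All helper names and the `toy_spec` data are hypothetical. The frame-count formula assumes a centered STFT that pads `n_fft // 2` samples on each side, which I believe is the `librosa.stft` default, and the sample count 323033 is simply the 14.650022675736961 s from the question multiplied by 22050.

```python
def stft_frame_count(n_samples, n_fft, hop_length):
    """Frames of a centered STFT (assumed padding: n_fft // 2 samples per side)."""
    padded = n_samples + 2 * (n_fft // 2)
    return 1 + (padded - n_fft) // hop_length

def frame_to_time(frame, sr, hop_length):
    """Time in seconds at which a frame starts: one hop per frame."""
    return frame * hop_length / sr

def time_to_frame(t, sr, hop_length):
    """Index of the frame that contains time t (rounded down)."""
    return int(t * sr // hop_length)

def magnitude_over_time(spec, fft_bin, t_start, t_end, sr, hop_length):
    """Sum one frequency row over the frames from t_start to t_end inclusive."""
    first = time_to_frame(t_start, sr, hop_length)
    last = time_to_frame(t_end, sr, hop_length)
    return sum(spec[fft_bin][first:last + 1])

# The answer's example: sr = 10000 Hz, hop_length = 1000 samples
sr, hop = 10000, 1000
# Toy stand-in for np.abs(librosa.stft(...)): 2 frequency rows x 6 frames
toy_spec = [[0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
            [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]]

print("seconds per frame:", frame_to_time(1, sr, hop))
print("bin 1 at t=0.25 s:", toy_spec[1][time_to_frame(0.25, sr, hop)])
print("bin 1 summed over 0.25-0.5 s:", magnitude_over_time(toy_spec, 1, 0.25, 0.5, sr, hop))
print("frames in the first STFT:", stft_frame_count(323033, 2048, 512))
```

With a real NumPy spectrogram the same lookup would be `stft[fft_bin, first:last + 1].sum()`. One thing to watch in the question's second STFT: `hop_length = 186` is assigned but never passed to `librosa.stft`, so the hop actually used is presumably the default `n_fft // 4 = 185`, and that is the value that should go into any frame-to-time conversion.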