Problem description
I'm working on a music classification method with scikit-learn, and the first step in that process is converting a music file to a NumPy array.
After unsuccessfully trying to call ffmpeg from a Python script, I decided to simply pipe the file in directly:
import glob
import os
import subprocess as sp
import sys

import numpy

FFMPEG_BIN = "ffmpeg"
cwd = (os.getcwd())
dcwd = (cwd + "/temp")
if not os.path.exists(dcwd): os.makedirs(dcwd)
folder_path = sys.argv[1]
f = open("test.txt","a")
for f in glob.glob(os.path.join(folder_path, "*.mp3")):
    ff = f.replace("./", "/")
    print("Name: " + ff)
    aa = (cwd + ff)
    command = [ FFMPEG_BIN,
            '-i', aa,
            '-f', 's16le',
            '-acodec', 'pcm_s16le',
            '-ar', '22000', # output sample rate: 22000 Hz
            '-ac', '1', # 1 channel (mono)
            '-']
    pipe = sp.Popen(command, stdout=sp.PIPE, bufsize=10**8)
    # 88200*4 bytes = 176400 int16 samples, about 8 s at 22000 Hz mono
    raw_audio = pipe.proc.stdout.read(88200*4)
    audio_array = numpy.fromstring(raw_audio, dtype="int16")
    print (str(audio_array))
    f.write(audio_array + "\n")
The problem is that when I run the script, it starts ffmpeg and then does nothing:
[mp3 @ 0x1446540] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from '/home/don/Code/Projects/MC/Music/Spaz.mp3':
Metadata:
title : Spaz
album : Seeing souns
artist : N*E*R*D
genre : Hip-Hop
encoder : Audiograbber 1.83.01, LAME dll 3.96, 320 Kbit/s, Joint Stereo, Normal quality
track : 5/12
date : 2008
Duration: 00:03:50.58, start: 0.000000, bitrate: 320 kb/s
Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 320 kb/s
Output #0, s16le, to 'pipe:':
Metadata:
title : Spaz
album : Seeing souns
artist : N*E*R*D
genre : Hip-Hop
date : 2008
track : 5/12
encoder : Lavf56.4.101
Stream #0:0: Audio: pcm_s16le, 22000 Hz, mono, s16, 352 kb/s
Metadata:
encoder : Lavc56.1.100 pcm_s16le
Stream mapping:
Stream #0:0 -> #0:0 (mp3 (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
It just sits there, hanging, for far longer than the song lasts. What am I doing wrong here?
Recommended answer
I recommend pymedia, audioread, or decoder.py. There are also pyffmpeg and similar modules that do exactly what you want. Take a look at pypi.python.org.
Of course, these will not turn the data into a NumPy array for you.
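With audioread in particular, the remaining step is small. Its documentation describes the decoded blocks as raw 16-bit little-endian signed PCM bytes. The following is a minimal sketch under that assumption. `pcm16_buffers_to_array` and `load_with_audioread` are hypothetical helper names. Only `audioread.audio_open`, iteration over the opened file, and its `channels` and `samplerate` attributes come from audioread itself.

```python
import numpy as np


def pcm16_buffers_to_array(buffers, channels):
    """Join raw 16-bit little-endian PCM buffers into a (frames, channels) int16 array."""
    raw = b"".join(buffers)
    # "<i2" = little-endian signed 16-bit, the format audioread documents for its blocks
    samples = np.frombuffer(raw, dtype="<i2")
    # Samples are interleaved (L, R, L, R, ...), so reshape to one row per frame
    return samples.reshape(-1, channels)


def load_with_audioread(path):
    """Decode an audio file with audioread and return (samples, samplerate)."""
    import audioread  # third-party package: pip install audioread

    with audioread.audio_open(path) as f:
        samples = pcm16_buffers_to_array(list(f), f.channels)
        return samples, f.samplerate
```

The byte-to-array conversion is kept separate from the decoding, so it can be reused with any source that produces 16-bit PCM, including an ffmpeg pipe.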
Anyway, this is how it can be done crudely by piping to ffmpeg:
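from subprocess import Popen, PIPE
import numpy as np

def decode(fname):
    # On Windows, use the full path to ffmpeg.exe unless it is on PATH
    cmd = ["ffmpeg", "-i", fname, "-f", "wav", "-"]
    # On Windows, add creationflags=0x08000000 (CREATE_NO_WINDOW) to keep another console window from popping up
    p = Popen(cmd, stdin=PIPE, stdout=PIPE, stderr=PIPE)
    # communicate() closes stdin and reads stdout and stderr to the end,
    # so ffmpeg never blocks on a full pipe
    data = p.communicate()[0]
    # Skip the "data" chunk ID (4 bytes) and its 4-byte size field; the rest is 16-bit PCM
    return np.frombuffer(data[data.find(b"data") + 8:], np.int16)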
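`decode` above finds the samples by searching for the first occurrence of the bytes `data`. That could also match earlier in the stream, for example inside a metadata chunk if a tag contains that word. A somewhat sturdier sketch walks the RIFF chunks in order instead. It assumes uncompressed 16-bit PCM, which is what the WAV output should contain unless you change the codec. Because ffmpeg cannot seek back in a pipe to fill in the final sizes, it also ignores the `data` chunk's size field. `parse_wav_bytes` is a hypothetical name.

```python
import struct

import numpy as np


def parse_wav_bytes(data):
    """Walk the RIFF chunks of an in-memory WAV stream; return (samples, channels, rate)."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    pos = 12  # first chunk starts after "RIFF", the overall size, and "WAVE"
    channels = rate = bits = None
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack("<I", data[pos + 4:pos + 8])
        body = pos + 8
        if chunk_id == b"fmt ":
            # format tag, channels, sample rate, byte rate, block align, bits per sample
            _tag, channels, rate, _byte_rate, _align, bits = struct.unpack(
                "<HHIIHH", data[body:body + 16])
        elif chunk_id == b"data":
            if bits != 16:
                raise ValueError("expected 16-bit PCM, got %r bits" % (bits,))
            # Do not trust the size field when reading from a pipe: take the rest of the stream
            payload = data[body:]
            payload = payload[:len(payload) - len(payload) % 2]  # whole samples only
            return np.frombuffer(payload, dtype="<i2"), channels, rate
        # Chunk bodies are padded to an even number of bytes
        pos = body + size + (size & 1)
    raise ValueError("no data chunk found")
```

With this helper, the last line of `decode` could become `return parse_wav_bytes(data)`, which also gives you the channel count and sample rate.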
That should be enough for basic use.
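The array returned by `decode` is flat. The command passes no `-ac` option, so ffmpeg should keep the source's channel count (the MP3 in the question is stereo). The samples then arrive interleaved as L, R, L, R, and so on. This is a minimal sketch of splitting them into frames and mixing down to mono floats, for example as input to feature extraction. It assumes you know the channel count, either from the WAV header or by passing `-ac` yourself. `to_frames` and `to_mono_float` are hypothetical helper names.

```python
import numpy as np


def to_frames(samples, channels):
    """Reshape interleaved samples (L, R, L, R, ...) into a (frames, channels) array."""
    usable = len(samples) - len(samples) % channels  # drop a trailing partial frame, if any
    return samples[:usable].reshape(-1, channels)


def to_mono_float(frames):
    """Average the channels and scale int16 values into the range [-1.0, 1.0)."""
    return frames.astype(np.float32).mean(axis=1) / 32768.0
```

Converting to float before averaging avoids int16 overflow when two loud channels are summed.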
It should work because ffmpeg's WAV output is 16-bit audio by default. But if you change the sample format, be aware that NumPy has no int24 type. You would be forced to do some bit manipulation and represent 24-bit audio as 32-bit integers. Just don't use 24-bit, and the world is happy. :D
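If you do end up with packed 24-bit samples (for instance from `-acodec pcm_s24le`), the bit manipulation can be sketched as follows. Assemble each 3-byte little-endian sample into an int32, then sign-extend from bit 23. `int24le_to_int32` is a hypothetical name, and this assumes the input length is a multiple of 3 bytes.

```python
import numpy as np


def int24le_to_int32(raw):
    """Unpack packed little-endian signed 24-bit PCM into an int32 array."""
    b = np.frombuffer(raw, dtype=np.uint8).reshape(-1, 3).astype(np.int32)
    # Combine low, middle and high bytes into one 24-bit unsigned value
    value = b[:, 0] | (b[:, 1] << 8) | (b[:, 2] << 16)
    # Two's complement: values with bit 23 set represent negative numbers
    return np.where(value & 0x800000, value - 0x1000000, value).astype(np.int32)
```

An alternative that avoids this entirely is to ask ffmpeg for 16-bit or 32-bit output, which NumPy can read directly with `np.frombuffer`.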
If you need something more sophisticated, we can discuss refining the code in the comments.