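This article uses the construction of a primary-school classroom audio dataset as a running example.
1. Search by keyword to collect audio/video links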
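from playwright.sync_api import sync_playwright

# BLVideoSearch is the author's own search helper; its definition is not shown here.
if __name__ == "__main__":
    with sync_playwright() as playwright:
        searcher = BLVideoSearch(playwright, headless=True)
        url = searcher.make_url(keyword=["小学公开课"])  # "primary-school open class"
        searcher.run(url, outfile="videos_url.txt")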
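Before the download step, the collected links have to be read back in. The following is a minimal sketch, assuming `videos_url.txt` holds one URL per line (the actual output format of `searcher.run` is not shown in this article). The helper name `load_urls` is hypothetical.

```python
from pathlib import Path


def load_urls(path):
    """Read one URL per line, skipping blank lines and duplicates.

    Assumption: the file written by the search step contains one URL per line.
    The original order of first occurrences is preserved.
    """
    seen = set()
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        url = line.strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Deduplicating here avoids downloading the same video twice when several search result pages return the same link.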
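2. Batch download and on-the-fly video-to-audio conversion
2.1 Multithreaded batch download (you-get)
you-get subprocess (-o sets the output directory, -O sets the output file name):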
command = [YOUGET, "-o", self.video_dir, "-O", utt, task]
subprocess.run(command, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
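The multithreaded part is not shown above, so here is one way it could look: a minimal sketch using `concurrent.futures.ThreadPoolExecutor`, not necessarily the author's implementation. The names `build_command`, `download_one`, `download_all`, the `utt00000`-style IDs, and the `runner` parameter (which exists only so the command can be swapped out for testing) are all assumptions; `YOUGET` is assumed to be the `you-get` executable on the PATH.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed

YOUGET = "you-get"  # assumption: the you-get executable is on PATH


def build_command(video_dir, utt, url):
    # Same argument layout as the snippet above: -o output dir, -O output name.
    return [YOUGET, "-o", video_dir, "-O", utt, url]


def download_one(video_dir, utt, url, runner=subprocess.run):
    """Download a single URL; return (utt, True) on success, (utt, False) on failure."""
    command = build_command(video_dir, utt, url)
    try:
        runner(command, check=True,
               stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return utt, True
    except (subprocess.CalledProcessError, OSError):
        # A failed download should not stop the other workers.
        return utt, False


def download_all(urls, video_dir, workers=4, runner=subprocess.run):
    """Download all URLs with a thread pool; return {utt: success_flag}."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(download_one, video_dir, f"utt{i:05d}", url, runner)
            for i, url in enumerate(urls)
        ]
        for future in as_completed(futures):
            utt, ok = future.result()
            results[utt] = ok
    return results
```

Threads are sufficient here because each worker spends its time waiting on an external process, so the Python interpreter lock is not a bottleneck.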
2.2 On-the-fly video-to-audio conversion
ffmpeg subprocess (-ac 1 downmixes to mono, -ar 16000 resamples to 16 kHz):
command = [FFMPEG, "-i", video_file, '-ac', '1', '-ar', '16000', audio_file]
subprocess.run(command, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
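"On-the-fly" here is taken to mean that each video is converted as soon as its download finishes. A minimal sketch of such a step follows, under several assumptions: the downloader appends a container extension to the requested file name, so the video is located by globbing; the output is a `.wav` file named after the same ID; and deleting the video afterwards is optional. The names `find_video`, `convert`, `VIDEO_SUFFIXES`, and the `runner` parameter are hypothetical.

```python
import subprocess
from pathlib import Path

FFMPEG = "ffmpeg"  # assumption: the ffmpeg executable is on PATH
VIDEO_SUFFIXES = {".mp4", ".flv", ".mkv", ".webm"}  # assumed container types


def build_ffmpeg_command(video_file, audio_file):
    # Same arguments as the snippet above: mono (-ac 1), 16 kHz (-ar 16000).
    return [FFMPEG, "-i", str(video_file), "-ac", "1", "-ar", "16000", str(audio_file)]


def find_video(video_dir, utt):
    """Return the first downloaded file named '<utt>.<video suffix>', or None."""
    matches = sorted(
        p for p in Path(video_dir).glob(f"{utt}.*")
        if p.suffix.lower() in VIDEO_SUFFIXES
    )
    return matches[0] if matches else None


def convert(video_dir, audio_dir, utt, runner=subprocess.run, delete_video=False):
    """Convert one downloaded video to 16 kHz mono audio; return the audio path."""
    video_file = find_video(video_dir, utt)
    if video_file is None:
        return None  # nothing was downloaded for this ID
    audio_file = Path(audio_dir) / f"{utt}.wav"
    audio_file.parent.mkdir(parents=True, exist_ok=True)
    runner(build_ffmpeg_command(video_file, audio_file), check=True,
           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    if delete_video:
        video_file.unlink()  # optionally free disk space once audio exists
    return audio_file
```

Calling `convert` from the same worker that ran the download keeps disk usage bounded when `delete_video=True`, since at most one video per worker is on disk at a time.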