Selecting a random file from a directory (with a large number of files) in Python

Question

I have a directory with a large number of files (~1 million). I need to choose a random file from this directory. Because there are so many files, os.listdir takes a very long time to finish.

Is there a way to circumvent this problem? Perhaps I could somehow find out the number of files in the directory (without listing it) and then choose the n-th file, where n is randomly generated?

The files in the directory are randomly named.

Recommended answer

Alas, I don't think there is a solution to your problem. First, I don't know of a portable API that returns the number of entries in a directory without enumerating them first. Second, I don't think there is an API that returns a directory entry by number rather than by name.

So overall, a program has to enumerate O(n) directory entries to get a single random one. The trivial approach of determining the number of entries and then picking one either requires enough RAM to hold the full listing (os.listdir()) or has to enumerate the directory a second time to find the random(n)-th item, for a total of n + n/2 operations on average.
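The two-pass variant described above can be sketched roughly as follows. This is only an illustration, not part of the original answer: it assumes Python 3.6+ and uses os.scandir(), which yields entries one at a time, as a substitute for building the full os.listdir() list. The function names count_entries and pick_by_index are hypothetical.

```python
import os
import random


def count_entries(path):
    """First pass: count directory entries without keeping them in memory."""
    with os.scandir(path) as entries:
        return sum(1 for _ in entries)


def pick_by_index(path, rng=random):
    """Second pass: walk the directory again up to a randomly chosen index.

    Hypothetical helper; returns an entry name, or None if the directory
    is empty or shrank between the two passes.
    """
    total = count_entries(path)
    if total == 0:
        return None
    target = rng.randrange(total)  # uniform index in [0, total)
    with os.scandir(path) as entries:
        for index, entry in enumerate(entries):
            if index == target:
                return entry.name
    return None
```

The first pass always visits all n entries and the second visits about n/2 on average, which matches the n + n/2 estimate; memory use stays constant, but the running time is still linear in the number of entries.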

There is a slightly better approach, but only slightly; see randomly-selecting-lines-from-files. In short, there is a way to pick a random item from a list or iterator of unknown length while reading one item at a time, and to ensure that every item is picked with equal probability. But this won't help with os.listdir(), because it returns a list that already holds all 1M+ entries in memory, so you might as well just ask it for len().
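The one-item-at-a-time selection mentioned above is commonly known as reservoir sampling with a reservoir of size one. A minimal sketch follows, with the caveat that pairing it with os.scandir() (a lazy directory iterator, assumed available in Python 3.6+) is an assumption of this example and not something the answer itself proposes. The names pick_random_item and pick_random_file are hypothetical.

```python
import os
import random


def pick_random_item(iterable, rng=random):
    """Return one uniformly random item from an iterable of unknown length.

    The k-th item replaces the current choice with probability 1/k, so after
    n items each one has been kept with probability 1/n.
    Returns None for an empty iterable.
    """
    chosen = None
    for count, item in enumerate(iterable, start=1):
        if rng.randrange(count) == 0:  # true with probability 1/count
            chosen = item
    return chosen


def pick_random_file(path, rng=random):
    """Hypothetical helper: pick a random entry name in a single pass."""
    with os.scandir(path) as entries:
        return pick_random_item((entry.name for entry in entries), rng)
```

This still reads every entry, so it remains O(n) in time; the gain is a single pass and constant memory.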

08-23 14:20