Question
I'm messing around with file lookups in Python on a large hard disk. I've been looking at os.walk and glob. I usually use os.walk, as I find it much neater and it seems to be quicker (for typical directory sizes).
Has anyone got experience with both and could say which is more efficient? As I say, glob seems to be slower, but you can use wildcards etc., whereas with walk you have to filter the results yourself. Here is an example of looking up core dumps.
import os
import re

core = re.compile(r"core\.\d*")
for root, dirs, files in os.walk("/path/to/dir/"):
    for file in files:
        if core.search(file):
            path = os.path.join(root, file)
            print "Deleting: " + path
            os.remove(path)
or
import os
from glob import iglob
for file in iglob("/path/to/dir/core.*"):
    print "Deleting: " + file
    os.remove(file)
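If you want glob-style wildcards but still need os.walk's recursion, the standard-library fnmatch module can bridge the two. A minimal sketch, reusing the "core.*" pattern and the example path from above:

import fnmatch
import os

# Recurse with os.walk, but match file names against a glob-style
# wildcard via fnmatch.filter instead of a hand-written regex.
for root, dirs, files in os.walk("/path/to/dir/"):
    for name in fnmatch.filter(files, "core.*"):
        path = os.path.join(root, name)
        print("Deleting: " + path)
        os.remove(path)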
Answer
I did some research on a small cache of web pages spread across 1000 directories. The task was to count the total number of files in those directories. The output is:
os.listdir: 0.7268s, 1326786 files found
os.walk: 3.6592s, 1326787 files found
glob.glob: 2.0133s, 1326786 files found
As you can see, os.listdir is the quickest of the three, and glob.glob is still quicker than os.walk for this task.
Source:
import os, time, glob

n, t = 0, time.time()
for i in range(1000):
    n += len(os.listdir("./%d" % i))
t = time.time() - t
print "os.listdir: %.4fs, %d files found" % (t, n)

n, t = 0, time.time()
for root, dirs, files in os.walk("./"):
    for file in files:
        n += 1
t = time.time() - t
print "os.walk: %.4fs, %d files found" % (t, n)

n, t = 0, time.time()
for i in range(1000):
    n += len(glob.glob("./%d/*" % i))
t = time.time() - t
print "glob.glob: %.4fs, %d files found" % (t, n)