Problem description
I am trying to compute the MD5 hash of a file with the function hashlib.md5() from the hashlib module.
So I wrote this code:
import hashlib

Buffer = 128
f = open("c:\\file.tct", "rb")
m = hashlib.md5()
while True:
    p = f.read(Buffer)
    if len(p) != 0:
        m.update(p)
    else:
        break
print(m.hexdigest())
f.close()
I noticed that update() gets faster as I increase the Buffer value to 64, 128, 256, and so on. Is there an upper limit I cannot exceed? I suppose it might only be a RAM issue, but I don't know.
Recommended answer
Big (≈2**40) chunk sizes lead to MemoryError, i.e., there is no limit other than available RAM. On the other hand, bufsize is limited to 2**31-1 on my machine:
import hashlib
from functools import partial

def md5(filename, chunksize=2**15, bufsize=-1):
    m = hashlib.md5()
    with open(filename, 'rb', bufsize) as f:
        for chunk in iter(partial(f.read, chunksize), b''):
            m.update(chunk)
    return m
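Assuming the helper above, it can be called like this (the path is just the one from the question; any readable file works):

# Compute and print the hex digest of the file from the question.
digest = md5("c:\\file.tct").hexdigest()
print(digest)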
A big chunksize can be as slow as a very small one. Measure it.
I find that for ≈10 MB files a chunksize of 2**15 is the fastest for the files I've tested.