Problem description
I need to read 64 KB chunks in a loop and process them, but stop 16 bytes before the end of the file: the last 16 bytes are tag metadata.
The file might be very large, so I can't read it all into RAM.
All the solutions I have found are a bit clumsy and/or unpythonic.
    with open('myfile', 'rb') as f:
        while True:
            block = f.read(65536)
            if not block:
                break
            process_block(block)
If 16 <= len(block) < 65536, it's easy: this is the last block, so useful_data = block[:-16] and tag = block[-16:].
If len(block) == 65536, it could mean three things. The full block may be useful data. Or this 64 KB block may in fact be the last block, so useful_data = block[:-16] and tag = block[-16:]. Or this 64 KB block may be followed by another block of only a few bytes (say 3 bytes), in which case useful_data = block[:-13] and tag = block[-13:] + last_block[:3].
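To make the third case concrete, here is a small sketch with scaled-down sizes (a block size of 8 and a tag size of 4 stand in for 65536 and 16; these values and the variable names are assumptions chosen for illustration). It shows the tag straddling two reads:

```python
import io

BLOCK_SIZE = 8  # stand-in for 65536, to keep the example small
TAG_SIZE = 4    # stand-in for 16

payload, tag = b"A" * 7, b"WXYZ"
f = io.BytesIO(payload + tag)      # 11 bytes: one full block plus 3 more

block = f.read(BLOCK_SIZE)         # full block; its last byte belongs to the tag
last_block = f.read(BLOCK_SIZE)    # only 3 bytes, shorter than the tag itself

# Number of tag bytes that ended up at the end of the previous full block.
missing = TAG_SIZE - len(last_block)
useful_data = block[:-missing]
recovered_tag = block[-missing:] + last_block
```

With the real sizes and a 3-byte final read, missing is 13, which gives the block[:-13] and block[-13:] slices described above.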
How can this be handled more cleanly than by distinguishing all these cases?
Note:
- the solution should work for a file opened with open(...), but also for an io.BytesIO() object, or for a remote file opened over SFTP (with pysftp).
I was thinking about using:

    f.seek(0, 2)
    length = f.tell()
    f.seek(0)

Then, after each block = f.read(65536), we can tell how far we are from the end with length - f.tell(), but again the full solution does not look very elegant.
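The length-based idea can be sketched as a small helper. This assumes a seekable binary file object; stream_length is a hypothetical name, not a library function. It uses only seek() and tell(), so it should also apply to io.BytesIO:

```python
import io
import os

def stream_length(f):
    """Return the total length of a seekable file object, restoring its position."""
    pos = f.tell()
    f.seek(0, os.SEEK_END)   # same as f.seek(0, 2)
    length = f.tell()
    f.seek(pos)              # put the cursor back where it was
    return length

# Usage with an in-memory file: 100 bytes, cursor already advanced by 10.
f = io.BytesIO(b"x" * 100)
f.read(10)
length = stream_length(f)
remaining_before_tag = length - f.tell() - 16
```

Here remaining_before_tag is the number of payload bytes still to be read before the 16-byte tag begins.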
Recommended answer
You can simply read min(65536, L - f.tell() - 16) bytes in every iteration, where L is the file size.
Something like this:
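    from pathlib import Path

    L = Path('myfile').stat().st_size  # total file size in bytes
    with open('myfile', 'rb') as f:
        # Assumes the file is at least 16 bytes long.
        while True:
            # Never read into the final 16 bytes (the tag).
            to_read_length = min(65536, L - f.tell() - 16)
            block = f.read(to_read_length)
            process_block(block)
            if f.tell() == L - 16:
                break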
I haven't run this, but I hope you get the gist of it.
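As a variation on the same idea, here is a minimal sketch that gets the length from seek() and tell() instead of Path.stat(), so that it also works on io.BytesIO. The generator name read_payload_blocks and its parameters are hypothetical. It assumes a seekable binary file object; whether a given SFTP file object behaves the same way is an assumption to verify against that library.

```python
import io
import os

def read_payload_blocks(f, block_size=65536, tag_size=16):
    """Yield blocks from the start of f, stopping tag_size bytes before its end."""
    f.seek(0, os.SEEK_END)
    payload_end = f.tell() - tag_size
    if payload_end < 0:
        raise ValueError("file is shorter than the tag")
    f.seek(0)
    while f.tell() < payload_end:
        # Cap each read so it never crosses into the tag.
        block = f.read(min(block_size, payload_end - f.tell()))
        if not block:
            raise EOFError("file ended before the expected payload length")
        yield block

# Usage: 150000 payload bytes followed by a 16-byte tag.
f = io.BytesIO(b"A" * 150000 + b"0123456789abcdef")
blocks = list(read_payload_blocks(f))
tag = f.read(16)  # the cursor now sits exactly at the start of the tag
```

Because the loop condition compares f.tell() with the payload end, a read that returns fewer bytes than requested is simply followed by another read, and a file of exactly 16 bytes yields no blocks at all.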