Problem description
I have a .tar file containing several hundred pictures (.png). I need to process them with OpenCV.
I am wondering whether, for efficiency reasons, it is possible to process them without going through the disk. In other words, I want to read the pictures from the in-memory stream of the tar file.
For example, consider:
import tarfile
import cv2
tar0 = tarfile.open('mytar.tar')
im = cv2.imread( tar0.extractfile('fname.png').read() )
The last line doesn't work because imread expects a file name rather than a stream or its contents.
Note that reading directly from the tar stream is possible, e.g. for text (see e.g. this SO question).
Any suggestions on how to open the stream with the correct PNG decoding?
Untarring to a ramdisk is of course an option, although I was looking for something more cachable.
Recommended answer
Thanks to the suggestion of @abarry and this SO answer, I managed to find a solution.
Consider the following:
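import tarfile
import cv2
import numpy as np

def get_np_array_from_tar_object(tar_extractfl):
    '''Convert the file object returned by extractfile into a 1-D uint8 np.array.'''
    return np.asarray(bytearray(tar_extractfl.read()), dtype=np.uint8)

tar0 = tarfile.open('mytar.tar')
# cv2.imdecode decodes an in-memory buffer; flag 0 (cv2.IMREAD_GRAYSCALE) loads it as grayscale.
im0 = cv2.imdecode(
    get_np_array_from_tar_object(tar0.extractfile('fname.png')), 0)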
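Since the question is about several hundred pictures, the same idea can be extended to walk the whole archive. The following is only a minimal sketch: the names iter_tar_images and decode_with_opencv are hypothetical, and it assumes that every member whose name ends in .png should be decoded and that cv2.IMREAD_UNCHANGED is the desired flag. The decoder is passed in as a parameter so that another decoding function can be swapped in.

```python
import tarfile


def decode_with_opencv(data):
    """Decode encoded image bytes (e.g. a PNG) with cv2.imdecode."""
    # Imported lazily so the tar-walking code below can be used without OpenCV.
    import cv2
    import numpy as np

    buf = np.frombuffer(data, dtype=np.uint8)
    # IMREAD_UNCHANGED is an assumption; use 0 for grayscale as in the answer.
    # cv2.imdecode returns None when the bytes cannot be decoded.
    return cv2.imdecode(buf, cv2.IMREAD_UNCHANGED)


def iter_tar_images(tar, suffix=".png", decode=decode_with_opencv):
    """Yield (member name, decoded image) for each matching file in an open tar."""
    for member in tar:
        if member.isfile() and member.name.lower().endswith(suffix):
            # The member's bytes are read into memory, never extracted to disk.
            data = tar.extractfile(member).read()
            yield member.name, decode(data)
```

It could be used along these lines, with process standing in for whatever per-image work is needed:

with tarfile.open('mytar.tar') as tar0:
    for name, im in iter_tar_images(tar0):
        process(name, im)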