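I need to download a zip archive of text files, dispatch each text file in the archive to other handlers for processing, and finally write the unzipped text files to disk.

I have the following code. It opens and closes the same file multiple times, which does not seem elegant. How can I make it more elegant and efficient?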

import cStringIO
import urllib
import zipfile

# Download the whole archive into memory (Python 2 APIs).
zipped = urllib.urlopen('http://www.abc.com/xyz.zip')
buf = cStringIO.StringIO(zipped.read())
zipped.close()
unzipped = zipfile.ZipFile(buf, 'r')
for f_info in unzipped.infolist():
    logfile = unzipped.open(f_info)
    handler1(logfile)
    logfile.close()   # Cannot seek(0): the file-like object does not support seek()
    logfile = unzipped.open(f_info)
    handler2(logfile)
    logfile.close()
    unzipped.extract(f_info)

Best answer

Your answer is in your own sample code. Just use StringIO to buffer the log file:

import cStringIO
import urllib
import zipfile

zipped = urllib.urlopen('http://www.abc.com/xyz.zip')
buf = cStringIO.StringIO(zipped.read())
zipped.close()
unzipped = zipfile.ZipFile(buf, 'r')
for f_info in unzipped.infolist():
    logfile = unzipped.open(f_info)
    # Here's where we buffer: read the member once into memory.
    logbuffer = cStringIO.StringIO(logfile.read())
    logfile.close()

    for handler in [handler1, handler2]:
        handler(logbuffer)
        # StringIO objects support seek(), so rewind for the next handler:
        logbuffer.seek(0)

    unzipped.extract(f_info)
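
The code above targets Python 2. On Python 3, where cStringIO and urllib.urlopen no longer exist, the same buffering idea can be sketched roughly as follows. This is only a minimal sketch, not part of the original answer: process_archive, fetch, and the extract_dir parameter are hypothetical names introduced for illustration, and it assumes the handlers accept a binary file-like object.

```python
import io
import urllib.request
import zipfile


def fetch(url):
    """Download a URL and return its body as bytes (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def process_archive(zip_bytes, handlers, extract_dir="."):
    """Pass each archive member to every handler, then extract it to disk.

    Hypothetical helper: zip_bytes is the raw archive, handlers is a list of
    callables taking a binary file-like object.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            # Read the member once; BytesIO makes the bytes seekable.
            member = io.BytesIO(archive.read(info))
            for handler in handlers:
                member.seek(0)  # rewind so each handler starts at the beginning
                handler(member)
            archive.extract(info, path=extract_dir)
```

It could then be called as process_archive(fetch('http://www.abc.com/xyz.zip'), [handler1, handler2]). If the handlers expect text rather than bytes, wrapping the buffer in io.TextIOWrapper with a suitable encoding is one option, though each handler would then need its own wrapper, since closing a wrapper also closes the underlying buffer.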

Regarding "python - Reading the same file multiple times in Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/1929662/

10-12 18:29