Question
I want to treat many files as if they were all one file. What's the proper Pythonic way to go from [filenames] => [file objects] => [lines] using generators, without reading an entire file into memory?
We all know the proper way to open a file:
with open("auth.log", "rb") as f:
    # Total size in bytes, read one line at a time instead of via readlines()
    print(sum(len(line) for line in f))
And we know the correct way to link several iterators/generators into one long one:
>>> list(itertools.chain(range(3), range(3)))
[0, 1, 2, 0, 1, 2]
But how do I link multiple files together and preserve the context managers?
with open("auth.log", "rb") as f0:
    with open("auth.log.1", "rb") as f1:
        for line in itertools.chain(f0, f1):
            do_stuff_with(line)
    # f1 is now closed
# f0 is now closed
# gross
I could ignore the context managers and do something like this, but it doesn't feel right:
files = itertools.chain(*(open(f, "rb") for f in file_names))
for line in files:
    do_stuff_with(line)
Or is this the kind of thing Async IO (PEP 3156) is for, so that I'll just have to wait for the elegant syntax later?
Recommended answer
There's always fileinput.
import fileinput
for line in fileinput.input(filenames):
    ...
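As a rough, self-contained illustration of that loop, here is a sketch that writes two small temporary files and streams them back as one sequence of lines. The helper name `count_lines`, the file names, and the file contents are made up for the demo; only `fileinput.input` and the `close` method of the object it returns come from the standard library.

```python
import fileinput
import os
import tempfile


def count_lines(filenames):
    """Count lines across several files, one line at a time (hypothetical helper)."""
    line_iter = fileinput.input(filenames)
    try:
        # The generator expression pulls lines lazily; no file is read whole.
        return sum(1 for _ in line_iter)
    finally:
        # Explicit close so the sketch does not rely on `with` support.
        line_iter.close()


# Demo data: two throwaway files standing in for auth.log and auth.log.1.
tmp_dir = tempfile.mkdtemp()
demo_files = []
for name, text in [("a.log", "one\ntwo\n"), ("b.log", "three\n")]:
    path = os.path.join(tmp_dir, name)
    with open(path, "w") as fh:
        fh.write(text)
    demo_files.append(path)

print(count_lines(demo_files))
```

One thing to watch for: if the list of file names is empty, fileinput falls back to reading standard input, so it may be worth checking for that case first.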
Reading the source, however, it appears that fileinput.FileInput can't be used as a context manager (before Python 3.2). To fix that, you could use contextlib.closing, since FileInput instances have a sanely implemented close method:
import fileinput
from contextlib import closing

with closing(fileinput.input(filenames)) as line_iter:
    for line in line_iter:
        ...
An alternative that keeps the context manager is to write a simple function that loops over the files and yields lines as it goes:
def fileinput(files):
    for f in files:
        with open(f, 'r') as fin:
            for line in fin:
                yield line
No real need for itertools.chain here, IMHO. The magic is in the yield statement, which turns an ordinary function into a fantastically lazy generator.
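To make the laziness concrete, here is a small sketch of the same idea run against two temporary files. The generator is renamed `iter_lines` (my own choice, to avoid shadowing the standard fileinput module), and the file names and contents are invented for the demo.

```python
import os
import tempfile


def iter_lines(filenames):
    """Yield lines from each file in turn, with only one file open at a time."""
    for name in filenames:
        with open(name, "r") as fin:
            for line in fin:
                yield line


# Demo data: two throwaway files (hypothetical names and contents).
tmp_dir = tempfile.mkdtemp()
paths = []
for name, text in [("a.log", "one\ntwo\n"), ("b.log", "three\n")]:
    path = os.path.join(tmp_dir, name)
    with open(path, "w") as fh:
        fh.write(text)
    paths.append(path)

lines = iter_lines(paths)  # nothing has been opened yet
first = next(lines)        # opens a.log and reads its first line
rest = list(lines)         # finishes a.log, closes it, then reads b.log
print(repr(first), rest)
```

If the consumer stops early, calling `lines.close()` unwinds the pending with block and closes whichever file is currently open.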
As an aside, starting with Python 3.2, fileinput.FileInput is implemented as a context manager that does exactly what we did before with contextlib.closing. Now our example becomes:
# Python 3.2+ version
with fileinput.input(filenames) as line_iter:
    for line in line_iter:
        ...
The other examples will work on Python 3.2+ as well.