Problem description
I'm writing a Python script to read a file. When I reach a certain section of the file, the way those lines should finally be read depends on information that is also given in that section. I found that I could use something like:
fp = open('myfile')
last_pos = fp.tell()
line = fp.readline()
while line != '':
    if line == 'SPECIAL':
        fp.seek(last_pos)
        other_function(fp)
        break
    last_pos = fp.tell()
    line = fp.readline()
However, the structure of my current code is something like the following:
import itertools

fh = open(filename)
# get generator function and attach None at the end to stop iteration
items = itertools.chain(((lino, line) for lino, line in enumerate(fh, start=1)), (None,))
item = True
lino, line = next(items)
# handle special section
if line.startswith('SPECIAL'):
    start = fh.tell()
    for i in range(specialLines):
        lino, eline = next(items)
        # etc. get the special data I need here
    # try to set the pointer to start to reread the special section
    fh.seek(start)
    # then reread the special section
But this approach gives the following error:
Is there a way to prevent this?
Recommended answer
Using the file as an iterator (such as calling next() on it or using it in a for loop) uses an internal buffer; the actual file read position is further along the file, so .tell() will not give you the position of the next line to be yielded.
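A minimal sketch of the problem, assuming Python 3 and a file opened in text mode; the temporary file and its contents are made up for illustration. On Python 3, once next() has been called on a text file, .tell() is disabled and raises OSError rather than returning a misleading position.

```python
import os
import tempfile

# Hypothetical sample file, created only for this illustration.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('foo spam eggs\nSPECIAL\nbar\n')
    path = tmp.name

tell_error = None
with open(path) as fh:
    first = next(fh)          # iterate the file object directly
    try:
        fh.tell()             # position tracking is disabled after next()
    except OSError as exc:
        tell_error = exc

os.remove(path)
print(repr(first), '->', tell_error)
```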
If you need to seek back and forth, the solution is not to use next() directly on the file object but to use file.readline() only. You can still use an iterator for that, by using the two-argument version of iter():
fileobj = open(filename)
fh = iter(fileobj.readline, '')
Each call to next() on fh invokes fileobj.readline(), and iteration stops once that function returns an empty string. In effect, this creates a file iterator that doesn't use the internal buffer.
Demo:
>>> fh = open('example.txt')
>>> fhiter = iter(fh.readline, '')
>>> next(fhiter)
'foo spam eggs\n'
>>> fh.tell()
14
>>> fh.seek(0)
0
>>> next(fhiter)
'foo spam eggs\n'
Note that your enumerate chain can be simplified to:
items = itertools.chain(enumerate(fh, start=1), (None,))
That said, I don't see why you think a (None,) sentinel is needed here; StopIteration will still be raised, albeit one next() call later.
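To make the sentinel's effect concrete, here is a small sketch that uses a plain list as a stand-in for the file object (an assumption made for illustration only). It shows that the chained None simply arrives as one extra item, and that next() with a default value is another way to detect exhaustion. Note that an unpacking such as lino, line = next(items) cannot unpack None and would raise TypeError when the sentinel arrives.

```python
import itertools

lines = ['first\n', 'second\n']  # stand-in for a file object

# With the sentinel: None arrives as one extra item before StopIteration.
with_sentinel = itertools.chain(enumerate(lines, start=1), (None,))
collected = list(with_sentinel)

# Without it: next() with a default reports exhaustion directly.
plain = enumerate(lines, start=1)
next(plain)
next(plain)
exhausted = next(plain, None)

print(collected, exhausted)
```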
To read specialLines lines, use itertools.islice():
for lino, eline in islice(items, specialLines):
    # etc. get the special data I need here
You can also loop directly over the iterator with a for loop here, instead of using an infinite loop and next() calls:
from itertools import islice

with open(filename) as fh:
    # readline-based iterator, so fh.tell() and fh.seek() stay usable
    enumerated = enumerate(iter(fh.readline, ''), start=1)
    for lino, line in enumerated:
        # handle special section
        if line.startswith('SPECIAL'):
            start = fh.tell()
            for lino, eline in islice(enumerated, specialLines):
                # etc. get the special data I need here
                pass
            fh.seek(start)
Do note, however, that your line numbers will keep incrementing even when you seek back!
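One way around the line-number drift is to keep your own counter and rewind it together with the file position. The following is only a sketch under assumptions: the sample file, the special_lines count, and the variable names are all hypothetical, and it is not taken from the original code.

```python
import os
import tempfile
from itertools import islice

special_lines = 2  # hypothetical section length

# Throwaway sample file, made up for this illustration.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('header\nSPECIAL\nx\ny\ntail\n')
    path = tmp.name

numbered = []
with open(path) as fh:
    lines = iter(fh.readline, '')
    lino = 0
    for line in lines:
        lino += 1
        if line.startswith('SPECIAL'):
            # Remember the position and the line number together.
            start, start_lino = fh.tell(), lino
            first_pass = list(islice(lines, special_lines))
            # Rewind both the file and the counter before rereading.
            fh.seek(start)
            lino = start_lino
            for eline in islice(lines, special_lines):
                lino += 1
                numbered.append((lino, eline))
        else:
            numbered.append((lino, line))

os.remove(path)
print(numbered)
```

Because the counter is restored alongside fh.seek(start), the reread lines and everything after them keep their true line numbers.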
However, you probably want to refactor your code so that it doesn't need to re-read sections of your file.
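As a rough sketch of such a refactoring, the special section can be buffered in a list once and then processed from memory, so no tell() or seek() is needed and the line numbers from enumerate() stay correct. The parse() function, the comma-based format check, and the sample data are all hypothetical and only illustrate the idea.

```python
import io
from itertools import islice


def parse(fh, special_lines):
    """Yield (lino, value) pairs, reading each line of fh only once."""
    enumerated = enumerate(fh, start=1)
    for lino, line in enumerated:
        if line.startswith('SPECIAL'):
            # Buffer the whole section in memory instead of seeking back.
            block = list(islice(enumerated, special_lines))
            # First pass: find the information that decides how to read it.
            # (Hypothetical rule: comma-separated if any line has a comma.)
            delimiter = ',' if any(',' in text for _, text in block) else None
            # Second pass: process the buffered lines with that information.
            for block_lino, text in block:
                yield block_lino, text.strip().split(delimiter)
        else:
            yield lino, line.rstrip('\n')


sample = io.StringIO('header\nSPECIAL\n1,2\n3,4\ntail\n')
result = list(parse(sample, special_lines=2))
print(result)
```

Since nothing is reread from the file, this also works when iterating the file object directly.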