Problem description
Say we want to process an iterator in chunks.
The logic for each chunk depends on previously calculated chunks, so groupby() does not help.
Our friend in this case is itertools.takewhile():
```python
import itertools

while True:
    chunk = itertools.takewhile(getNewChunkLogic(), myIterator)
    process(chunk)
```
The problem is that takewhile() has to consume one element past the last one that satisfies the new chunk logic, thereby 'eating' the first element of the next chunk.
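A minimal illustration of the lost element, using only the standard library (the range and the `< 5` predicate are arbitrary choices for the demo):

```python
from itertools import takewhile

it = iter(range(10))
# takewhile must pull 5 from the iterator to learn that the predicate fails
first_chunk = list(takewhile(lambda i: i < 5, it))
# 5 was consumed by takewhile and discarded, so it never reaches the next chunk
remaining = list(it)
print(first_chunk)  # [0, 1, 2, 3, 4]
print(remaining)    # [6, 7, 8, 9]
```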
There are various workarounds, such as wrapping the iterator or pushing elements back à la C's ungetc().
My question is: is there an elegant solution?
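For comparison, an ungetc()-style pushback can be sketched with the standard library alone by replacing takewhile() with a small eager helper and re-attaching the rejected element with `itertools.chain`. The name `takewhile_keep` is hypothetical; this is a different technique from the wrapper in the answer, not part of itertools.

```python
import itertools

def takewhile_keep(predicate, iterable):
    """Hypothetical eager variant of takewhile().

    Returns (chunk, rest): chunk is a list of the leading elements that
    satisfy predicate, and rest is an iterator that still starts with the
    element that failed it.
    """
    it = iter(iterable)
    chunk = []
    for item in it:
        if predicate(item):
            chunk.append(item)
        else:
            # "unget" the rejected element by chaining it in front of the rest
            return chunk, itertools.chain([item], it)
    return chunk, it  # iterator exhausted, nothing to push back
```

Usage: `chunk, rest = takewhile_keep(lambda i: i < 5, range(10))`, then pass `rest` to the next call. Each call that rejects an element adds another `chain` layer around the previous iterator, so with very many chunks a single wrapper object scales better.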
Recommended answer
takewhile() does indeed need to look at the next element to determine when to toggle behaviour.
You could use a wrapper that tracks the last seen element, and that can be 'reset' to back up one element:
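```python
_sentinel = object()

class OneStepBuffered(object):
    """Iterator wrapper that can push back the last element it returned."""

    def __init__(self, it):
        self._it = iter(it)
        self._last = _sentinel  # most recently returned element
        self._next = _sentinel  # element pushed back by step_back()

    def __iter__(self):
        return self

    def __next__(self):
        if self._next is not _sentinel:
            # Replay the pushed-back element and remember it as the last one
            # returned, so step_back() still works if it ends the next chunk.
            self._last, self._next = self._next, _sentinel
            return self._last
        try:
            self._last = next(self._it)
            return self._last
        except StopIteration:
            self._last = self._next = _sentinel
            raise

    next = __next__  # Python 2 compatibility

    def step_back(self):
        if self._last is _sentinel:
            raise ValueError("Can't back up a step")
        self._next, self._last = self._last, _sentinel
```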
Wrap your iterator in this one before using it with takewhile():
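```python
import itertools

myIterator = OneStepBuffered(myIterator)
while True:
    chunk = itertools.takewhile(getNewChunkLogic(), myIterator)
    process(chunk)  # must consume the whole chunk before stepping back
    try:
        myIterator.step_back()  # push back the element that ended the chunk
    except ValueError:
        break  # raised once the underlying iterator is exhausted
```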
Demo:
```python
>>> from itertools import takewhile
>>> test_list = range(10)
>>> iterator = OneStepBuffered(test_list)
>>> list(takewhile(lambda i: i < 5, iterator))
[0, 1, 2, 3, 4]
>>> iterator.step_back()
>>> list(iterator)
[5, 6, 7, 8, 9]
```
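To see the chunking loop run end to end, here is a self-contained sketch that repeats the wrapper so the block runs on its own. The chunk rule `get_new_chunk_logic` (accept values up to twice the previous chunk's maximum) and the driver `chunk_all` are hypothetical stand-ins for `getNewChunkLogic()` and `process()`; the rule assumes ascending positive integers starting at 1, so every chunk accepts at least its first pending element.

```python
import itertools

_sentinel = object()

class OneStepBuffered(object):
    """Iterator wrapper that can push back the last element it returned."""

    def __init__(self, it):
        self._it = iter(it)
        self._last = _sentinel
        self._next = _sentinel

    def __iter__(self):
        return self

    def __next__(self):
        if self._next is not _sentinel:
            # replay the pushed-back element and remember it as the last one
            self._last, self._next = self._next, _sentinel
            return self._last
        try:
            self._last = next(self._it)
            return self._last
        except StopIteration:
            self._last = self._next = _sentinel
            raise

    def step_back(self):
        if self._last is _sentinel:
            raise ValueError("Can't back up a step")
        self._next, self._last = self._last, _sentinel


def get_new_chunk_logic(previous_chunk):
    # hypothetical rule that depends on the previously calculated chunk
    limit = 2 * max(previous_chunk) if previous_chunk else 1
    return lambda x: x <= limit


def chunk_all(iterable):
    it = OneStepBuffered(iterable)
    chunks = []
    previous = None
    while True:
        # list() consumes the whole chunk before we step back
        chunk = list(itertools.takewhile(get_new_chunk_logic(previous), it))
        if chunk:
            chunks.append(chunk)
            previous = chunk
        try:
            it.step_back()  # un-eat the element that ended this chunk
        except ValueError:
            break  # underlying iterator exhausted
    return chunks
```

With this rule, `chunk_all(range(1, 20))` yields `[[1], [2], [3, 4], [5, 6, 7, 8], [9, 10, 11, 12, 13, 14, 15, 16], [17, 18, 19]]`: every boundary element is pushed back and becomes the first element of the following chunk.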