I have data in the following format:

data = """

[Data-0]
Data = BATCH
BatProtocol = DIAG-ST
BatCreate = 20010724

[Data-1]
Data = SAMP
SampNum = 357
SampLane = 1

[Data-2]
Data = SAMP
SampNum = 357
SampLane = 2

[Data-9]
Data = BATCH
BatProtocol = VCA
BatCreate = 20010725

[Data-10]
Data = SAMP
SampNum = 359
SampLane = 1

[Data-11]
Data = SAMP
SampNum = 359
SampLane = 2

"""

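The structure is:
[Data-x], where x is a number
Data = followed by either BATCH or SAMP
A few more lines
I am trying to write a function that yields one list per "batch". The first item of each list is the text block containing the line Data = BATCH, and the remaining items are the text blocks containing the line Data = SAMP. This is what I have so far: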
def get_batches(data):
    textblocks = iter([txt for txt in data.split('\n\n') if txt.strip()])
    batch = []
    sample = next(textblocks)
    while True:
        if 'BATCH' in sample:
            batch.append(sample)
        sample = next(textblocks)
        if 'BATCH' in sample:
            yield batch
            batch = []
        else:
            batch.append(sample)

It is called like this:
batches = get_batches(data)
for batch in batches:
    print(batch)
    print('_' * 20)

However, it only yields the first "batch":
['[Data-0]\nData = BATCH\nBatProtocol = DIAG-ST\nBatCreate = 20010724',
 '[Data-1]\nData = SAMP\nSampNum = 357\nSampLane = 1',
 '[Data-2]\nData = SAMP\nSampNum = 357\nSampLane = 2']
____________________

My expected output is:
['[Data-0]\nData = BATCH\nBatProtocol = DIAG-ST\nBatCreate = 20010724',
 '[Data-1]\nData = SAMP\nSampNum = 357\nSampLane = 1',
 '[Data-2]\nData = SAMP\nSampNum = 357\nSampLane = 2']
____________________
['[Data-9]\nData = BATCH\nBatProtocol = VCA\nBatCreate = 20010725',
 '[Data-10]\nData = SAMP\nSampNum = 359\nSampLane = 1',
 '[Data-11]\nData = SAMP\nSampNum = 359\nSampLane = 2']
____________________

What am I missing, or how can I improve my function?

Best answer

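As @F.J explained, the real problem with the code is that the last batch is never yielded: the generator stops as soon as next() runs out of text blocks, before the batch collected so far has been handed back. There are other improvements worth making, though, and some of them make the last-value problem easier to solve.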
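To make the last-value problem concrete, here is a minimal sketch that keeps the question's while True structure and only catches the StopIteration raised by next(), so the final batch can still be yielded. It is not @F.J's code; the name get_batches_minimal_fix and the shortened sample_data string are made up for illustration.

```python
def get_batches_minimal_fix(data):
    # Hypothetical name; same block splitting as in the question.
    textblocks = iter([txt for txt in data.split('\n\n') if txt.strip()])
    batch = []
    try:
        sample = next(textblocks)
        while True:
            if 'BATCH' in sample:
                batch.append(sample)
            sample = next(textblocks)
            if 'BATCH' in sample:
                yield batch
                batch = []
            else:
                batch.append(sample)
    except StopIteration:
        # Input exhausted: hand back whatever was collected for the last batch.
        if batch:
            yield batch


# Shortened, made-up sample in the same layout as the question's data.
sample_data = (
    "[Data-0]\nData = BATCH\n\n"
    "[Data-1]\nData = SAMP\n\n"
    "[Data-9]\nData = BATCH\n\n"
    "[Data-10]\nData = SAMP"
)

for batch in get_batches_minimal_fix(sample_data):
    print(batch)
    print('_' * 20)
```

With this change both batches are produced, but the duplicated 'BATCH' in sample check remains, which is what the rest of the answer addresses.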
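The first thing that stood out when I looked at your code was the two if statements that both check 'BATCH' in sample; they can be combined into one.
Here is a version that does that, and that also uses a for loop over the generator instead of while True: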

def get_batches(data):
    textblocks = (txt for txt in data.split('\n\n') if txt.strip())
    batch = [next(textblocks)]
    for sample in textblocks:
        if 'BATCH' in sample:
            yield batch
            batch = []
        batch.append(sample)
    yield batch

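I yield batch unconditionally at the end because there is no way to reach that point with batch empty (if data is empty, the initialization of batch near the start raises StopIteration).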
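One caveat when running this on a current interpreter: since Python 3.7 (PEP 479), a StopIteration that escapes a generator body is converted into a RuntimeError, so empty input would raise an error instead of quietly producing no batches. The sketch below is one possible way around that, using the default argument of next(); the name get_batches_safe and the sample string are hypothetical and not part of the original answer.

```python
def get_batches_safe(data):
    # Hypothetical variant of the answer's function for Python 3.7+.
    textblocks = (txt for txt in data.split('\n\n') if txt.strip())
    # A default value keeps next() from raising StopIteration on empty input.
    first = next(textblocks, None)
    if first is None:
        return  # no text blocks at all: yield nothing
    batch = [first]
    for sample in textblocks:
        if 'BATCH' in sample:
            # A new batch starts here, so emit the one collected so far.
            yield batch
            batch = []
        batch.append(sample)
    # The last batch has no following BATCH block to trigger a yield.
    yield batch


# Shortened, made-up sample in the same layout as the question's data.
safe_sample = (
    "[Data-0]\nData = BATCH\n\n"
    "[Data-1]\nData = SAMP\n\n"
    "[Data-9]\nData = BATCH\n\n"
    "[Data-10]\nData = SAMP"
)

print(list(get_batches_safe(safe_sample)))
print(list(get_batches_safe("")))
```

Like the answer's version, this assumes the first non-empty block is a BATCH block; it only changes how empty input is handled.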

This question, "python - generator function only yields the first item", comes from Stack Overflow: https://stackoverflow.com/questions/15910802/
