python - Split a text string on keywords in Python

I have a string of text like this:

'tx cycle up.... down
rx cycle up.... down
phase:...
rx on scan: 123456
tx cycle up.... down
rx cycle up.... down
phase:...
rx on scan: 789012
setup
tx cycle up.... down
rx cycle up.... down
tx cycle up.... down
rx cycle up.... down'

I need to split this string into a list of chunks like this:

['tx cycle up.... down rx cycle up.... down phase:.... rx on scan: 123456',
 'tx cycle up.... down rx cycle up.... down phase:.... rx on scan: 789012',
 'tx cycle up... down rx cycle up.... down',
 'tx cycle up... down rx cycle up.... down']

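Sometimes the chunks have a "phase" line and a "scan" number, but sometimes they don't. I need this to be general enough to handle any of these cases, and I will have to run it on a large amount of data. How can I do this?

Edit: Suppose that, in addition to the text string above, I also have other text strings like this: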

'closeloop start
closeloop ..up:677 down:098
closeloop start
closeloop ..up:568 down:123'

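When the code reaches this text string, it will not find anything to split on. So how can I split on the "closeloop start" lines if they appear, and on the "tx" lines if those appear? I tried this code, but got a TypeError: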

data = re.split(r'\n((?=tx)|(?=closeloop\sstart))', data)

Best answer

You can split on newlines that are followed by tx:

import re

re.split(r'\n(?=tx)', inputtext)

Demo:

>>> import re
>>> inputtext = '''tx cycle up.... down
... rx cycle up.... down
... phase:...
... rx on scan: 123456
... tx cycle up.... down
... rx cycle up.... down
... phase:...
... rx on scan: 789012
... setup
... tx cycle up.... down
... rx cycle up.... down
... tx cycle up.... down
... rx cycle up.... down'''
>>> re.split(r'\n(?=tx)', inputtext)
['tx cycle up.... down\nrx cycle up.... down\nphase:...\nrx on scan: 123456', 'tx cycle up.... down\nrx cycle up.... down\nphase:...\nrx on scan: 789012\nsetup', 'tx cycle up.... down\nrx cycle up.... down', 'tx cycle up.... down\nrx cycle up.... down']
>>> from pprint import pprint
>>> pprint(_)
['tx cycle up.... down\nrx cycle up.... down\nphase:...\nrx on scan: 123456',
 'tx cycle up.... down\nrx cycle up.... down\nphase:...\nrx on scan: 789012\nsetup',
 'tx cycle up.... down\nrx cycle up.... down',
 'tx cycle up.... down\nrx cycle up.... down']
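The chunks above still contain embedded newlines, whereas the question asks for single-line strings. A minimal sketch, assuming each chunk's lines should simply be joined with a space (the shortened `inputtext` is illustrative, and stray lines such as `setup` would be kept rather than dropped):

```python
import re

# Shortened sample input, for illustration only.
inputtext = (
    "tx cycle up.... down\n"
    "rx cycle up.... down\n"
    "phase:...\n"
    "rx on scan: 123456\n"
    "tx cycle up.... down\n"
    "rx cycle up.... down"
)

# Split before every line that starts with "tx", then flatten each chunk
# by joining its lines with a single space.
chunks = [" ".join(chunk.splitlines()) for chunk in re.split(r"\n(?=tx)", inputtext)]

for chunk in chunks:
    print(chunk)
```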

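However, if you are just looping over an input file object (reading it line by line), you can process each chunk as you collect the lines: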

section = []
for line in open_file_object:
    if line.startswith('tx'):
        # new section
        if section:
            process_section(section)
        section = [line]
    else:
        section.append(line)
if section:
    process_section(section)
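The same loop can be wrapped in a generator so it runs without a separate `process_section` function. This is only a sketch: `iter_sections` and its `starts` parameter are hypothetical names, and `io.StringIO` stands in for a real open file.

```python
import io


def iter_sections(lines, starts=("tx",)):
    """Yield lists of lines; a new list begins at each line starting with one of `starts`."""
    section = []
    for line in lines:
        # str.startswith accepts a tuple of prefixes.
        if line.startswith(starts) and section:
            yield section
            section = []
        section.append(line.rstrip("\n"))
    if section:
        yield section


# io.StringIO stands in for an open file object here.
sample = io.StringIO(
    "tx cycle up.... down\n"
    "rx cycle up.... down\n"
    "phase:...\n"
    "tx cycle up.... down\n"
    "rx cycle up.... down\n"
)
sections = list(iter_sections(sample))
```

Because sections are yielded one at a time, only the current chunk is held in memory, which may help with the large amount of data mentioned in the question. Passing `starts=("tx", "closeloop start")` would cover both kinds of starting line.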

If you need to match multiple starting lines, include each one as a |-separated alternative in the lookahead:

data = re.split(r'\n(?=tx|closeloop\sstart)', data)
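A short check of the combined pattern against both kinds of input, using shortened sample strings (the variable names are illustrative). It also shows one observable difference from the pattern in the question: when the pattern contains a capturing group, `re.split` returns the text captured by that group as well, which here is an empty string between chunks.

```python
import re

# One pattern that splits before either kind of starting line.
START = re.compile(r"\n(?=tx|closeloop\sstart)")

tx_text = (
    "tx cycle up.... down\n"
    "rx cycle up.... down\n"
    "tx cycle up.... down\n"
    "rx cycle up.... down"
)
closeloop_text = (
    "closeloop start\n"
    "closeloop ..up:677 down:098\n"
    "closeloop start\n"
    "closeloop ..up:568 down:123"
)

tx_chunks = START.split(tx_text)
closeloop_chunks = START.split(closeloop_text)

# The pattern from the question wraps the lookaheads in a capturing group,
# so the (empty) captured text is inserted into the result list.
with_group = re.split(r"\n((?=tx)|(?=closeloop\sstart))", closeloop_text)

print(tx_chunks)
print(closeloop_chunks)
print(with_group)
```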