I have the following text:

Test 123:

This is a blue car

Test:

This car is not blue

This car is yellow

Hello:

This is not a test

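I am trying to put together a regular expression that finds every header line that starts with Test or Hello and ends with a colon, optionally with a three-digit number before the colon, and returns everything after it up to the next line that fits the same description. For the text above, re.findall should return this list: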
[("Test", "123", "\nThis is a blue car\n"),
 ("Test", "", "\nThis car is not blue\n\nThis car is yellow\n"),
 ("Hello", "", "\nThis is not a test")]

So far I have this:
r = re.findall(r'^(Test|Hello) *([^:]*):$', test, re.MULTILINE)

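It matches each header line as described, but I am not sure how to capture the content up to the next line that ends with a colon. Any ideas?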
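For reference, a minimal sketch of what the attempt above returns on the sample text. The variable name `test` is taken from the snippet; the name `headers` is only illustrative. Each tuple holds the keyword and the optional number, and the text between headers is not captured at all:

```python
import re

test = """Test 123:

This is a blue car

Test:

This car is not blue

This car is yellow

Hello:

This is not a test"""

# The attempt from the question: MULTILINE makes ^ and $ match at every line.
headers = re.findall(r'^(Test|Hello) *([^:]*):$', test, re.MULTILINE)
print(headers)  # [('Test', '123'), ('Test', ''), ('Hello', '')]
```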

Best answer

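You can use the following regular expression, which needs the DOTALL modifier (the inline (?s) flag in the demo) so that . also matches newlines: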

(?:^|\n)(Test|Hello) *([^:]*):\n(.*?)(?=\n(?:Test|Hello)|$)
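To make the pieces easier to follow, here is the same pattern spelled out with re.VERBOSE so that each part can carry a comment. This is only an annotated sketch; the names `SECTION_RE`, `sample` and `sections` are illustrative and not part of the original answer. In verbose mode a literal space has to be written as `[ ]`, which is the only change to the pattern itself:

```python
import re

SECTION_RE = re.compile(
    r"""
    (?:^|\n)                 # start of the string, or the newline before a header
    (Test|Hello)             # group 1: the header keyword
    [ ]*                     # optional spaces between the keyword and the number
    ([^:]*)                  # group 2: whatever precedes the colon, possibly empty
    :\n                      # the colon ending the header line, plus its newline
    (.*?)                    # group 3: the body, matched lazily
    (?=\n(?:Test|Hello)|$)   # stop before the next header keyword, or at the end
    """,
    re.DOTALL | re.VERBOSE,
)

sample = (
    "Test 123:\n\nThis is a blue car\n\n"
    "Test:\n\nThis car is not blue\n\nThis car is yellow\n\n"
    "Hello:\n\nThis is not a test"
)
sections = SECTION_RE.findall(sample)
for keyword, number, body in sections:
    print(repr(keyword), repr(number), repr(body))
```

The lazy `(.*?)` combined with the lookahead is what lets each body end just before the next header without consuming it.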

DEMO
>>> import re
>>> s = """Test 123:
...
... This is a blue car
...
... Test:
...
... This car is not blue
...
... This car is yellow
...
... Hello:
...
... This is not a test"""
>>> re.findall(r'(?s)(?:^|\n)(Test|Hello) *([^:]*):\n(.*?)(?=\n(?:Test|Hello)|$)', s)
[('Test', '123', '\nThis is a blue car\n'), ('Test', '', '\nThis car is not blue\n\nThis car is yellow\n'), ('Hello', '', '\nThis is not a test')]
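One caveat: the lookahead stops at any line that merely starts with Test or Hello, so a body line such as "Testing is fun" would cut a section short. A possible stricter variant, sketched below under the assumption that a header is always a whole line ending in a colon, requires the lookahead to see a complete header line and keeps the number group from spanning newlines. The names `STRICT_RE`, `tricky` and `strict_sections` are hypothetical:

```python
import re

# Stricter variant (an assumption, not the original answer's pattern):
# the lookahead must see a full header line, i.e. the keyword, optional
# text without ':' or a newline, then ':' at the end of that line.
STRICT_RE = re.compile(
    r'(?:^|\n)(Test|Hello) *([^:\n]*):\n(.*?)'
    r'(?=\n(?:Test|Hello) *[^:\n]*:(?:\n|$)|$)',
    re.DOTALL,
)

# The body of the first section contains a line that starts with "Test".
tricky = (
    "Test 123:\n\nThis is a blue car\nTesting is fun\n\n"
    "Hello:\n\nThis is not a test"
)
strict_sections = STRICT_RE.findall(tricky)
for section in strict_sections:
    print(section)
```

On the original sample text this variant returns the same three tuples as the pattern above.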

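This answer, about a Python regular expression that finds lines ending with a colon and all the text after them up to the next line ending with a colon, is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/26648548/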

10-16 07:33