I have a text with this structure:

Text Starts
23/01/2018
Something here. It was a crazy day.
Believe me.
02/02/2018
Another thing happens.
Some Delimiter
20/02/2017
Text here
21/02/2017
Another text.
Here.
End Section
...text continues...


and a regex in Python that is meant to match (date, text) groups up to Some Delimiter:

result = re.findall(r"(\d{2}\/\d{2}\/\d{4}\n)(.*?)(?=\n\d{2}\/\d{2}\/\d{4}|\nSome Delimiter)", text, re.DOTALL)


The result is:

>>> print(result)
[('23/01/2018\n', 'Something here. It was a crazy day. \nBelieve me.'),
('02/02/2018\n', 'Another thing happens.'),
('20/02/2017\n', 'Text here')]


It also picks up the group that comes after the delimiter (20/02/2017).
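A minimal reproduction, assuming the sample text above has no trailing spaces on its lines. The lookahead \nSome Delimiter only ends the body of the entry just before the delimiter; it does not stop re.findall, which keeps scanning and also matches 20/02/2017. The entry 21/02/2017 is left out only because no later date or delimiter follows it to end its body.

```python
import re

# Sample input from the question (assumed: no trailing spaces on any line).
text = (
    "Text Starts\n"
    "23/01/2018\n"
    "Something here. It was a crazy day.\n"
    "Believe me.\n"
    "02/02/2018\n"
    "Another thing happens.\n"
    "Some Delimiter\n"
    "20/02/2017\n"
    "Text here\n"
    "21/02/2017\n"
    "Another text.\n"
    "Here.\n"
    "End Section\n"
    "...text continues...\n"
)

# The question's pattern: "\nSome Delimiter" is just one way to end a body,
# so findall continues past the delimiter and finds 20/02/2017 as well.
question_re = r"(\d{2}\/\d{2}\/\d{4}\n)(.*?)(?=\n\d{2}\/\d{2}\/\d{4}|\nSome Delimiter)"
result = re.findall(question_re, text, re.DOTALL)
print(result)
```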

How can I get only the groups that come before the delimiter?

Best answer

>>> print(text.split('Some Delimiter')[0])
Text Starts
23/01/2018
Something here. It was a crazy day.
Believe me.
02/02/2018
Another thing happens.

>>> re.findall(r"(\d{2}\/\d{2}\/\d{4}\n)(.*?)(?=\n\d{2}\/\d{2}\/\d{4}|$)", text.split('Some Delimiter')[0], re.DOTALL)
[('23/01/2018\n', 'Something here. It was a crazy day.\nBelieve me.'), ('02/02/2018\n', 'Another thing happens.')]



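text.split('Some Delimiter')[0] gives the part of the string before the delimiter;
the groups are then extracted from that part alone. Because the body can no longer run into the delimiter, the lookahead ends it at the next date or at the end of the string ($).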
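The same split-then-match idea can be wrapped in a small helper. This is a sketch, not part of the original answer: the name entries_before is hypothetical, and it uses str.partition instead of split so that only the first delimiter matters and a missing delimiter simply leaves the whole text to be searched.

```python
import re

# A date line plus its body; the body ends at the next date line or at the
# end of the input ($ also matches just before a final "\n").
ENTRY_RE = re.compile(r"(\d{2}/\d{2}/\d{4}\n)(.*?)(?=\n\d{2}/\d{2}/\d{4}|$)", re.DOTALL)


def entries_before(text, delimiter="Some Delimiter"):
    """Hypothetical helper: return (date_line, body) tuples found before the first delimiter."""
    # partition() splits at the first occurrence only; if the delimiter is
    # absent, head is the whole text.
    head, _sep, _tail = text.partition(delimiter)
    return ENTRY_RE.findall(head)


sample = (
    "Text Starts\n23/01/2018\nSomething here. It was a crazy day.\nBelieve me.\n"
    "02/02/2018\nAnother thing happens.\nSome Delimiter\n20/02/2017\nText here\n"
    "21/02/2017\nAnother text.\nHere.\nEnd Section\n...text continues...\n"
)
print(entries_before(sample))
```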


Using the regex module

>>> import regex
>>> regex.findall(r"(\d{2}\/\d{2}\/\d{4}\n)(.*?)(?=\n(?1)|$)", text.split('Some Delimiter')[0], regex.DOTALL)
[('23/01/2018\n', 'Something here. It was a crazy day.\nBelieve me.'), ('02/02/2018\n', 'Another thing happens.')]



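(?1) reapplies the pattern of the first capture group (a date followed by a newline). Unlike the backreference \1, it does not require the same text that group 1 captured, only text matching the same pattern.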
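The standard re module does not support subroutine calls such as (?1). A similar effect, offered here as an alternative and not as part of the original answer, is to keep the date-line sub-pattern in one Python string (the name DATE_LINE is an assumption) and interpolate it into both places.

```python
import re

# Stdlib re has no (?1), so the date-line sub-pattern is stored once and
# inserted twice: as capture group 1 and inside the lookahead.
DATE_LINE = r"\d{2}/\d{2}/\d{4}\n"
ENTRY_RE = re.compile(rf"({DATE_LINE})(.*?)(?=\n{DATE_LINE}|$)", re.DOTALL)

sample = (
    "Text Starts\n23/01/2018\nSomething here. It was a crazy day.\nBelieve me.\n"
    "02/02/2018\nAnother thing happens.\nSome Delimiter\n20/02/2017\nText here\n"
    "21/02/2017\nAnother text.\nHere.\nEnd Section\n...text continues...\n"
)

# Search only the part before the delimiter, as in the answer.
head = sample.split("Some Delimiter")[0]
result = ENTRY_RE.findall(head)
print(result)
```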

For the original question (python - Regex to match text with a delimiter in Python 3), see Stack Overflow: https://stackoverflow.com/questions/48930736/

10-11 04:19