python - Regex to read between multiline tags?

I have text in this form:

<text>

some text efdg

some text abcd

</text>

I am writing a regex to extract:

some text efdg

some text abcd

Since the content spans multiple lines, I am using <text>\n+(^+?)\n+<text>, but it does not work. How can I do this?

I also tried r'^.*?', but that did not seem to work either.

Code:
The input file is:

<doc>
<id1>123</id1>
<text>

abc

def

</text>
</doc>
<doc>
<id1>1234</id1>
<text>

abcdd

defdd

</text>
</doc>

for line in f.read().split('</doc>\n'):

    tag = re.findall(r'<id1>\s*(.+)\s*</id1>',line)
    print tag[0]
    texttag = re.findall(r'<text>\s*(.+)\s*</text>',line,re.MULTILINE)
    print texttag

Best answer

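import re

x = """<text>
some text efdg
some text abcd
</text> """

# [\s\S]*? lazily matches any character, newlines included, up to the closing tag.
body = re.findall(r"<text>([\s\S]*?)</text>", x)[0]
# Split on newlines and drop the empty strings.
print([i for i in body.split("\n") if i])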

You can grab the text between the markers and then split it on newlines to get the result.
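The same idea can be applied to the multi-document input from the question. The attempt there passes re.MULTILINE, which only changes how ^ and $ behave; it is re.DOTALL that lets "." match newlines. The following is a minimal sketch, not a definitive implementation: the helper name extract_docs and the inline sample string are assumptions made for illustration, and it assumes <doc> and <text> elements are never nested.

```python
import re

# Sample input modeled on the question's file (inlined here for illustration).
data = """<doc>
<id1>123</id1>
<text>

abc

def

</text>
</doc>
<doc>
<id1>1234</id1>
<text>

abcdd

defdd

</text>
</doc>
"""

# re.DOTALL lets "." match newlines; re.MULTILINE only affects "^" and "$".
DOC_RE = re.compile(r"<doc>(.*?)</doc>", re.DOTALL)
ID_RE = re.compile(r"<id1>\s*(.+?)\s*</id1>")
TEXT_RE = re.compile(r"<text>(.*?)</text>", re.DOTALL)


def extract_docs(raw):
    """Return a list of (id, [non-empty text lines]) tuples, one per <doc>.

    Hypothetical helper: assumes non-nested tags, as in the question's sample.
    """
    results = []
    for doc in DOC_RE.findall(raw):
        id_match = ID_RE.search(doc)
        text_match = TEXT_RE.search(doc)
        if not id_match or not text_match:
            continue  # skip documents missing either element
        lines = [ln.strip() for ln in text_match.group(1).splitlines() if ln.strip()]
        results.append((id_match.group(1), lines))
    return results


for doc_id, lines in extract_docs(data):
    print(doc_id, lines)
```

For input that is real, well-formed XML, a proper parser such as the standard library's xml.etree.ElementTree is usually more robust than regular expressions.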

Regarding "python - Regex to read between multiline tags?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29483584/
