Good day,

I need to extract part of a string that looks like one of these:

"some_text MarkerA some_text_to_extract MarkerB some_text"
"some_text MarkerA some_text_to_extract"


In both cases I need to extract some_text_to_extract.
MarkerA and MarkerB are predefined text strings.

I tried the following regular expressions, but with no luck:

".*\sMarkerA(.*)MarkerB.*" - does not work in case 2
".*\sMarkerA(.*)(?=MarkerB)?.*" - wrong result "some_text_to_extract MarkerB some_text"
".*\sMarkerA(.*)(?:MarkerB)?.*" - does not work at all


Could you help me solve this?

Best answer

Try:

".*\sMarkerA(.*?)(?=$|MarkerB)"

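To see why this works where the attempts in the question fail, here is a small sketch comparing the third attempt with the lazy pattern. The variable names are only illustrative. The greedy `(.*)` consumes the rest of the line and the optional group is then satisfied by matching nothing, whereas the lazy `(.*?)` grows one character at a time until the lookahead sees either `MarkerB` or the end of the string.

```python
import re

# Third attempt from the question: greedy capture, optional trailing marker.
greedy = re.compile(r".*\sMarkerA(.*)(?:MarkerB)?.*")
# Pattern from this answer: lazy capture bounded by a lookahead.
lazy = re.compile(r".*\sMarkerA(.*?)(?=$|MarkerB)")

text = "some_text MarkerA some_text_to_extract MarkerB some_text"

greedy_result = greedy.match(text).group(1)
lazy_result = lazy.match(text).group(1)

print(repr(greedy_result))  # capture runs past MarkerB to the end of the string
print(repr(lazy_result))    # capture stops just before MarkerB
```

Note that the lazy capture keeps the spaces around the extracted text; strip them afterwards if they are not wanted.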

Test code:

#!/usr/bin/env python3
import re

# (input string, expected capture) pairs; the capture keeps surrounding spaces
tests = [
    ("some_text MarkerA some_text_to_extract MarkerB some_text", " some_text_to_extract "),
    ("some_text MarkerA some_text_to_extract", " some_text_to_extract"),
]

# Lazy capture that stops at MarkerB or at the end of the string
reg = re.compile(r".*\sMarkerA(.*?)(?=$|MarkerB)")

for text, expected in tests:
    mo = reg.match(text)
    assert mo is not None
    print(mo.group(1), expected)
    assert mo.group(1) == expected
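
If the markers are only known at runtime, the same idea can be wrapped in a helper. This is just a sketch under a few assumptions: `extract_between` is a hypothetical name, the input is a single line, the markers are escaped with `re.escape` so that regex metacharacters in them are taken literally, and the result is stripped of surrounding whitespace. Unlike the pattern above, it does not require whitespace before the first marker.

```python
import re

def extract_between(text, start, end):
    """Return the text after `start` up to `end` (or the end of the string).

    Returns None when `start` does not occur in `text`.
    """
    # Escape the markers so characters like "[" or "(" are matched literally.
    pattern = re.compile(
        re.escape(start) + r"(.*?)(?=" + re.escape(end) + r"|$)"
    )
    found = pattern.search(text)
    return found.group(1).strip() if found else None


print(extract_between("some_text MarkerA some_text_to_extract MarkerB some_text",
                      "MarkerA", "MarkerB"))
print(extract_between("some_text MarkerA some_text_to_extract",
                      "MarkerA", "MarkerB"))
```

Both calls return the same extracted text, whether or not the second marker is present.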

Regarding "javascript - extracting a substring between two markers, where the second marker may be missing", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/1083896/
