Good day,
I need to extract part of a string that looks like one of these:
"some_text MarkerA some_text_to_extract MarkerB some_text"
"some_text MarkerA some_text_to_extract"
In both cases I need to extract some_text_to_extract. MarkerA and MarkerB are predefined text strings. I have tried the following regular expressions, without success:
".*\sMarkerA(.*)MarkerB.*" - does not work in case 2
".*\sMarkerA(.*)(?=MarkerB)?.*" - wrong result "some_text_to_extract MarkerB some_text"
".*\sMarkerA(.*)(?:MarkerB)?.*" - does not work at all
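The core difficulty with an optional end marker can be seen by running the third attempt, plus a lazy variant of it. The snippet below is a minimal illustration using Python's re module (an assumption: the question is tagged javascript, but greedy and lazy quantifiers behave the same way for these patterns). A greedy group runs to the end of the string and swallows MarkerB, while making the group lazy captures nothing, because neither the optional MarkerB nor the trailing .* forces it to stop anywhere in particular.

```python
import re

text = "some_text MarkerA some_text_to_extract MarkerB some_text"

# Greedy (.*) consumes everything after MarkerA; the optional (?:MarkerB)? then matches empty.
greedy = re.match(r".*\sMarkerA(.*)(?:MarkerB)?.*", text)
print(repr(greedy.group(1)))  # ' some_text_to_extract MarkerB some_text'

# Lazy (.*?) tries the empty string first, and the trailing .* happily matches the rest,
# so nothing pushes the group forward.
lazy = re.match(r".*\sMarkerA(.*?)(?:MarkerB)?.*", text)
print(repr(lazy.group(1)))  # ''
```

What is missing is something that tells a lazy group where it must stop: either MarkerB or the end of the string.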
Could you help me solve this?
Best answer
Try:
".*\sMarkerA(.*?)(?=$|MarkerB)"
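The lazy group (.*?) stops at the first position where the lookahead (?=$|MarkerB) succeeds, i.e. just before MarkerB if it is present, otherwise at the end of the string. A minimal sketch of the same idea as a reusable function follows; extract_between is a hypothetical helper name, not part of the answer. Unlike the answer's pattern, it uses re.search and finds the first MarkerA without requiring whitespace before it, it strips the surrounding spaces from the result, and it assumes single-line input (. does not match newlines unless re.DOTALL is passed).

```python
import re

def extract_between(text, start, end):
    """Hypothetical helper: text after `start`, up to `end` or the end of the string.

    Returns None if `start` does not occur in `text`.
    """
    # re.escape keeps markers containing regex metacharacters (".", "+", ...) literal.
    pattern = re.escape(start) + r"(.*?)(?=" + re.escape(end) + r"|$)"
    mo = re.search(pattern, text)
    return mo.group(1).strip() if mo else None

print(extract_between("some_text MarkerA some_text_to_extract MarkerB some_text", "MarkerA", "MarkerB"))
print(extract_between("some_text MarkerA some_text_to_extract", "MarkerA", "MarkerB"))
```

Both calls print some_text_to_extract.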
Test code:
#!/usr/bin/env python3
import re

# Each pair is (input string, expected contents of group 1)
tests = [
    ("some_text MarkerA some_text_to_extract MarkerB some_text", " some_text_to_extract "),
    ("some_text MarkerA some_text_to_extract", " some_text_to_extract"),
]

# The lazy (.*?) stops at the first MarkerB, or at the end of the string if MarkerB is missing
reg = re.compile(r".*\sMarkerA(.*?)(?=$|MarkerB)")

for text, expected in tests:
    mo = reg.match(text)
    assert mo is not None
    print(repr(mo.group(1)), repr(expected))
    assert mo.group(1) == expected
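Because MarkerA and MarkerB are fixed literal strings, the same results can also be obtained without a regular expression. The sketch below swaps in a different technique, str.partition, and is not the answer's method. It assumes the first occurrence of MarkerA is the one wanted and, unlike the regex, does not require whitespace before MarkerA; extract_with_partition is a hypothetical name.

```python
def extract_with_partition(text, start="MarkerA", end="MarkerB"):
    """Hypothetical non-regex alternative: text after `start`, up to `end` or end of string."""
    _, found, rest = text.partition(start)
    if not found:
        return None  # start marker absent
    # If `end` is missing, partition returns the whole remainder as its first element.
    middle, _, _ = rest.partition(end)
    return middle

tests = [
    ("some_text MarkerA some_text_to_extract MarkerB some_text", " some_text_to_extract "),
    ("some_text MarkerA some_text_to_extract", " some_text_to_extract"),
]
for text, expected in tests:
    assert extract_with_partition(text) == expected
```

It returns the same values, including surrounding spaces, as group 1 of the answer's regex on both test strings.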
Regarding javascript - Extract a substring between two markers where the second token may be missing, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/1083896/