Problem description
I was trying to answer this question, where the OP has the following string:
"path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism"
and wants to split it to obtain the following list:
['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']
I tried to solve it by using a simple lookahead assertion in a regex, (?=path:). Well, it did not work:
>>> import re
>>> s = "path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism"
>>> r = re.compile('(?=path:)')
>>> r.split(s)
['path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism']
However, in this answer, the answerer got it working by preceding the lookahead assertion with a space:
>>> line = 'path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism'
>>> re.split(' (?=path:)', line)
['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']
Why did the regex work with the space? Why did it not work without it?
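Recommended answer
In Python versions before 3.7, re.split() has a documented limitation: it can't split on zero-length matches. The lookahead (?=path:) on its own consumes no characters, so every match it finds is empty and no split happens. With the added space, each match consumes one character, so the split works. Since Python 3.7, re.split() also splits on empty matches, so the first pattern no longer returns the string unsplit there.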
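As a minimal sketch, here are two ways to get the desired list without depending on zero-length matches, so they should behave the same across Python versions. The variable names parts_split and parts_findall are only illustrative, and the \s+ and findall patterns are my own variations, not taken from the linked answers.

```python
import re

s = ("path:bte00250 Alanine, aspartate and glutamate metabolism "
     "path:bte00330 Arginine and proline metabolism")

# Option 1: consume the separating whitespace, so every match is non-empty.
# The lookahead only checks that "path:" follows; it is not consumed.
parts_split = re.split(r"\s+(?=path:)", s)

# Option 2: do not split at all; collect each entry with findall.
# The lazy ".*?" stops just before the next "path:" or at the end of the string.
parts_findall = re.findall(r"path:.*?(?=\s+path:|$)", s)

print(parts_split)
print(parts_findall)
```

Both calls should print the two-element list from the question. The findall variant assumes the entries contain no newlines, since "." does not match a newline by default.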