Problem description
When I run the statement below in Python 2.7,
re.search('eagle|cat', 'The cat is an animal. The eagle is bird').group()
I'm expecting to see 'eagle' as the result, as per the regular expression documentation, but I'm getting 'cat'. Am I missing something here?
A regular expression with alternative patterns (separated by |) does not scan the whole string for the first alternative, then the second.
Instead, each alternative is considered at each position in the input string. So at positions 0 through 3 neither eagle nor cat matches, but at position 4 cat matches, even though eagle was tried first.
Thus, cat is returned as the match; the rest of the string no longer needs to be considered.
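The scanning order described above can be modelled by hand: walk the string one position at a time and, at each position, try the alternatives from left to right, stopping at the first one that matches there. This is only a simplified sketch. The helper name leftmost_alternative is hypothetical, and the real engine does this inside a single backtracking search rather than with separately compiled patterns, but for plain literal alternatives like these it gives the same answer as re.search. The last lines use re.finditer, which keeps scanning after the first match, to show that eagle is found too, just later in the string.

```python
import re

def leftmost_alternative(alternatives, text):
    """Hypothetical helper that models how re.search treats 'a|b|...'.

    The outer loop walks positions and the inner loop tries alternatives,
    so a match at an earlier position beats an earlier alternative.
    """
    compiled = [re.compile(alt) for alt in alternatives]
    for pos in range(len(text) + 1):
        for pattern in compiled:
            # Pattern.match(string, pos) only tries a match starting at pos
            m = pattern.match(text, pos)
            if m:
                return m.group(), pos
    return None

text = 'The cat is an animal. The eagle is bird'
print(leftmost_alternative(['eagle', 'cat'], text))  # ('cat', 4)
print(re.search('eagle|cat', text).span())           # (4, 7)

# re.finditer keeps scanning after the first match, so eagle shows up later
print([(m.group(), m.start()) for m in re.finditer('eagle|cat', text)])
# [('cat', 4), ('eagle', 26)]
```

Swapping the order of the alternatives in the call does not change the result here, because position 4 is reached long before position 26.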
The order of the alternatives matters only when more than one of them would match at the same position. So cat|cats will always return cat, even if there is an s after that word in the input string:
>>> import re
>>> re.search('cat|cats', 'Like herding cats.').group()
'cat'
>>> re.search('cats|cat', 'Like herding cats.').group()
'cats'
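If the alternatives are literal words and you want the longest one to win regardless of the order they were listed in, one common approach is to escape each word and sort the list by length, longest first, before joining it with |. A minimal sketch, assuming literal words only (build_alternation is a hypothetical helper name). For this particular pair, a single optional character, cats?, also works because ? is greedy and tries to include the s first.

```python
import re

def build_alternation(words):
    """Hypothetical helper: join literal words into one regex, longest first.

    Sorting by descending length means that when two words could match at
    the same position, the longer one is tried (and accepted) first.
    """
    ordered = sorted(words, key=len, reverse=True)
    return '|'.join(re.escape(w) for w in ordered)

pattern = build_alternation(['cat', 'cats'])
print(pattern)                                           # cats|cat
print(re.search(pattern, 'Like herding cats.').group())  # cats

# Greedy ? gives the same result for this pair
print(re.search('cats?', 'Like herding cats.').group())  # cats
```

Sorting only changes which alternative wins at a given position; the leftmost position in the string still wins first, exactly as with eagle|cat.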