Problem description
While matching an email address, after I match something like yasar@webmail, I want to capture one or more occurrences of (\.\w+) (what I am doing is a little more complicated; this is just an example). I tried adding (\.\w+)+, but it only captures the last match. For example, yasar@webmail.something.edu.tr matches, but the group contains only .tr after the yasar@webmail part, so I lose the .something and .edu groups. Can I do this with Python regular expressions, or would you suggest matching everything first and splitting the subpatterns later?
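The behaviour described above can be reproduced with the standard re module. This is a minimal sketch, assuming the address is yasar@webmail.something.edu.tr and using a simplified pattern chosen only for illustration:

```python
import re

# Simplified illustrative pattern: user, host, then a repeated suffix group.
m = re.match(r'(\w+)@(\w+)(\.\w+)+', 'yasar@webmail.something.edu.tr')

# A repeated capturing group keeps only the text of its final iteration.
last_suffix = m.group(3)
print(last_suffix)  # .tr
```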
The re module doesn't support repeated captures (the third-party regex module does):
>>> import regex
>>> m = regex.match(r'([.\w]+)@((\w+)(\.\w+)+)', 'yasar@webmail.something.edu.tr')
>>> m.groups()
('yasar', 'webmail.something.edu.tr', 'webmail', '.tr')
>>> m.captures(4)
['.something', '.edu', '.tr']
In your case I'd go with splitting the repeated subpatterns later. It leads to simple, readable code; for an example, see the code in @Li-aung Yip's answer.
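One way to split the repeated part afterwards with only the standard re module might look like the sketch below. It is not the code from the answer referenced above; the helper name split_address and the exact pattern are assumptions made for illustration. The idea is to capture the whole run of suffixes as a single group, then break it apart with re.findall.

```python
import re

def split_address(address):
    """Hypothetical helper: match the whole suffix run once, then split it."""
    # The non-capturing group (?:...) repeats; the outer group captures the full run.
    m = re.fullmatch(r'([.\w]+)@(\w+)((?:\.\w+)+)', address)
    if m is None:
        return None
    user, host, tail = m.groups()
    # tail looks like '.something.edu.tr'; findall recovers each piece.
    suffixes = re.findall(r'\.\w+', tail)
    return user, host, suffixes

print(split_address('yasar@webmail.something.edu.tr'))
# ('yasar', 'webmail', ['.something', '.edu', '.tr'])
```

This keeps the main pattern short and avoids the extra dependency, at the cost of a second pass over the suffix text.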