Capturing repeated subpatterns in Python regular expressions

Problem description

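While matching an email address, after I match something like yasar@webmail, I want to capture one or more occurrences of (\.\w+) (what I am doing is a little more complicated; this is just an example). I tried adding (\.\w+)+, but it only captures the last match. For example, yasar@webmail.something.edu.tr matches, but the only group kept after the yasar@webmail part is .tr, so I lose the .something and .edu groups. Can I do this with Python regular expressions, or would you suggest matching everything first and splitting the subpatterns later?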

Solution

The re module doesn't support repeated captures (the third-party regex module does):

>>> import regex  # third-party package: pip install regex
>>> m = regex.match(r'([.\w]+)@((\w+)(\.\w+)+)', 'yasar@webmail.something.edu.tr')
>>> m.groups()
('yasar', 'webmail.something.edu.tr', 'webmail', '.tr')
>>> m.captures(4)
['.something', '.edu', '.tr']
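For comparison, here is a small sketch of the same pattern run through the standard library's re module. It only illustrates the limitation described in the question: a repeated group keeps its last repetition, and re match objects have no captures() method to recover the earlier ones.

```python
import re

# Same pattern as in the regex example, but using the standard re module.
m = re.match(r'([.\w]+)@((\w+)(\.\w+)+)', 'yasar@webmail.something.edu.tr')

# Group 4 is repeated three times, yet only its last repetition is kept.
last_only = m.group(4)
print(m.groups())
print(last_only)

# re match objects do not provide captures(); that method is specific to regex.
has_captures = hasattr(m, 'captures')
print(has_captures)
```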

In your case I'd split the repeated subpatterns afterwards. That leads to simple, readable code; see, for example, the code in @Li-aung Yip's answer.
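The "match everything first, split later" approach could look roughly like the sketch below. It is not the code from @Li-aung Yip's answer, which is not reproduced here; the names ADDRESS and split_address are hypothetical, and the pattern is the simplified example from the question rather than a real email validator.

```python
import re

# Hypothetical pattern: capture the whole run of ".xxx" pieces as one group,
# using a non-capturing group for the repetition itself.
ADDRESS = re.compile(r'([.\w]+)@(\w+)((?:\.\w+)+)')

def split_address(address):
    """Return (user, host, parts), or None if the address does not match."""
    m = ADDRESS.fullmatch(address)
    if m is None:
        return None
    user, host, tail = m.groups()
    # tail looks like '.something.edu.tr'; a second pass recovers each piece.
    parts = re.findall(r'\.\w+', tail)
    return user, host, parts

result = split_address('yasar@webmail.something.edu.tr')
print(result)
```

The second pass with re.findall does the work that captures() does in the regex module, so this version needs only the standard library.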
