Problem description
How can I obtain the number of overlapping regex matches using Python?
I've read and tried the suggestions from this, that and a few other questions, but found none that would work for my scenario. Here it is:
- input example string: akka
- search pattern: a.*k
A proper function should yield 2 as the number of matches, since there are two possible end positions (the k letters).
The pattern might also be more complicated; for example, a.*k.*a should also be matched twice in akka (since there are two k's in the middle).
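One way to read "two possible end positions" is to count the distinct (start, end) spans of the text that the whole pattern matches. Below is a minimal brute-force sketch of that reading. The helper name count_match_spans is hypothetical and not from the original post, and the loop is quadratic in the length of the text. It returns 2 for a.*k on akka, but only 1 for a.*k.*a, because it cannot tell apart matches that cover the same span through different k letters.

```python
import re


def count_match_spans(pattern, text):
    """Count the distinct (start, end) spans of text matched by the whole pattern.

    Hypothetical helper for illustration; brute force over every span.
    """
    compiled = re.compile(pattern)
    spans = set()
    for start in range(len(text) + 1):
        for end in range(start, len(text) + 1):
            # fullmatch(text, start, end) succeeds only if the pattern
            # covers exactly text[start:end].
            if compiled.fullmatch(text, start, end):
                spans.add((start, end))
    return len(spans)


print(count_match_spans(r"a.*k", "akka"))     # 2: the spans "ak" and "akk"
print(count_match_spans(r"a.*k.*a", "akka"))  # 1: only the span "akka"
```

So this covers the first example only. Counting the two k letters separately in the second example needs an approach that tracks how the pattern matched, not just where.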
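Yes, it is ugly and unoptimized, but it seems to work. It simply tries all possible variants and keeps the unique ones. Note that the pattern passed to the function uses * as a wildcard (it is split on * and rebuilt with .* in between), so a.*k is written as a*k.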
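import re

def myregex(pattern, text, direction=0):
    # Yield the match found in text, then retry on copies of text in which
    # the 'suffix' group has lost its first or its last character.
    m = re.search(pattern, text)
    if m:
        yield m.group(0)
        if len(m.group('suffix')):
            # Drop the first character of the suffix group.
            shorter = m.group('prefix') + m.group('suffix')[1:] + m.group('end')
            for r in myregex(pattern, shorter, 1):
                yield r
            # Drop the last character; skipped in calls that came from
            # dropping a first character (direction == 1).
            if direction < 1:
                shorter = m.group('prefix') + m.group('suffix')[:-1] + m.group('end')
                for r in myregex(pattern, shorter, -1):
                    yield r

def myprocess(pattern, text):
    # '*' in the pattern is a wildcard. Each pass makes one wildcard greedy
    # ('.*') and wraps it, with the literals on both sides of it, in the
    # 'suffix' group; all other wildcards are lazy ('.*?').
    parts = pattern.split("*")
    for i in range(0, len(parts) - 1):
        res = ""
        for j in range(0, len(parts)):
            if j == 0:
                res += "(?P<prefix>"
            if j == i:
                res += ")(?P<suffix>"
            res += parts[j]
            if j == i + 1:
                res += ")(?P<end>"
            if j < len(parts) - 1:
                if j == i:
                    res += ".*"
                else:
                    res += ".*?"
            else:
                res += ")"
        for r in myregex(res, text):
            yield r

def mycount(pattern, text):
    # The set removes duplicates; its length is the number of matches.
    return set(myprocess(pattern, text))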
Test (output in Python 2 set notation; element order is arbitrary):
>>> mycount('a*b*c','abc')
set(['abc'])
>>> mycount('a*k','akka')
set(['akk', 'ak'])
>>> mycount('b*o','bboo')
set(['bbo', 'bboo', 'bo', 'boo'])
>>> mycount('b*o','bb123oo')
set(['b123o', 'bb123oo', 'bb123o', 'b123oo'])
>>> mycount('b*o','ffbfbfffofoff')
set(['bfbfffofo', 'bfbfffo', 'bfffofo', 'bfffo'])
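The sizes of the sets in the test above can be cross-checked with a different technique: counting in how many ways the literal parts of the pattern can be placed in the text, in order and without overlapping. Below is a minimal dynamic-programming sketch of that idea. The name count_placements is hypothetical, and the sketch assumes the pattern consists only of literal parts separated by *, with no other regex syntax. It counts placements rather than distinct matched strings, so it is not the same algorithm as mycount and the two need not agree on every input.

```python
def count_placements(pattern, text):
    """Count the ways to place the literal parts of a '*'-separated pattern in text.

    Hypothetical helper for illustration; assumes plain literals separated by '*'.
    """
    # Ignore empty parts (for example from a leading, trailing, or doubled '*').
    parts = [part for part in pattern.split("*") if part]
    n = len(text)
    # ways[e] = number of ways to place the parts handled so far
    # such that the most recent one ends at index e.
    ways = [0] * (n + 1)
    for index, part in enumerate(parts):
        new_ways = [0] * (n + 1)
        reachable = 0  # running sum of ways[e] for every e <= start
        for start in range(n - len(part) + 1):
            reachable += ways[start]
            if text.startswith(part, start):
                # The first part may start anywhere; later parts must start
                # at or after the end of the previous part.
                new_ways[start + len(part)] += 1 if index == 0 else reachable
        ways = new_ways
    return sum(ways)


print(count_placements("a*k", "akka"))    # 2
print(count_placements("a*k*a", "akka"))  # 2
print(count_placements("b*o", "bboo"))    # 4
```

On the inputs used in the test, the result equals the size of the set returned by mycount.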