Problem description
Consider this regex:
a*b
Against the string aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac, it takes 67 steps in the debugger to fail.
Now consider this regex:
(?>a*)b
Against the string aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac, it takes 133 steps in the debugger to fail.
And lastly this regex:
a*+b (a possessive quantifier, a variant of the atomic group)
Against the string aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac, it takes 67 steps in the debugger to fail.
When I check the benchmark, the atomic group (?>a*)b performs 179% faster.
Atomic groups disable backtracking, so match performance is good.
But why is the number of steps higher? Can somebody explain this?
And why is there a difference in steps between the two atomic forms, (?>a*)b and a*+b?
Do they work differently?
Recommended answer
Every group in a regular expression takes one step to enter and one step to exit.
WHAT?!
Yeah, I'm serious, read on...
First, compare a pattern that uses a non-capturing group with the same pattern without the group:
Pattern 1: (?:c)at
Pattern 2: cat
So what exactly happens here? We'll match both patterns against the test string "concat" on a regex engine with optimizations disabled:
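As a rough illustration of why a group adds steps, here is a toy step counter in Python. It is not how PCRE or any regex debugger counts steps, so its numbers will not match a debugger's. It is a sketch under a simple assumption: visiting any pattern token costs one step, and the hypothetical `GROUP_OPEN` / `GROUP_CLOSE` markers cost a step each without consuming input. The names `count_steps`, `plain`, and `grouped` are made up for this example.

```python
GROUP_OPEN = "("   # hypothetical marker: entering a group
GROUP_CLOSE = ")"  # hypothetical marker: leaving a group


def count_steps(tokens, text):
    """Try each start offset in turn, counting one step per token visited.

    Returns (steps, match_start); match_start is None if nothing matched.
    Only literal characters and group markers are supported (no quantifiers).
    """
    steps = 0
    for start in range(len(text) + 1):
        pos = start
        for token in tokens:
            steps += 1
            if token in (GROUP_OPEN, GROUP_CLOSE):
                continue  # entering or leaving a group consumes no input
            if pos < len(text) and text[pos] == token:
                pos += 1
            else:
                break  # literal mismatch: give up on this start offset
        else:
            return steps, start  # every token was satisfied
    return steps, None


plain = list("cat")                                # models: cat
grouped = [GROUP_OPEN, "c", GROUP_CLOSE, "a", "t"]  # models: (?:c)at

if __name__ == "__main__":
    print("cat     ->", count_steps(plain, "concat"))
    print("(?:c)at ->", count_steps(grouped, "concat"))
```

In this model both patterns find the same match at the same offset, but the grouped one reports more steps, purely because of the extra enter and exit tokens.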
While we're at it, here are some more groups:
Oh no! I'm going to avoid using groups!
But wait! Note that the number of steps taken to match does not correlate with the performance of the match. PCRE engines optimize away most of the "unnecessary steps", as I've mentioned. Atomic groups are still the most efficient, despite taking more steps on an engine with optimizations disabled.
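To check that the atomic group and the possessive quantifier differ only in how steps are counted, not in what they match, here is a small sketch using Python's `re` module. It assumes Python 3.11 or later for the `(?>...)` and `*+` syntax (older versions reject those patterns, hence the version guard). The `found` helper and the `SUBJECT` string are made up for this example; `SUBJECT` has the same shape as the question's test string, but its exact length is an assumption.

```python
import re
import sys

# Atomic groups and possessive quantifiers were added to `re` in Python 3.11.
HAS_ATOMIC = sys.version_info >= (3, 11)

# A run of a's followed by c; the length is arbitrary for this demonstration.
SUBJECT = "a" * 30 + "c"


def found(pattern, text):
    """Return True if `pattern` matches anywhere in `text`."""
    return re.search(pattern, text) is not None


if __name__ == "__main__":
    # Plain greedy quantifier: backtracking lets a* give back one "a".
    print("a*a on aaa:", found(r"a*a", "aaa"))
    print("a*b on subject:", found(r"a*b", SUBJECT))
    if HAS_ATOMIC:
        # Neither atomic form may give back what a* consumed.
        print("(?>a*)a on aaa:", found(r"(?>a*)a", "aaa"))
        print("a*+a on aaa:", found(r"a*+a", "aaa"))
        print("(?>a*)b on subject:", found(r"(?>a*)b", SUBJECT))
        print("a*+b on subject:", found(r"a*+b", SUBJECT))
```

Both atomic forms should accept and reject the same inputs; only the plain `a*a` can match "aaa", because it is allowed to backtrack.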