问题描述
我有以下REGEX表达式(有效)以允许字母数字(以及'
和-
)并且没有双倍空格:
I have the following REGEX expression (that works) to allow Alpha-Numeric (as well as '
and -
) and no double spacing:
^([a-zA-Z0-9'-]+\s?)*$
由于嵌套分组,因此可以进行灾难性的回溯-太糟糕了!
Due to the nested grouping, this allows Catastrophic Backtracking to happen - which is bad!
如何简化该表达式以避免灾难性的回溯?(理想情况下,第一个和最后一个字符都不允许有空格)
How can I simplify this expression to avoid Catastrophic Backtracking??(Ideally this wouldn't allow white-space in first and last characters either)
推荐答案
说明
嵌套组不会自动导致灾难性的回溯.就您而言,这是因为您的正则表达式退化为灾难性回溯(a*)*
的经典示例.
由于^([a-zA-Z0-9'-]+\s?)*$
中可选的\s
,在输入时没有任何空格,但字符在允许的列表之外,因此正则表达式简单地退化为^([a-zA-Z0-9'-]+)*$
.
Since \s
in optional in ^([a-zA-Z0-9'-]+\s?)*$
, on input without any spaces but has characters outside the allowed list, the regex simply degenerates to ^([a-zA-Z0-9'-]+)*$
.
您还可以考虑对原始正则表达式进行扩展:
You can also think in term of expansion of the original regex:
[a-zA-Z0-9'-]+\s?[a-zA-Z0-9'-]+\s?[a-zA-Z0-9'-]+\s?[a-zA-Z0-9'-]+\s?...
由于\s
是可选的,因此我们可以将其删除:
Since \s
is optional, we can remove it:
[a-zA-Z0-9'-]+[a-zA-Z0-9'-]+[a-zA-Z0-9'-]+[a-zA-Z0-9'-]+...
我们得到了一系列连续的[a-zA-Z0-9'-]+
,它将尝试通过各种方式在它们之间分配字符并消除复杂性.
And we got a series of consecutive [a-zA-Z0-9'-]+
, which will try all ways to distribute the characters between themselves and blow up the complexity.
编写与token delimiter token ... delimiter token
匹配的正则表达式的标准方法是token (delimiter token)*
.虽然可以重写正则表达式而避免重复token
,但我建议不要这样做,因为很难正确处理它.为了避免重复,您可能希望通过字符串串联来构造正则表达式.
The standard way to write a regex to match token delimiter token ... delimiter token
is token (delimiter token)*
. While it is possible to rewrite the regex avoid repeating token
, I'd recommend against it, since it is harder to get it right. To avoid repetition , you might want to construct the regex by string concatenation instead.
遵循上面的食谱:
^[a-zA-Z0-9'-]+(\s[a-zA-Z0-9'-]+)*$
尽管您可以在此处看到重复中的重复,但由于正则表达式只能扩展为:
Although you can see repetition in repetition here, there is no catastrophic backtracking, since the regex can only expand to:
[a-zA-Z0-9'-]+\s[a-zA-Z0-9'-]+\s[a-zA-Z0-9'-]+\s[a-zA-Z0-9'-]+...
\s
和[a-zA-Z0-9'-]
是互斥的-只有一种方法可以匹配任何字符串.
And \s
and [a-zA-Z0-9'-]
are mutual exclusive - there is only one way to match any string.
这篇关于没有灾难性回溯的简单字母数字正则表达式(单个间距)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!