本文介绍了没有灾难性回溯的简单字母数字正则表达式(单个间距)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有以下REGEX表达式(有效)以允许字母数字(以及'-)并且没有双倍空格:

I have the following REGEX expression (that works) to allow Alpha-Numeric (as well as ' and -) and no double spacing:

  ^([a-zA-Z0-9'-]+\s?)*$

由于嵌套分组,因此可以进行灾难性的回溯-太糟糕了!

Due to the nested grouping, this allows Catastrophic Backtracking to happen - which is bad!

如何简化该表达式以避免灾难性的回溯?(理想情况下,第一个和最后一个字符都不允许有空格)

How can I simplify this expression to avoid Catastrophic Backtracking??(Ideally this wouldn't allow white-space in first and last characters either)

推荐答案

说明

嵌套组不会自动导致灾难性的回溯.就您而言,这是因为您的正则表达式退化为灾难性回溯(a*)*的经典示例.

由于^([a-zA-Z0-9'-]+\s?)*$中可选的\s,在输入时没有任何空格,但字符在允许的列表之外,因此正则表达式简单地退化为^([a-zA-Z0-9'-]+)*$.

Since \s in optional in ^([a-zA-Z0-9'-]+\s?)*$, on input without any spaces but has characters outside the allowed list, the regex simply degenerates to ^([a-zA-Z0-9'-]+)*$.

您还可以考虑对原始正则表达式进行扩展:

You can also think in term of expansion of the original regex:

[a-zA-Z0-9'-]+\s?[a-zA-Z0-9'-]+\s?[a-zA-Z0-9'-]+\s?[a-zA-Z0-9'-]+\s?...

由于\s是可选的,因此我们可以将其删除:

Since \s is optional, we can remove it:

[a-zA-Z0-9'-]+[a-zA-Z0-9'-]+[a-zA-Z0-9'-]+[a-zA-Z0-9'-]+...

我们得到了一系列连续的[a-zA-Z0-9'-]+,它将尝试通过各种方式在它们之间分配字符并消除复杂性.

And we got a series of consecutive [a-zA-Z0-9'-]+, which will try all ways to distribute the characters between themselves and blow up the complexity.

编写与token delimiter token ... delimiter token匹配的正则表达式的标准方法是token (delimiter token)*.虽然可以重写正则表达式而避免重复token,但我建议不要这样做,因为很难正确处理它.为了避免重复,您可能希望通过字符串串联来构造正则表达式.

The standard way to write a regex to match token delimiter token ... delimiter token is token (delimiter token)*. While it is possible to rewrite the regex avoid repeating token, I'd recommend against it, since it is harder to get it right. To avoid repetition , you might want to construct the regex by string concatenation instead.

遵循上面的食谱:

^[a-zA-Z0-9'-]+(\s[a-zA-Z0-9'-]+)*$

尽管您可以在此处看到重复中​​的重复,但由于正则表达式只能扩展为:

Although you can see repetition in repetition here, there is no catastrophic backtracking, since the regex can only expand to:

[a-zA-Z0-9'-]+\s[a-zA-Z0-9'-]+\s[a-zA-Z0-9'-]+\s[a-zA-Z0-9'-]+...

\s[a-zA-Z0-9'-]是互斥的-只有一种方法可以匹配任何字符串.

And \s and [a-zA-Z0-9'-] are mutual exclusive - there is only one way to match any string.

这篇关于没有灾难性回溯的简单字母数字正则表达式(单个间距)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

09-03 05:55